Possible ways to improve annotated gene numbers


Possible ways to improve annotated gene numbers

Qihua Liang
Dear Maker Development Team,

Hi, I am using Maker for annotation and BUSCO to evaluate the completeness.

For de novo predictions, I am using Augustus, GeneMark, and SNAP; their annotated proteins have BUSCO completeness of ~80%, ~50%, and ~50%, respectively. When I concatenate the de novo annotated proteins from all three tools, the completeness is much higher, at ~92%.

But for all.maker.proteins.fasta, the completeness is only ~80%.

1. Does this mean that some proteins annotated by Augustus/GeneMark/SNAP are not included in the file all.maker.proteins.fasta? Is it because the excluded proteins have no hits to the EST evidence?

2. What could I do to achieve a higher BUSCO completeness? Would including more EST evidence from other species help?


Thank you
Qihua
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Possible ways to improve annotated gene numbers

Carson Holt-2
MAKER excludes models without evidence support, because gene predictors can overcall by as much as a factor of 10 (i.e., lots of false positives). So you may be lacking either protein or transcript evidence. You should always supply a minimum of 2 related proteomes for any MAKER analysis; transcript evidence by itself is insufficient.
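To see which BUSCOs are lost when unsupported models are excluded, one option is to compare the BUSCO full tables from the two runs. The following is only a minimal sketch: it assumes the tab-separated full table layout (comment lines starting with `#`, BUSCO ID in the first column, status in the second), and the function names are hypothetical, not part of MAKER or BUSCO.

```python
import csv


def complete_buscos(full_table_path):
    """Return the BUSCO IDs reported as Complete or Duplicated.

    Assumes a tab-separated BUSCO full table: '#' comment lines,
    BUSCO ID in column 1, status in column 2.
    """
    found = set()
    with open(full_table_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank lines, comment/header lines, and malformed rows.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            if row[1] in ("Complete", "Duplicated"):
                found.add(row[0])
    return found


def lost_buscos(ab_initio_table, maker_table):
    """BUSCOs found in the merged ab initio set but not in the MAKER set."""
    return sorted(complete_buscos(ab_initio_table) - complete_buscos(maker_table))
```

The returned IDs point to predictions that were probably dropped for lack of evidence, which is where additional protein evidence is most likely to help.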

You can also try to rescue models based on protein domain content using iprscan. Details are in this protocol paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286374/
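As a rough illustration of the domain-based idea (not the procedure from the protocol paper, which should be followed for the actual rescue), the sketch below lists proteins that have a hit from a given analysis in InterProScan's tab-separated output. It assumes the standard TSV layout with the protein accession in column 1 and the analysis name (e.g. Pfam) in column 4; the function names and the idea of passing in a set of IDs already accepted by MAKER are hypothetical.

```python
import csv


def ids_with_domain(iprscan_tsv, analysis="Pfam"):
    """Return protein IDs with at least one hit from the given analysis.

    Assumes InterProScan TSV output: protein accession in column 1,
    analysis name in column 4.
    """
    ids = set()
    with open(iprscan_tsv, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) > 3 and row[3] == analysis:
                ids.add(row[0])
    return ids


def rescue_candidates(iprscan_tsv, accepted_ids, analysis="Pfam"):
    """Predictions with a domain hit that are not already accepted."""
    return sorted(ids_with_domain(iprscan_tsv, analysis) - set(accepted_ids))
```

This only identifies candidates; it assumes the IDs in the InterProScan output and in the accepted set are directly comparable, which depends on how the input FASTA files were produced.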

—Carson
 



In reply to Qihua Liang's message of Jun 30, 2017, 1:30 PM.