How to evaluate maker proteins' quality?

dcg@cau.edu.cn
Dear sir,
    Now that my MAKER run has finished, I would like to check the quality of the results.
    The purpose of my annotation is to find new proteins.
    There are about 30K reviewed proteins for my species. If I want to see how many of the reviewed proteins are supported by my predicted proteins, how should I do it? (Would blastp be OK? How should I set the threshold?)
    I used UniProt proteins, ESTs and RNA-seq as evidence for my annotation. From my perspective, if a reviewed protein was used as evidence and to train SNAP/Augustus, we should recover the same protein after several training rounds. So I plan to align the MAKER proteins to the UniProt proteins that I used for the annotation. If a predicted protein matches a UniProt protein with 100% identity and coverage, it can be considered to support that reviewed protein. Is this correct?

    If not, could I evaluate my proteins using only the AED value and protein domains?

    I'm looking forward to your help. Thanks a lot!

Yours sincerely!

Chao Chao


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: How to evaluate maker proteins' quality?

Carson Holt-2
A 100% identity expectation is too high, because of small differences between assemblies, individual variants, reference proteins that are only partial, and potential assembly errors. About 90% or higher would be more reasonable for a same-species comparison. AED correlates well with confidence in a protein model. A perfect score of zero will not happen often, though, because alignment algorithms leave small errors around splice sites and short exons. The evidence itself is also never perfect, so lower AED values are better than higher ones, but AED cannot be used as an overly specific measurement (it is only correlative, not exact).
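
As an illustration of the roughly 90% threshold, here is a minimal Python sketch that counts which reviewed proteins are supported by at least one predicted protein. It assumes blastp was run with a custom tabular format (the column order, the file and database names in the comment, and the 90% coverage cutoff are all assumptions of this example, not something prescribed in this thread). Coverage is approximated per HSP as alignment length divided by subject length, so it is only a rough estimate.

```python
from typing import Iterable, Set

# Assumed BLAST+ invocation (file and database names are hypothetical):
#   blastp -query maker_proteins.fasta -db reviewed_proteins \
#          -outfmt "6 qseqid sseqid pident length qlen slen evalue bitscore"


def supported_references(
    rows: Iterable[str],
    min_identity: float = 90.0,   # percent identity, per the ~90% suggestion
    min_coverage: float = 0.9,    # fraction of the reference covered (assumed cutoff)
) -> Set[str]:
    """Return IDs of reference proteins hit by at least one prediction
    at or above the identity and coverage thresholds."""
    supported: Set[str] = set()
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comments
        fields = row.split("\t")
        sseqid = fields[1]
        pident = float(fields[2])
        aln_len = int(fields[3])
        slen = int(fields[5])
        # Alignment length includes gaps, so this is an approximation.
        coverage = aln_len / slen
        if pident >= min_identity and coverage >= min_coverage:
            supported.add(sseqid)
    return supported
```

Calling `supported_references(open("blast.tsv"))` (hypothetical file name) and dividing the size of the result by the number of reviewed proteins gives the fraction of the reference set that is recovered.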

—Carson
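
Since AED is best read as a distribution rather than as an exact per-gene score, one way to summarize it is the fraction of transcripts at or below a cutoff. The sketch below assumes the MAKER GFF3 output stores the score as an `_AED=` attribute on `mRNA` features; the function names and the 0.5 cutoff in the usage note are hypothetical choices for this example.

```python
import re
from typing import Iterable, List

# Match "_AED=<number>" at the start of the attribute column or after ";",
# so that the separate "_eAED=" attribute is not picked up by mistake.
_AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def mrna_aed_values(gff_lines: Iterable[str]) -> List[float]:
    """Collect AED values from mRNA features in GFF3 lines."""
    values: List[float] = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # header and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "mRNA":
            continue  # not a 9-column mRNA feature line
        match = _AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def fraction_at_or_below(values: List[float], cutoff: float) -> float:
    """Fraction of AED values that are <= cutoff (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v <= cutoff for v in values) / len(values)
```

For example, `fraction_at_or_below(mrna_aed_values(open("maker.gff")), 0.5)` (hypothetical file name) reports how much of the annotation has an AED of 0.5 or lower; comparing this number between training rounds is more informative than inspecting individual values.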


On May 5, 2017, at 7:43 AM, [hidden email] wrote: