evidence-only gene annotation

nellerk
Hello,

I am using Maker to annotate a novel, non-model plant genome. 

Following the published protocol, I have run one evidence-only round (est2genome=1, protein2genome=1) followed by two iterative rounds, re-training SNAP and AUGUSTUS each time.

I have a curious result in that the gene predictors do not seem to be finding many genes, but instead creating gene fusions. As such, my evidence-only round resulted in 29,773 genes (mean length=5071 bp), and my final round yielded 29,845 genes (mean length=6530 bp). If I am interpreting this correctly, the predictors found only 72 new genes but greatly increased the mean length of all genes. I have inspected the results visually in a genome viewer and it seems that the predictors often create fusions with nearby pseudogenes. I attempted to reduce this by changing pred_flank from 200 (default) to 100, but it didn't seem to make a difference (at least for the genes I was looking at). 
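The comparison above (gene count and mean gene length per round) can be reproduced directly from the GFF3 output of each round. The following is a minimal sketch, assuming the gene models appear as `gene` features in standard nine-column GFF3; the function names and the toy records are made up for illustration and are not part of MAKER.

```python
from statistics import mean


def gene_lengths(gff3_text):
    """Return the length in bp of every 'gene' feature in GFF3 text."""
    lengths = []
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section, if present, ends the annotations
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        start, end = int(cols[3]), int(cols[4])
        lengths.append(end - start + 1)  # GFF3 coordinates are 1-based, inclusive
    return lengths


def summarize(gff3_text):
    """Gene count and mean gene length for one annotation round."""
    lengths = gene_lengths(gff3_text)
    return {"genes": len(lengths), "mean_length": mean(lengths) if lengths else 0}


# Toy records (hypothetical): two genes in round 1, merged into one in round 3.
round1 = "\n".join([
    "##gff-version 3",
    "chr1\tmaker\tgene\t1001\t2000\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1001\t2000\t.\t+\t.\tID=g1-RA;Parent=g1",
    "chr1\tmaker\tgene\t3001\t4000\t.\t+\t.\tID=g2",
])
round3 = "\n".join([
    "##gff-version 3",
    "chr1\tmaker\tgene\t1001\t4000\t.\t+\t.\tID=f1",
])

print(summarize(round1))
print(summarize(round3))
```

With the numbers reported in this post, the difference is 29,845 - 29,773 = 72 genes, and the mean length grows by (6530 - 5071) / 5071, or roughly 29%.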

So although my final Maker round looks good (~30,000 genes, 95% of genes have AED < 0.5), I have greater confidence in the models created by the evidence-only round. 

I have two questions:
1) In this case, would it be acceptable to use evidence-only gene models (from Round 1), rather than those from Round 3 (which incorporated trained gene predictors)? I ask because I haven't seen reports of Maker being used in this way.
2) Do you have any suggestions to improve my ab initio training or prediction? Please note, I have already repeat-masked the genome with a species-specific repeat library.

Thank you for any assistance!

Kira

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: evidence-only gene annotation

Carson Holt-2
Fusions are generated by the evidence alignments. Either transcript assemblies were falsely fused or proteins are bridging neighboring paralogs. For transcript data, you can try building the assembly with Trinity and its --jaccard_clip option, which reduces the occurrence of fused transcripts in the assembly. Also set correct_est_fusion=1 in the MAKER options file (maker_opts.ctl).
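If the control file is edited by script rather than by hand, the change to correct_est_fusion could look like the sketch below. It assumes the file uses `key=value #comment` lines; `set_ctl_option` is a hypothetical helper, not a MAKER utility, and the sample lines are stand-ins written for illustration.

```python
def set_ctl_option(ctl_text, key, value):
    """Return ctl_text with `key` set to `value`, keeping any trailing comment."""
    out, found = [], False
    for line in ctl_text.splitlines():
        setting, hash_mark, comment = line.partition("#")
        name, equals, _old_value = setting.partition("=")
        if equals and name.strip() == key:
            line = f"{key}={value}" + (f" #{comment}" if hash_mark else "")
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"option not found: {key}")
    return "\n".join(out)


# Illustrative stand-in for two lines of an options file (assumed layout).
sample = "\n".join([
    "est2genome=0 #infer gene predictions directly from ESTs",
    "correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes",
])

print(set_ctl_option(sample, "correct_est_fusion", 1))
```

The helper raises `KeyError` when the option is absent instead of appending it, so a misspelled option name is caught rather than silently written to the file.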

For fusions driven by protein evidence, you can try DeFusion, a post-processing step that you run on the MAKER output. It searches for paralog-driven fusions and attempts to correct them.
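To see how widespread the fusions are before applying either fix, one option is to list final-round genes that span two or more evidence-only genes on the same strand. This is a rough sketch, not how DeFusion works: the function names, the 50% coverage threshold, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_genes(gff3_text):
    """Return (seqid, start, end, strand, gene_id) for each 'gene' feature."""
    genes = []
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID", "")))
    return genes


def candidate_fusions(evidence_gff3, final_gff3, min_covered=0.5):
    """Map final gene IDs to the evidence-only genes they cover (two or more)."""
    by_locus = defaultdict(list)
    for seqid, start, end, strand, gid in parse_genes(evidence_gff3):
        by_locus[(seqid, strand)].append((start, end, gid))
    hits = {}
    for seqid, start, end, strand, gid in parse_genes(final_gff3):
        covered = []
        # Simple all-against-all scan per sequence and strand; fine for a sketch.
        for e_start, e_end, e_id in by_locus.get((seqid, strand), []):
            overlap = min(end, e_end) - max(start, e_start) + 1
            if overlap > 0 and overlap / (e_end - e_start + 1) >= min_covered:
                covered.append(e_id)
        if len(covered) >= 2:
            hits[gid] = covered
    return hits


# Toy records (hypothetical): f1 spans both g1 and g2; f2 matches g3 only.
evidence_only = "\n".join([
    "chr1\tmaker\tgene\t1001\t2000\t.\t+\t.\tID=g1",
    "chr1\tmaker\tgene\t3001\t4000\t.\t+\t.\tID=g2",
    "chr2\tmaker\tgene\t500\t900\t.\t-\t.\tID=g3",
])
final_round = "\n".join([
    "chr1\tmaker\tgene\t1001\t4000\t.\t+\t.\tID=f1",
    "chr2\tmaker\tgene\t500\t900\t.\t-\t.\tID=f2",
])

print(candidate_fusions(evidence_only, final_round))
```

Genes flagged this way are only candidates: a longer final model can also be a legitimate merge of two fragments of one gene, so each hit still needs a look in the genome viewer.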

—Carson


On Apr 12, 2018, at 11:12 AM, [hidden email] wrote:

> [original message, quoted in full above]