large UTR overhang

large UTR overhang

Timo Metz
Hey guys,

Attached is a picture from a recent MAKER run in which I used PacBio reads, Trinity-assembled short reads, and proteins from Swiss-Prot. I don't really understand why MAKER assigns such a large fraction of the model to UTR. From the name I can tell that the final gene model stems from a SNAP ab initio prediction, but even that prediction is not 100% identical to the non-UTR part of the gene model.

Is there any setting in MAKER that lets me influence how it assigns such UTRs? I already tried the "correct_est_fusion" option, but it did not help.

best

Timo

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

UTRoverhang.png (109K)

Re: large UTR overhang

Carson Holt-2
MAKER 3 does not have any additional requirement for transcript support that MAKER 2 does not have. However, if you are using the correct_est_fusion=1 option, it will only use the polished protein evidence rather than the unpolished BLASTX alignments, which is probably what you are seeing.

The model you show also likely corresponds to either a paralogous duplication or a broken ORF due to assembly error. You can see clearly that both SNAP and Augustus want to break the region into two separate models (they can’t find a single workable ORF). The raw BLASTX alignments and the transcript data want to merge the region (I don’t see any support for merging from the polished protein2genome alignments, though; maybe you just cut that off in the image?). So when the predictors are fed hints suggesting the longer model, they build the best model they can, but the ORF is broken, so the remaining exons match the transcript evidence exactly yet have to be UTR. This means you are either merging things that shouldn’t be merged (based on bad evidence alignments), or the assembly has an error that keeps the ORF from functioning in that region as it should. The overall structure is still captured, but the translation is truncated.
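
To make the broken-ORF point concrete, here is a toy Python sketch. It is not MAKER code; the sequences and the helper name `split_cds_utr` are made up for illustration. It reads a spliced transcript from its first ATG and shows that everything downstream of the first in-frame stop ends up as 3' UTR, even though those bases still belong to exons that match the transcript evidence.

```python
STOPS = {"TAA", "TAG", "TGA"}

def split_cds_utr(transcript):
    """Split a spliced transcript into (5' UTR, CDS, 3' UTR).

    Toy rule (an assumption for this sketch): the CDS starts at the
    first ATG and ends at the first in-frame stop codon.
    """
    seq = transcript.upper()
    start = seq.find("ATG")
    if start == -1:
        return seq, "", ""          # no start codon: nothing is coding
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            end = i + 3             # include the stop codon in the CDS
            return seq[:start], seq[start:end], seq[end:]
    return seq[:start], seq[start:], ""  # ORF runs off the end

# Intact ORF: start codon, ten sense codons, stop codon.
intact = "GG" + "ATG" + "GCT" * 10 + "TAA" + "CC"
# Same length, but an early in-frame stop (e.g. from an assembly error
# or a false merge) replaces the third sense codon.
broken = "GG" + "ATG" + "GCT" * 2 + "TGA" + "GCT" * 7 + "TAA" + "CC"

for name, tx in (("intact", intact), ("broken", broken)):
    utr5, cds, utr3 = split_cds_utr(tx)
    print(name, "CDS bp:", len(cds), "3' UTR bp:", len(utr3))
```

Both transcripts are the same length and have the same exon structure; only the position of the first in-frame stop differs, and that alone moves most of the broken transcript into UTR.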

DeFusion is a secondary tool you can try; it may help if you are getting false merges because of the evidence: https://wjidea.github.io/defusion/
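
Separately from DeFusion (and not a description of how it works), a quick way to see how many models are affected is to compute the UTR fraction per transcript from the annotation GFF3. This is a minimal Python sketch, assuming standard GFF3 `exon` and `CDS` features whose `Parent` attribute names the transcript. The function name `utr_fractions`, the 0.5 cutoff, and the example records are arbitrary choices for illustration.

```python
from collections import defaultdict

def utr_fractions(gff_lines):
    """Map transcript ID -> fraction of exonic bases not covered by CDS."""
    exon_bp = defaultdict(int)
    cds_bp = defaultdict(int)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                   # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
            continue
        # GFF3 coordinates are 1-based and inclusive.
        length = int(cols[4]) - int(cols[3]) + 1
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        target = exon_bp if cols[2] == "exon" else cds_bp
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                target[parent] += length
    return {tid: 1 - cds_bp[tid] / bp for tid, bp in exon_bp.items() if bp > 0}

# Hypothetical records: 800 bp of exon, 200 bp of CDS.
example = [
    "##gff-version 3",
    "chr1\tmaker\tmRNA\t1\t1000\t.\t+\t.\tID=tx1;Parent=g1",
    "chr1\tmaker\texon\t1\t400\t.\t+\t.\tID=tx1:exon:1;Parent=tx1",
    "chr1\tmaker\texon\t601\t1000\t.\t+\t.\tID=tx1:exon:2;Parent=tx1",
    "chr1\tmaker\tCDS\t101\t300\t.\t+\t0\tID=tx1:cds;Parent=tx1",
]

for tid, frac in utr_fractions(example).items():
    if frac > 0.5:                  # arbitrary cutoff for "mostly UTR"
        print(f"{tid}\tUTR fraction {frac:.2f}")
```

On a real run the function can be given an open file handle instead of a list. Transcripts that come out mostly UTR are the candidates to inspect for false merges or a broken ORF.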

—Carson



On May 8, 2018, at 6:10 AM, Timo Metz <[hidden email]> wrote:

[...]