augustus exon calling ~


Salim Bougouffa
Hi Maker folks,

I have several issues with a plant genome annotation I am currently doing, but the most recurrent are:

1/ CDSs that are missed even though there is significant RNA-seq evidence (figure artemis01)
2/ conversely, one or two exons that are added without any RNA-seq evidence or intron hints (figure artemis02)

Info about the runs:
1/ Augustus with a pre-existing model for a related plant that is highly homologous to the one I am annotating
2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?)
3/ evm=1 (seems to perform better than evm=0)
4/ repeat masking (de novo + Repbase)

Best,
/SB


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

artemis01.png (222K)
artemis02.png (129K)
Re: augustus exon calling ~

Salim Bougouffa
Hi,

I should have mentioned a third scenario, where MAKER does not call an exon fully even though Augustus gets it right (figure artemis03).

artemis03.png


(Replying to my original message of Sun, 21 May 2017 at 10:48.)
Re: augustus exon calling ~

Carson Holt-2
In reply to this post by Salim Bougouffa
EVM works extremely well when the evidence closely matches the predictions and there are no assembly anomalies affecting the ORF. Otherwise, EVM performs very poorly. Also, I would not set unmask=1; it adds noise to the calls.

Note that in all the cases given, the gene models come from Augustus (MAKER does not make predictions itself). MAKER just provides hints that Augustus can use for the second call set. Hints boost the score a model gets whenever one of its features matches a hint. The Augustus match/match_part features you see are just a reference showing what Augustus calls without hints.

So if I tell Augustus there is probably an exon/intron at location X, then any model that includes that exon/intron gets a higher score, causing Augustus to keep models that match the hints and report them over models that don't. However, if there is an issue with the evidence (e.g. a falsely merged mRNA-seq assembly) or with the genome assembly (e.g. a base change that creates an early stop codon or a frameshift), Augustus may truncate or skip an exon in order to capture the bonus from downstream hints. In such cases there is probably no workable model that captures the exact intron/exon structure, because that structure breaks the ORF at some point. Augustus instead produces the best model it can, capturing as many hint bonuses as possible.
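The trade-off described above can be illustrated with a toy model. This is a minimal sketch, not Augustus's actual scoring: the intrinsic scores, the per-hint bonus, and the candidate models are all made-up numbers, and adding a bonus in log space only loosely mirrors the multiplicative bonus factors Augustus applies to hinted features. It shows how, once an assembly error breaks the ORF of the full-length model, the best remaining valid model can be one that skips an exon so it still collects the downstream hint bonus.

```python
from dataclasses import dataclass

# Toy model only: NOT Augustus's real scoring. All numbers are hypothetical.
HINT_BONUS = 2.0  # log-score bonus per exon that exactly matches a hint


@dataclass(frozen=True)
class Exon:
    start: int
    end: int


@dataclass
class Candidate:
    name: str
    exons: list
    intrinsic: float   # ab initio log-score without hints
    orf_intact: bool   # False if the model runs into a stop codon or frameshift


def score(candidate, hints):
    """Intrinsic score plus a bonus for every exon that matches a hint."""
    if not candidate.orf_intact:
        return float("-inf")  # a broken ORF cannot be reported as a gene model
    matched = sum(1 for exon in candidate.exons if exon in hints)
    return candidate.intrinsic + HINT_BONUS * matched


def best(candidates, hints):
    """Return the valid candidate with the highest total score."""
    return max(candidates, key=lambda c: score(c, hints))


# Exon hints from (hypothetical) RNA-seq evidence.
hints = {Exon(100, 200), Exon(300, 400), Exon(500, 600)}

candidates = [
    # Matches every hint, but an assembly error in exon 2 creates an early stop.
    Candidate("full", [Exon(100, 200), Exon(300, 400), Exon(500, 600)], 1.0, False),
    # Skips exon 2 and keeps the downstream hint bonus.
    Candidate("skip_exon2", [Exon(100, 200), Exon(500, 600)], 0.5, True),
    # Ends the gene at the premature stop; the shortened exon misses its hint.
    Candidate("truncated", [Exon(100, 200), Exon(300, 350)], 0.8, True),
    Candidate("first_exon_only", [Exon(100, 200)], 0.9, True),
]

if __name__ == "__main__":
    for c in candidates:
        print(c.name, score(c, hints))
    print("chosen:", best(candidates, hints).name)
```

With these numbers the full-length model is rejected outright, and `skip_exon2` (score 4.5) beats both `first_exon_only` (2.9) and `truncated` (2.8) because it still matches the hint for the downstream exon.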

That being said, look for odd hint sources such as very poor protein or transcript evidence alignments. Eliminating bad hints will improve performance (for example, if you are using mRNA-seq assemblies, Trinity has a jaccard_clip option that helps avoid false merging of transcript evidence). Likewise, if an organism you used for protein evidence consistently produces bad protein alignments, you may want to drop it from the evidence completely.
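Checking whether a given evidence organism consistently produces poor alignments could be sketched as follows. This assumes you have already extracted per-alignment identity and coverage for each protein evidence source (how you obtain them depends on your aligner and output format and is not shown); the organism names, the records, and all thresholds are hypothetical placeholders to tune for your own data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical thresholds; tune them for your own evidence.
MIN_IDENTITY = 0.5       # fraction of identical residues in the alignment
MIN_COVERAGE = 0.6       # fraction of the query protein that aligned
MAX_POOR_FRACTION = 0.5  # flag an organism if more than this share is poor


@dataclass
class ProteinAlignment:
    organism: str
    identity: float
    coverage: float


def is_poor(aln):
    return aln.identity < MIN_IDENTITY or aln.coverage < MIN_COVERAGE


def poor_fraction_by_organism(alignments):
    """Share of poor alignments contributed by each evidence organism."""
    totals = defaultdict(int)
    poor = defaultdict(int)
    for aln in alignments:
        totals[aln.organism] += 1
        if is_poor(aln):
            poor[aln.organism] += 1
    return {org: poor[org] / totals[org] for org in totals}


def organisms_to_drop(alignments):
    """Organisms whose protein evidence is poor more often than allowed."""
    fractions = poor_fraction_by_organism(alignments)
    return sorted(org for org, frac in fractions.items() if frac > MAX_POOR_FRACTION)


# Made-up example records for two hypothetical evidence organisms.
alignments = [
    ProteinAlignment("species_A", 0.90, 0.95),
    ProteinAlignment("species_A", 0.85, 0.90),
    ProteinAlignment("species_A", 0.40, 0.80),
    ProteinAlignment("species_B", 0.30, 0.40),
    ProteinAlignment("species_B", 0.45, 0.90),
    ProteinAlignment("species_B", 0.80, 0.30),
    ProteinAlignment("species_B", 0.90, 0.90),
]

if __name__ == "__main__":
    print(poor_fraction_by_organism(alignments))
    print("consider dropping:", organisms_to_drop(alignments))
```

With these made-up records, one of three `species_A` alignments is poor while three of four `species_B` alignments are, so only `species_B` is flagged as a candidate to drop from the evidence.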

Finally, training Augustus on the genome being annotated will help improve performance. Note that just because a species is closely related in evolutionary terms does not mean its HMMs will perform well; this is a common fallacy about ab initio prediction, discussed in the SNAP paper. Also try adding another gene predictor such as SNAP to see whether it hurts or helps.

—Carson




