Maker annotation AED scores are around 0.5

Maker annotation AED scores are around 0.5

Wei Zhao

Dear MAKER team,

I am writing to ask for your help.

I am using MAKER to annotate a large genome (~9 Gbp) with three sources of evidence: 1) the transcriptome of this species; 2) protein sequences from related species; 3) an Augustus model trained with PASA.

When I use all three sources to annotate the genome (basic pipeline), the distribution of AED scores looks odd: a single peak around 0.5.

I also tried using MAKER to update the gene models I obtained from PASA, and the distribution of AED scores is the same.

But when I use only EST or protein evidence (est2genome or protein2genome), the AED scores are normal (close to 0).

As I understand it, the three sources of evidence seem to conflict with each other, which makes the AED scores higher (~0.5) than expected. Could you suggest how to fix this problem?

Best regards,

Wei
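
For anyone wanting to reproduce the kind of distribution described above, the AED values can be read directly from the MAKER GFF3 output. This is a minimal Python sketch, assuming each mRNA feature carries an `_AED=` attribute in column 9; the function names `aed_values` and `aed_histogram` are made up for illustration and are not part of MAKER.

```python
import re

# Match "_AED=<number>" as its own attribute (not "_eAED=").
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Yield the _AED value of every mRNA feature found in GFF3 lines."""
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip anything that is not a 9-column feature line (e.g. FASTA).
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            yield float(match.group(1))


def aed_histogram(values, bin_width=0.1):
    """Return a list of counts, one per AED bin of width bin_width on [0, 1]."""
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    for value in values:
        # Small epsilon guards against float rounding at bin edges;
        # AED == 1.0 is placed in the last bin.
        index = min(int(value / bin_width + 1e-9), n_bins - 1)
        counts[index] += 1
    return counts
```

Calling `aed_histogram(aed_values(open(path)))` on a GFF3 file (the path is yours to supply) gives ten counts; a single peak around 0.5 would show up as most of the mass in the bins at index 4 and 5.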

_______________________________________________
maker-devel mailing list
[hidden email]
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

Optimal strategy and options for iterative maker2 runs

natassa
Hello MAKER community,
I am annotating a fungal genome with MAKER2. I have transcript evidence for it, plus transcripts and proteins from closely related species, a GeneMark .mod file from self-training that I ran outside of MAKER, and an Augustus model from a closely related species. I plan to run MAKER iteratively, updating the SNAP (and maybe Augustus) models each time. Having read several iterative-MAKER pipelines online, I am a bit confused about the optimal strategy and about some details of the options used in consecutive runs. Some questions:

1) How will MAKER behave if I supply my different lines of evidence (EST + protein) along with trained ab initio models in the same run? Here is what seems to me conflicting information from posts I read (not on this list): "if est2genome and protein2genome are set to 1 + abinitio tools are also on, the abinitio tools will not use the EST-protein evidence to improve their gene models." but: "In case you activated SNAP and Augustus and you have fed MAKER with lines of evidence (Transcripts and proteins), it will predict gene models using Augustus-Evidence-driven and SNAP-Evidence-driven. In loci where both are present, it will chose the best one according to the lines of evidence (EST / protein when they are present)." Which one is correct?
2) I see in a few tutorials that GeneMark is trained in a 3rd/4th run, separately from the other ab initio programs. I don't understand why, since GeneMark is self-trained on the genome and so should not really interact with training from evidence or from MAKER GFF files.
3) Can I pass more than one ab initio model from one run to the next using the pred_gff option, for example Augustus + GeneMark HMMs separated by ","? In a 2017 post, Carson writes "I would avoid passing in Augustus results as GFF3, it removes the ability of MAKER to dynamically provide Augustus with hints as it runs". What is the correct way, then?

Any input from experienced maker users is welcome!
Thank you in advance,
Anastasia Gioti

Anastasia Gioti
Researcher
IMBBC-HCMR Crete, Greece
https://scholar.google.com/citations?user=eMsnakoAAAAJ&hl=en
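
Whatever the answers to the questions above turn out to be, moving from one iterative run to the next mostly comes down to changing a few `key=value` lines in maker_opts.ctl. Below is a minimal Python sketch of doing that programmatically, assuming the usual `key=value #comment` layout of the control file. `set_ctl_options` is a hypothetical helper, not a MAKER tool, and it does not say which values are right for any given round.

```python
import re

# One option per line: key, "=", value, then an optional "#comment".
CTL_LINE = re.compile(r"^(\w+)=([^#]*?)(\s*#.*)?$")


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given options rewritten.

    Trailing comments and all other lines are preserved. Raises KeyError
    if an option in `updates` is not present in the control file.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        match = CTL_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={remaining.pop(key)}{comment}"
        out.append(line)
    if remaining:
        raise KeyError(f"options not found in control file: {sorted(remaining)}")
    return "\n".join(out) + "\n"
```

For example, `set_ctl_options(text, {"est2genome": 0, "protein2genome": 0})` would switch off the two evidence-to-model options mentioned in question 1 while leaving the rest of the file untouched; the values shown are placeholders only.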



Re: Maker annotation AED scores are around 0.5

Carson Holt-2
In reply to this post by Wei Zhao
Probably this:

https://groups.google.com/forum/#!searchin/maker-devel/Curious$20pattern$20in$20AED$20distributions%7Csort:date/maker-devel/QS3VnxhvEks/q3lPmywjBQAJ


Likely caused by an overabundance of single-exon models and under-masking of repeats in the genome.
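
A quick way to check the first of these two causes is to count how many annotated transcripts have exactly one exon. This is a minimal Python sketch, assuming the final gene models carry `maker` in the GFF3 source column and that exon features reference their transcript through the `Parent` attribute; `single_exon_fraction` is a hypothetical name used only for this example.

```python
from collections import defaultdict


def _attributes(column9):
    """Parse a GFF3 attribute column into a dict."""
    return dict(kv.split("=", 1) for kv in column9.strip().split(";") if "=" in kv)


def single_exon_fraction(gff_lines, source="maker"):
    """Return the fraction of mRNAs that have exactly one exon.

    Only features whose source column equals `source` are counted;
    pass source=None to count features from every source.
    """
    exon_counts = defaultdict(int)
    mrna_ids = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        if source is not None and cols[1] != source:
            continue
        attrs = _attributes(cols[8])
        if cols[2] == "mRNA" and "ID" in attrs:
            mrna_ids.add(attrs["ID"])
        elif cols[2] == "exon":
            # An exon shared by several transcripts lists them comma-separated.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exon_counts[parent] += 1
    if not mrna_ids:
        return 0.0
    single = sum(1 for mrna in mrna_ids if exon_counts.get(mrna, 0) == 1)
    return single / len(mrna_ids)
```

A high value here does not prove anything on its own, but it is a cheap first check before re-examining the repeat masking.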

—Carson



On Mar 30, 2020, at 3:37 AM, Wei Zhao <[hidden email]> wrote:

Dear MAKER team,

I am writing to ask for your help.

I am using MAKER to annotate a large genome (~9 Gbp) with three sources of evidence: 1) the transcriptome of this species; 2) protein sequences from related species; 3) an Augustus model trained with PASA.

When I use all three sources to annotate the genome (basic pipeline), the distribution of AED scores looks odd: a single peak around 0.5.

I also tried using MAKER to update the gene models I obtained from PASA, and the distribution of AED scores is the same.

But when I use only EST or protein evidence (est2genome or protein2genome), the AED scores are normal (close to 0).

As I understand it, the three sources of evidence seem to conflict with each other, which makes the AED scores higher (~0.5) than expected. Could you suggest how to fix this problem?

Best regards,

Wei

<E6F3EF742C40408F8390EE9A1FF29894.png>