Filtering gene models based on eAED scores

Federico López
Hello,

I'm using MAKER's "quality_filter.pl" with the default option (AED<1). However, I have noticed cases in which models have low AED scores and high eAED scores (1.00), so presumably the good AED scores are the result of spurious evidence alignments. Is there a way to filter models based on eAED scores too?

Thank you.

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Filtering gene models based on eAED scores

Carson Holt-2
The eAED score also takes protein reading frame into account, and it can infer support for exons when both flanking introns are validated (i.e., it can be lower than AED in some cases). In your case, an eAED of 1 with an AED less than 1 means that your evidence support comes from an overlapping protein that is never in the same reading frame as the gene model. So the positive evidence support may be suspect, or it may be real and the model is poor because of the assembly, gaps, etc. To use eAED instead in the quality_filter.pl script, you would have to manually edit the script and replace '_AED' with '_eAED'. Using eAED instead will greatly reduce sensitivity on lower-quality assemblies (places where there is evidence of a gene locus, but the predictors can only produce the best model the assembly allows rather than the correct one). So make sure to always view suspect regions in a browser first.
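If editing the Perl script is not convenient, the same idea can be sketched outside of it. The Python below is only an illustration, not part of MAKER: it assumes the mRNA lines in the MAKER GFF3 carry `_AED` and `_eAED` attributes in column 9, and the function names (`parse_attributes`, `filter_mrnas`) and the example records are made up for this sketch. It uses a strict less-than so that a cutoff of 1 behaves like the AED<1 default mentioned in the question.

```python
def parse_attributes(column9):
    """Turn 'ID=x;_AED=0.10;_eAED=1.00' into a dict of key -> value strings."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_mrnas(gff_lines, max_aed=1.0, max_eaed=1.0):
    """Yield IDs of mRNA features whose _AED and _eAED are both below the cutoffs."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" not in attrs or "_eAED" not in attrs:
            continue  # not scored; skipped in this sketch
        if float(attrs["_AED"]) < max_aed and float(attrs["_eAED"]) < max_eaed:
            yield attrs.get("ID", "")


# Hypothetical records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mRNA-1;Parent=gene-1;_AED=0.10;_eAED=0.10",
    "chr1\tmaker\tmRNA\t2000\t2900\t.\t+\t.\tID=mRNA-2;Parent=gene-2;_AED=0.20;_eAED=1.00",
]
print(list(filter_mrnas(example)))
```

This only reports which mRNA IDs pass; it does not rewrite the GFF3 or remove the parent gene and child features, so treat the result as a list of candidates to inspect in a browser rather than as a finished filter.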

—Carson



(In reply to Federico López's message of Jun 9, 2018, 2:06 PM.)

Re: Filtering gene models based on eAED scores

Surya Saha
Hi Carson,

We have been using AED as the primary metric for evaluating predictions in our group, but it sounds like we should be using both eAED and AED. Is there a detailed explanation of exactly how eAED and AED are computed, besides Table 2 in the Cantarel 2008 paper? Thanks.

-Surya

(In reply to Carson Holt's message of Wed, Jun 13, 2018, 2:03 PM.)


--

Surya Saha
Sol Genomics Network


Re: Filtering gene models based on eAED scores

Carson Holt-2
AED is documented in the 2011 MAKER2 paper, but eAED (extended AED) is not currently documented in a publication and is not used by any of the scripts that come with MAKER (it's just there for reference right now). Basically, AED is calculated from evidence overlap, but eAED will not count protein overlap unless it occurs in the same codon reading frame as the model (so evidence may count for a stretch, stop counting for a few codons, and then count again if there is an insertion in the alignment). eAED will also infer support for an exon if both of its introns are validated by evidence and the region in between is all ORF (this allows joint intron support to infer support for an internal exon).

99% of the time AED and eAED are the same, but eAED can be useful for identifying edge cases. Much of the time, if AED and eAED are very different, it's because there is a single-base-pair insertion or deletion in the assembly. The predictors still find the locus as best they can, but protein evidence and alignments will be out of sync with the reading frame on one of the exons. BLAST can't really handle single-bp indels in its alignments, but Exonerate can do mid-alignment reading frame shifts to capture the assembly indel (and eAED is an attempt to use the extra Exonerate information in the score).
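To make the difference concrete, here is a toy Python sketch. It is not MAKER's code. The `aed` function follows the definition as I understand it from the MAKER2 paper, AED = 1 - (SN + SP)/2 over nucleotide positions. The `frame_aware_aed` function is a simplification made up for this illustration: it counts a protein base as overlap only when its codon frame matches the model's frame at that position, and it leaves out the intron-based exon inference entirely. All names and coordinates are hypothetical.

```python
def aed(model, evidence):
    """AED = 1 - (SN + SP) / 2, computed over sets of nucleotide positions."""
    model, evidence = set(model), set(evidence)
    if not model or not evidence:
        return 1.0  # nothing to overlap: treated as unsupported
    shared = len(model & evidence)
    sensitivity = shared / len(evidence)  # fraction of evidence bases in the model
    specificity = shared / len(model)     # fraction of model bases with evidence
    return 1.0 - (sensitivity + specificity) / 2.0


def frame_aware_aed(model_frames, protein_frames):
    """Toy eAED-like score (assumption, not MAKER's implementation).

    Both arguments map genomic position -> codon frame (0, 1 or 2).
    A protein base counts as overlap only if the model has the same frame there.
    """
    if not model_frames or not protein_frames:
        return 1.0
    shared = sum(
        1 for pos, frame in protein_frames.items()
        if model_frames.get(pos) == frame
    )
    sensitivity = shared / len(protein_frames)
    specificity = shared / len(model_frames)
    return 1.0 - (sensitivity + specificity) / 2.0


# A 30 bp single-exon model, and a protein alignment covering the same
# bases but shifted by one position relative to the model's reading frame.
model_frames = {pos: (pos - 100) % 3 for pos in range(100, 130)}
shifted_protein = {pos: (pos - 99) % 3 for pos in range(100, 130)}

print(aed(model_frames, shifted_protein))              # plain overlap
print(frame_aware_aed(model_frames, shifted_protein))  # frame-aware overlap
```

With these inputs the plain overlap score is perfect while the frame-aware score is the worst possible, which is the low-AED, eAED-of-1 pattern described in the original question.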

—Carson


(In reply to Surya Saha's message of Jun 13, 2018, 1:34 PM.)