promiscuous mapping during annotation liftover

Maxwell C Coyle
Hello!

I’m using Maker to map annotations forward to a new genome assembly in a species of choanoflagellate. I followed the protocol from Campbell 2014, creating a transcript FASTA file from my old GFF3 file and using this as the only EST evidence. I set est2genome=1 and est_forward=1. The Maker run went well, except that many of the old transcripts map to many places in the new assembly, some more than 100 times, so my gene count has inflated from 11,624 to 25,905. Many, but not all, of the most highly multiply mapped genes are rRNA genes.

Is there a setting I can change so that each transcript maps uniquely to its best location in the new genome, or should I adjust the exonerate stringency? This is not the final annotation step, as I will also be using RNA-seq evidence to improve my gene models after liftover.

Thanks for your help!

Best,
Max

Max Coyle
PhD Candidate, King Lab
Dept of Molecular and Cellular Biology, UC Berkeley




_______________________________________________
maker-devel mailing list
[hidden email]
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
Re: promiscuous mapping during annotation liftover

Carson Holt-2
Mapping genes forward onto a new assembly is an iterative process.

If a previous model maps to multiple locations, that indicates you accidentally annotated a repeat (e.g. a transposon) in the previous annotation set. This is especially true if you get 100 copies. You should simply remove that gene. If it’s just 2 or 3 copies, you can use the score column (see the GFF3 format specification) to keep one or the other: when est_forward=1 is set, the score is the percent recovery of the original model (0-100), so the highest value marks the best match to the original copy. You can also compare neighboring genes between assemblies to see whether one copy is simply a paralog. If a lot of genes have 2 copies, you probably have high heterozygosity in the genome, so the maternal and paternal chromosomes are assembling independently (i.e. both copies are real and are the exact same gene).
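The triage described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not part of Maker: it assumes the score sits in column 6 of the mRNA features, that every copy of a lifted-over transcript carries the original transcript ID in its Name attribute, and that "more than 3 copies" is a reasonable cutoff for a likely repeat. The function names, the `key_attr` and `feature_type` parameters, and the cutoff are all hypothetical, so check them against your own GFF3 output before relying on the result.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 column-9 string (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def triage_copies(gff_lines, key_attr="Name", feature_type="mRNA", max_copies=3):
    """Group features by source transcript and triage them by copy number.

    Assumptions (not confirmed by the thread): the grouping key is the
    `key_attr` attribute, and the score lives on `feature_type` features.
    Returns (keep, drop): `keep` maps each source transcript to its
    best-scoring copy as (score, contig, start, end); `drop` lists the
    transcripts with more than `max_copies` copies (likely repeats).
    """
    hits = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        source = parse_attributes(cols[8]).get(key_attr)
        if source is None:
            continue
        # A "." score means "no score"; treat it as 0 so it never wins.
        score = float(cols[5]) if cols[5] != "." else 0.0
        hits[source].append((score, cols[0], int(cols[3]), int(cols[4])))

    keep, drop = {}, []
    for source, copies in hits.items():
        if len(copies) > max_copies:
            drop.append(source)
        else:
            keep[source] = max(copies)  # highest score first in the tuple
    return keep, sorted(drop)
```

Calling `triage_copies(open("liftover.gff"))` (the file name is a placeholder) returns the best copy per transcript plus a list of candidates to remove. It does not look at neighboring genes, so paralogs and independently assembled haplotypes still need a manual check.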

If you decide a gene belongs on a specific contig after reviewing neighboring models and scores, you can anchor it to a contig or region by adding maker_coor= to the FASTA header (example: >transcriptA maker_coor=contig1:10000-11000;). Also consider removing models that have low scores. They may map to only one location, but if the % recovery is low, it may be best not to try to recover the old model (a new gene prediction may prove much more accurate).
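Adding the maker_coor= tag by hand gets tedious for more than a few transcripts, so here is a minimal Python sketch of the header edit. It follows the header format from the example above. The function name `anchor_headers` and the `anchors` dictionary (transcript ID mapped to contig, start and end) are hypothetical, and the coordinates are written through exactly as you supply them.

```python
def anchor_headers(fasta_lines, anchors):
    """Yield FASTA lines, appending maker_coor= to selected headers.

    `anchors` maps a transcript ID (the first word of the header) to a
    (contig, start, end) tuple chosen after manual review. Headers that
    already carry a maker_coor= tag are left untouched.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id in anchors and "maker_coor=" not in line:
                contig, start, end = anchors[seq_id]
                line = f"{line} maker_coor={contig}:{start}-{end};"
        yield line


if __name__ == "__main__":
    demo = [">transcriptA\n", "ATGCATGC\n", ">transcriptB\n", "GGCCGGCC\n"]
    chosen = {"transcriptA": ("contig1", 10000, 11000)}
    for out_line in anchor_headers(demo, chosen):
        print(out_line)
```

The output can be written to a new FASTA file and passed back to Maker as the EST evidence for the next iteration.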

—Carson





(Reply to Maxwell C Coyle's message of Jun 2, 2020, 11:27 PM, shown in full above.)