Annotation of a new variant within a species

Annotation of a new variant within a species

Lior Glick
Hello,

I am trying to annotate multiple variants of tomato. While a good annotation of the reference genome is available, I have de novo assembled other variants of the same species and wish to annotate them.
Most MAKER documentation refers to annotation of a new species, while using transcripts and proteins from either the exact same sample (individual) or from "an alternate organism", so I'm not sure what to do in this case, where I am annotating various samples from the same species. I have two questions:

1. Regarding transcript data, how should I use transcripts from other variants of the same species? Namely, should I use the est or the altest parameter? What is the actual difference in behavior?

2. Is there a way to incorporate gene models (in gff format) from the reference annotation? I expect high similarity in my assembled variants, but not identity in terms of content and coordinates, so neither pred_gff nor model_gff sound like what I need, as far as I understand.
I could also use the reference annotation and sequence to extract cDNA sequences and provide them as EST data. Is this the way to go? It feels like some information on introns might be lost this way.

Would highly appreciate your answers to these questions or any other advice.

Thank you very much!

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Annotation of a new variant within a species

Carson Holt-2
> 1. Regarding transcript data, how should I use transcripts from other variants of the same species? Namely, should I use the est or the altest parameter? What is the actual difference in behavior?

Use est=. The altest option is for distant relationships (so distant that nucleotides won’t match but amino acids still do). It translates all transcripts to amino acids in six reading frames before alignment, which is computationally very expensive and more prone to spurious alignments. Different strains of the same species will still match in nucleotide space, so est= is the appropriate choice here.
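
As a rough sketch, the relevant lines of maker_opts.ctl might look like the fragment below. The file name is hypothetical, and every other setting in the control file is assumed to stay as it was.

```
#-----EST Evidence (fragment of maker_opts.ctl; the file name is hypothetical)
est=other_variant_transcripts.fasta #transcripts from the same species, aligned in nucleotide space
altest= #left empty: only for distantly related organisms, aligned after translation
```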


> 2. Is there a way to incorporate gene models (in gff format) from the reference annotation? I expect high similarity in my assembled variants, but not identity in terms of content and coordinates, so neither pred_gff nor model_gff sound like what I need, as far as I understand.

model_gff is what you want if you always want to keep a model, and pred_gff is what you want if you only want to keep models supported by evidence. But regardless of which you choose, the GFF3 must be in the same coordinate space as the assembly you are annotating, so you will have to lift the genes over onto the new assembly and make a new GFF3. You can do that with a separate MAKER run in which you provide the reference gene models to est= as transcript sequences in FASTA format, set est2genome=1, and add the option est_forward=1 (it is not in the default control file, so you have to add it yourself). It’s not perfect, but it will produce a GFF3 with gene models based entirely on alignment of the old models. You can then give that GFF3 to model_gff or pred_gff in future runs.
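
The two-stage procedure could be sketched as two separate control files, shown here as fragments only. All file names are hypothetical, and the GFF3 passed to the second run is assumed to be the gene-model output of the first run. Settings not shown are assumed to keep their existing values.

```
# Run 1 (fragment of maker_opts.ctl): lift the reference models over to the new assembly
genome=new_assembly.fasta #hypothetical file name
est=reference_transcripts.fasta #hypothetical: transcript sequences of the reference gene models
est2genome=1 #build gene models directly from the transcript alignments
est_forward=1 #not present in the default control file; add this line by hand
```

```
# Run 2 (fragment of maker_opts.ctl): reuse the lifted-over models; set only one of the two options
genome=new_assembly.fasta #same assembly as in run 1
model_gff=run1_models.gff #hypothetical: GFF3 from run 1; these models are always kept
#pred_gff=run1_models.gff #alternative: models are kept only when supported by evidence
```

Because both runs use the same genome= file, the GFF3 from the first run is already in the coordinate space that the second run requires.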

—Carson


