choosing the right gene model

choosing the right gene model

Xabier Vázquez Campos
Hi there,

I was visualising the annotations and realised that in some cases what seems to be a single gene is split according to one of the gene models, even though the other two, est2genome and prot2genome, suggest that it shouldn't be.

[Attached image: split-gene.png]

Although the opposite also happens.

[Attached image: merged-gene.png]

For some reason, the "out of place" model is always (or almost always) the one from GeneMark.

How much weight, if any, do the RNA-seq and protein data carry in this decision?
How exactly is the final gene model selected?

Cheers,
Xabi

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: choosing the right gene model

Carson Holt-2
Both transcript and protein evidence go into the AED calculation as overlap support, so in both cases the chosen model had the better overlap with its evidence. (Protein evidence does not count toward the eAED overlap calculation if it is out of frame with the model it is supposed to support.) The larger merged model has a clustering effect on its evidence, so its evidence set for the AED calculation is slightly different from the one the SNAP and Augustus models would generate. In both cases, I think GeneMark is hurting more than it is helping. You may want to just drop it from the analysis (unless it's a fungus; I often find GeneMark can have that effect).

—Carson
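
For readers following the thread, here is a minimal sketch of the overlap idea behind AED, assuming the commonly cited definition AED = 1 - (SN + SP) / 2 computed at the nucleotide level. It is not MAKER's actual code: the function names and toy coordinates are made up for illustration, it ignores the reading-frame check mentioned for eAED, and it scores every candidate against the same evidence set, whereas the reply above notes that each model can end up with a slightly different evidence cluster.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF3


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of base positions covered by the given intervals."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons: Iterable[Interval], evidence_blocks: Iterable[Interval]) -> float:
    """Simplified AED: 0.0 means perfect agreement, 1.0 means no support."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_blocks)
    if not model or not evidence:
        return 1.0
    shared = len(model & evidence)
    sensitivity = shared / len(evidence)  # fraction of evidence the model covers
    specificity = shared / len(model)     # fraction of the model that is supported
    return 1.0 - (sensitivity + specificity) / 2.0


# Toy locus (hypothetical coordinates): one transcript alignment with two blocks.
evidence = [(1, 300), (401, 700)]
candidates = {
    "merged_model": [(1, 300), (401, 700)],  # one gene spanning both blocks
    "split_model_left": [(1, 300)],          # only the first half of the locus
}
scores = {name: aed(exons, evidence) for name, exons in candidates.items()}
best = min(scores, key=scores.get)  # lowest AED = best agreement with evidence
```

In this toy case the split model covers only half of the evidence, so it scores worse than the merged model even though every base it predicts is supported.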


(In reply to the message from Xabier Vázquez-Campos of Oct 12, 2017, 12:09 AM.)

Re: choosing the right gene model

Xabier Vázquez Campos
Actually, it is a fungal genome, although not a very typical one: almost half of it is repeats. It is worth mentioning that GeneMark generates a lot of predictions that overlap LTRs and other complex repeats, something that neither SNAP nor Augustus does. Have you seen this before?
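
One way to put a number on that observation is to count, per predictor, how many predictions overlap a repeat annotation. The sketch below assumes a GFF3 file in which both predictions and repeats are top-level features of the same type, told apart by the source column. The source names (`genemark`, `snap`, `repeatmasker`) and the `match` feature type are assumptions for illustration and should be checked against the actual file. Any overlap of at least one base counts as a hit, which is a deliberately coarse criterion.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def repeat_overlap_by_source(gff3_lines, repeat_sources, feature_type="match"):
    """Return {source: fraction of its features that overlap any repeat}."""
    repeats = defaultdict(list)      # seqid -> [(start, end)]
    predictions = defaultdict(list)  # source -> [(seqid, start, end)]
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # skips other feature types and any embedded FASTA lines
        seqid, source = cols[0], cols[1]
        start, end = int(cols[3]), int(cols[4])
        if source in repeat_sources:
            repeats[seqid].append((start, end))
        else:
            predictions[source].append((seqid, start, end))
    fractions = {}
    for source, feats in predictions.items():
        hits = sum(
            1
            for seqid, start, end in feats
            if any(overlaps(start, end, r_start, r_end)
                   for r_start, r_end in repeats[seqid])
        )
        fractions[source] = hits / len(feats)
    return fractions


# Toy input with hypothetical coordinates and source names.
toy = [
    "##gff-version 3",
    "chr1\trepeatmasker\tmatch\t100\t500\t.\t+\t.\tID=r1",
    "chr1\tgenemark\tmatch\t450\t900\t.\t+\t.\tID=g1",
    "chr1\tgenemark\tmatch\t2000\t2500\t.\t+\t.\tID=g2",
    "chr1\tsnap\tmatch\t600\t900\t.\t+\t.\tID=s1",
]
fractions = repeat_overlap_by_source(toy, repeat_sources={"repeatmasker"})
```

On a real annotation, a much higher fraction for one predictor than for the others would back up the impression from the genome browser.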

(In reply to the message from Carson Holt of 14 Oct. 2017, 02:56.)