training SNAP with ests and cegma proteins

claudia
Hi,
  Does anyone know if it is safe to train SNAP by running MAKER first
with organism-specific ESTs and the 458 core proteins, then using the
generated gene models to train SNAP? Or is there a better method, i.e.
using the CEGMA pipeline to generate gene models first and using this
output in MAKER to train SNAP?

Thanks in advance,
Claudia

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: training SNAP with ests and cegma proteins

Marvin B Moore
Hi Claudia,

That sounds like a good way to train SNAP.  I think in general you'll come up with similar results with either training approach that you suggest after a few rounds of training SNAP.  I would think that using organism-specific ESTs as evidence while training SNAP will improve things; however, that will depend to some extent on the nature of your genome and the quality of the EST library.  The caveat to all of the above is that I haven't done a comparison of training under both of the ways you suggested, so my feedback is based on what I think, not what I've shown.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

Re: training SNAP with ests and cegma proteins

Felix Bemm
In reply to this post by claudia
Dear Claudia,

as long as the EST set is species-specific and covers more than one gene
family, your strategy should work fine. For further improvement you
should use protein-based evidence as well. As posted on the MAKER list
very recently, the Swiss-Prot database is well suited for
that.

best regards
felix

--
Felix Bemm
Department of Bioinformatics
University of Würzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
[hidden email]

Re: training SNAP with ests and cegma proteins

Carson Holt-2
In reply to this post by Marvin B Moore
Training using est2genome works just fine.  I would recommend running SNAP once again inside of MAKER using both ESTs and proteins after the initial training (initial being either CEGMA or est2genome).  Use the resulting second gene set for final training.  This single round of bootstrapping is sufficient.
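
As a rough sketch of the two passes described above, the relevant maker_opts.ctl settings might look like the following. These are two separate edits of the same control file, one per run. The file names (my_ests.fasta, core_proteins.fasta, pass1.hmm) are placeholders made up for illustration, not files from this thread, and the step that turns MAKER's output into a SNAP HMM between the passes is not shown.

```
#-----Pass 1: initial gene models straight from EST alignments
est=my_ests.fasta            #hypothetical file: species-specific ESTs
protein=core_proteins.fasta  #hypothetical file: the 458 core proteins
snaphmm=                     #left blank, SNAP has not been trained yet
est2genome=1                 #infer gene predictions directly from ESTs

#-----Pass 2: SNAP inside MAKER, trained on the pass 1 models
est=my_ests.fasta            #same evidence as pass 1
protein=core_proteins.fasta
snaphmm=pass1.hmm            #hypothetical file: HMM trained from pass 1 output
est2genome=0                 #SNAP now predicts, guided by the evidence
```

The gene models from the second pass are the ones to use for the final round of SNAP training.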

Interestingly, I also have data showing that you can pick training files from any species at random, run a single round of bootstrapping with MAKER, and achieve training accuracy levels equal to those of both the other training methods.  The evidence inclusion in the MAKER bootstrapping step auto-corrects for insufficiencies and bad data in the initial training data.  For example, SNAP inside MAKER using the Arabidopsis training file to annotate C. elegans outperforms SNAP on its own using the correct C. elegans training file to annotate C. elegans.  So MAKER compensates for poor training data and makes SNAP work better.

--Carson



Re: training SNAP with ests and cegma proteins

Reith, Michael
Hi Carson,

So if I understand your second point correctly, one could run MAKER once with ESTs, proteins (e.g. CEGMA) and SNAP using a random training file, train SNAP using the output of that run, and then run MAKER again with the new SNAP training file to get a fairly accurate set of gene calls.  Or have I misunderstood?

Thanks,
Mike

Re: training SNAP with ests and cegma proteins

Carson Holt-2
That is exactly correct.  I performed a number of simulations using various training data from a number of species, and it became clear that the resulting HMM converges for virtually all data after a single round of bootstrapping.  The convergence is due to the fact that MAKER is feeding “hints” to SNAP on the proper location of introns, exons, and CDS using data from the evidence alignments.  SNAP then alters its behavior on the fly and produces a dramatically better set of models for bootstrapped re-training.  In the most extreme cases SNAP achieved a 3-fold improvement in accuracy just using the hints (these 3-fold better models then served as the basis for re-training).  Accuracies then held steady upon further rounds of bootstrapping.
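
The workflow confirmed here, starting from a borrowed training file, could be set up roughly as below. This is only a sketch: other_species.hmm stands for whichever existing SNAP parameter file is picked, and bootstrapped.hmm for the file retrained from the first run. Both names, like the evidence file names, are made up for illustration.

```
#-----Run 1: borrowed SNAP parameters plus evidence
est=my_ests.fasta            #hypothetical file: species-specific ESTs
protein=my_proteins.fasta    #hypothetical file: protein evidence
snaphmm=other_species.hmm    #hypothetical file: HMM from some other species

#-----Run 2: after retraining SNAP on the run 1 gene models
est=my_ests.fasta
protein=my_proteins.fasta
snaphmm=bootstrapped.hmm     #hypothetical file: HMM retrained from run 1
```

Since accuracy is reported to hold steady after the first round, a third run with yet another retrained HMM should not be needed.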

--Carson



Re: training SNAP with ests and cegma proteins

claudia
In reply to this post by claudia
Hi,
  Thanks for the help! I have one more question. When training
any ab initio gene predictor, it seems obvious that one should
turn on repeat masking. However, I am not clear on this, as the MAKER
tutorial does not mention anything about RM. Can someone clarify this?

Thanks in advance!
Claudia

Re: training SNAP with ests and cegma proteins

Marvin B Moore
Hi Claudia,

Yes, you want repeat masking on, and this is the default when you generate the control files with maker -CTL.  The line in maker_opts.ctl that controls it is:

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
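
For context, a freshly generated maker_opts.ctl usually carries a few more settings in that section. The fragment below is written from memory of MAKER 2 control files and may differ between versions, so treat the extra lines as an assumption and check them against your own file:

```
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all    #select a model organism for RepBase masking in RepeatMasker
rmlib=           #optional organism-specific repeat library (fasta) for RepeatMasker
softmask=1       #use soft-masking rather than hard-masking in BLAST
```

Leaving model_org at its default keeps RepBase masking switched on for the training runs as well as the final annotation run.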

B
