Hi,
Does anyone know if it is safe to train SNAP by running MAKER first with organism-specific ESTs and the 458 core proteins, then using the generated gene models to train SNAP? Or is there a better method, i.e. using the CEGMA pipeline to generate gene models first and using this output in MAKER to train SNAP?

Thanks in advance,
Claudia

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Hi Claudia,
That sounds like a good way to train SNAP. I think in general you'll come up with similar results with either of the training approaches you suggest after a few rounds of training SNAP. I would think that using organism-specific ESTs as evidence while training SNAP will improve things; however, that will depend to some extent on the nature of your genome and the quality of the EST library. The caveat to all of the above is that I haven't done a comparison of training under both of the ways you suggested, so my feedback is based on what I think, not what I've shown.

B

On Jul 26, 2011, at 2:23 PM, claudia wrote:
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
In reply to this post by claudia
On 26.07.2011 22:23, claudia wrote:
> Hi,
> Does anyone know if it is safe to train SNAP by running maker first with
> specific organism ests and the 458 core proteins, then using the
> generated gene models to train SNAP, or is there a better method, i.e
> using the CEGMA pipeline to generate gene models first and using this
> output in MAKER to train SNAP?
>
> Thanks in advance,
> Claudia

Dear Claudia,

As long as the EST set is species-specific and covers more than one gene family, your strategy should work fine. For further improvement you should use protein-based evidence as well. As posted on the MAKER list very recently, the Swiss-Prot database is well suited for that.

Best regards,
Felix

--
Felix Bemm
Department of Bioinformatics
University of Würzburg, Germany
Tel: +49 931 - 31 83696
Fax: +49 931 - 31 84552
[hidden email]
In reply to this post by Marvin B Moore
Interestingly, I also have data showing that you can pick training files from any species at random, run a single round of bootstrapping with MAKER, and achieve training accuracy levels equal to those of both the other training methods. The evidence included in the MAKER bootstrapping step auto-corrects for insufficiencies and bad data in the initial training data. For example, SNAP inside MAKER using the Arabidopsis training file to annotate C. elegans outperforms SNAP on its own using the correct C. elegans training file to annotate C. elegans. So MAKER will fix any bad training data and make SNAP work better.

--Carson

On 7/26/11 6:26 PM, "Barry Moore" <bmoore@...> wrote:
> Hi Claudia,
Hi Carson,

So if I understand your second point correctly, one could run MAKER once with ESTs, proteins (e.g. CEGMA) and SNAP using a random training file, train SNAP using the output of that run, and then run MAKER again with the new SNAP training file to get a fairly accurate set of gene calls. Or have I misunderstood?

Thanks,
Mike

From: [hidden email] [mailto:[hidden email]] On Behalf Of Carson Holt

> Training using est2genome works just fine. I would recommend running SNAP
> once again inside of MAKER using both ESTs and proteins after the initial
> training (initial being either CEGMA or est2genome). Use the resulting
> second gene set for final training. This single round of bootstrapping is
> sufficient.
--Carson

On 7/27/11 9:46 AM, "Reith, Michael" <Michael.Reith@...> wrote:
> Hi Carson,
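
The two-pass procedure discussed above (run MAKER with ESTs and proteins, train SNAP on the resulting gene models, then rerun MAKER with the new SNAP file) can be sketched as edits to maker_opts.ctl. This is only an illustration, not a script from the MAKER developers: the helper `set_ctl_options` and all file names (`my_ests.fasta`, `cegma_core.fasta`, `round1.hmm`) are hypothetical, and the option names `est`, `protein`, `snaphmm` and `est2genome` are assumed to match the control file generated by your MAKER version, so check them against your own file. The SNAP training that happens between the two passes is not shown.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with `key=value` lines rewritten for the keys in updates.

    Hypothetical helper: any trailing `#comment` on a rewritten line is kept,
    and a key that is not found raises KeyError instead of being ignored.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        # Option lines look like `key=value #comment`; header lines start with '#'.
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            out.append(f"{key}={remaining.pop(key)} {comment}".rstrip())
        else:
            out.append(line)
    if remaining:
        raise KeyError(f"options not found in control file: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Cut-down stand-in for a generated maker_opts.ctl (assumed option names).
TEMPLATE = """\
est= #ESTs in fasta format
protein= #protein sequences in fasta format
snaphmm= #SNAP HMM file
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
"""

# Pass 1: build initial gene models from the evidence alone.
pass1 = set_ctl_options(
    TEMPLATE,
    {"est": "my_ests.fasta", "protein": "cegma_core.fasta", "est2genome": 1},
)

# Pass 2: same evidence, but SNAP now predicts using the HMM trained on pass 1.
pass2 = set_ctl_options(pass1, {"snaphmm": "round1.hmm", "est2genome": 0})

print(pass2)
```

Raising `KeyError` for unknown option names is a deliberate choice here, so that a mistyped key does not silently leave the control file unchanged.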
In reply to this post by claudia
Hi,
Thanks for the help! I have one more question. With regard to training any ab initio gene predictor, it seems obvious that one should turn on repeat masking (RM); however, I am not clear on this, as the MAKER tutorial does not mention anything about RM. Can someone clarify this?

Thanks in advance!
Claudia
Hi Claudia,
Yes, you want repeat masking on, and this will be the case by default when you generate control files with maker -CTL. The lines in maker_opts.ctl that do this are:

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker

B

On Jul 27, 2011, at 12:22 PM, claudia wrote:
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
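
As a quick sanity check before a training run, the `model_org` setting quoted above can be read back from the control file. The following is a minimal sketch under stated assumptions: `parse_ctl` and `repeat_masking_enabled` are hypothetical helpers, and the check looks only at `model_org`, the one option mentioned in this thread. Other settings in the repeat-masking section of a real control file are not considered.

```python
def parse_ctl(ctl_text):
    """Parse `key=value #comment` lines into a dict; comment-only lines are skipped."""
    options = {}
    for line in ctl_text.splitlines():
        # Everything after the first '#' is a comment.
        body = line.split("#", 1)[0].strip()
        if "=" in body:
            key, value = body.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def repeat_masking_enabled(ctl_text):
    # Assumption: a non-blank model_org means RepeatMasker will be run,
    # following the "leave values blank to skip repeat masking" comment.
    return bool(parse_ctl(ctl_text).get("model_org"))


example = (
    "#-----Repeat Masking (leave values blank to skip repeat masking)\n"
    "model_org=all #select a model organism for RepBase masking in RepeatMasker\n"
)

print(repeat_masking_enabled(example))
```

In practice one would read the text from maker_opts.ctl in the working directory instead of using the inline `example` string.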