Re: Maker

Carson Hinton Holt
So gene models are actually produced by SNAP, Augustus, or whichever gene predictor you specify, and the quality of those models depends on how well the predictor has been trained on the organism being annotated. You may need to do some more training. MAKER lets SNAP/Augustus produce gene models both in ab initio runs (based entirely on training) and in hint-based runs (MAKER feeds the algorithms hints derived from EST and protein alignments to supplement the training). In the end, only gene models supported by EST or protein alignments are accepted; the remaining unsupported models are written to the non-overlapping FASTA output file. Most of the time the gene predictors (SNAP/Augustus) are smart enough to ignore pseudogenes, but not always, and these algorithms can produce models that lack stop codons. I am currently working on the next MAKER update, which will provide some options for handling models without stop codons along with many other improvements; I hope to be done in the next 4-6 weeks.
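A quick way to see how many models made it into the final set, and how many were left unsupported, is to count them in the two outputs described above. This is a minimal sketch that assumes final MAKER models appear in a GFF3 file as rows with source "maker" and type "gene", and that the non-overlapping predictions are a plain FASTA file; the function names are hypothetical.

```shell
# Count final MAKER gene models in a GFF3 file: non-comment rows whose
# source (column 2) is "maker" and whose type (column 3) is "gene".
count_maker_genes() {
  awk 'BEGIN { FS = "\t" }
       !/^#/ && $2 == "maker" && $3 == "gene" { n++ }
       END { print n + 0 }' "$1"
}

# Count records in a FASTA file (one ">" header line per record).
count_fasta_records() {
  awk '/^>/ { n++ } END { print n + 0 }' "$1"
}
```

For example, count_maker_genes genome.all.gff and count_fasta_records on the non-overlapping FASTA file (both file names are placeholders for whatever your run produced). Sequence lines after a ##FASTA directive have no tab-separated columns, so they are not counted as genes.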

In addition to training, the evidence alignments can have an effect. If the evidence is sparse or fragmented, the resulting gene models can be fragmented too. To compensate for fragmented evidence, increase the pred_flank value in the maker_opts.ctl file. This lets evidence alignments cluster together even when there are larger gaps between them.

RepeatMasker handles repetitive elements for MAKER and helps keep repetitive regions from falsely becoming part of a gene, but MAKER can override the masking if there is very strong spliced EST or protein homology evidence suggesting that the repetitive region is in fact part of the gene. If you are worried about losing real genes that contain repeats, you can set unmask to 1 in the maker_opts.ctl file. This allows SNAP and Augustus gene models predicted on the completely unmasked sequence to be considered on an equal footing, as alternatives to the models predicted on the masked genome. I almost always set unmask to 1, at least to see whether it improves the gene models.
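Both settings are plain key=value lines in the MAKER options control file (maker_opts.ctl), so they can also be changed from a script between annotation rounds. This is a minimal sketch: set_ctl_option is a hypothetical helper, and the pred_flank value in the usage line is illustrative rather than a recommendation. It writes to a temporary file and then moves it into place, which sidesteps the differences between GNU and BSD (Mac) sed -i.

```shell
# Set "key=value" in a MAKER-style control file, keeping any trailing
# "#comment" on the same line. Only lines that start with "key=" change.
set_ctl_option() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  sed "s/^${key}=[^ #]*/${key}=${value}/" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Usage: set_ctl_option maker_opts.ctl pred_flank 400, then set_ctl_option maker_opts.ctl unmask 1. Values containing / or & would need escaping for sed; plain numbers like these do not.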

Basically, what I usually do when annotating a genome is manually view several contigs after MAKER finishes, and then decide which parameters to tweak or whether I need to retrain the prediction algorithms. I usually go through 3-4 rounds of annotation before I am satisfied with the set. You can also do this on a subset of the genome (e.g. about 10-20 megabases). Re-running the annotation is relatively fast because MAKER reuses data from previous runs rather than recomputing everything.
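For the subset approach, one way to pull out roughly 10-20 Mb is to take whole contigs from the start of the assembly until a base-pair budget is reached. This sketch assumes an ordinary multi-line FASTA file; subset_fasta is a hypothetical helper, and the last contig it keeps may push the total somewhat past the budget because contigs are never split.

```shell
# Print whole FASTA records from the start of $1 until at least $2 bp have
# been emitted. A record is kept only if the running total is still below
# the budget when its header is reached, so contigs are never split.
subset_fasta() {
  awk -v max="$2" '
    /^>/ { keep = (total < max) }       # decide once per record, at the header
    keep {
      print
      if (!/^>/) total += length($0)    # count sequence bases only
    }
  ' "$1"
}
```

Usage: subset_fasta genome.fasta 15000000 > subset.fasta, then point MAKER at subset.fasta for the trial rounds.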

Thanks,
Carson


On 8/25/10 3:36 AM, "Martin Kapun" <capoony@...> wrote:

Dear Carson,

thank you very much for your reply to my last email. I followed your advice and used the new version of RepeatMasker (RM), which is indeed enormously faster.
I am sorry to bother you again, but I have some more questions. In order to find orthologous genes, we blasted the MAKER-predicted proteome for the Drosophila simulans genome (ca. 14,300 genes) against the FlyBase curated Drosophila melanogaster proteome. It turned out that approx. 85% of the melanogaster genes had no MAKER equivalent. A certain proportion of those may well be novel genes, but I think some of them may not have been predicted for other reasons. I then searched for those missing genes in the blastx output within the MAKER GFF (because the melanogaster proteome was used for MAKER) and found that 75% of these missing genes had blastx hits. Can you tell me what criteria genes have to fulfill to be included in the final MAKER output as MAKER genes (I provided a protein database and an EST database for the MAKER annotation), and how does MAKER handle pseudogenes with premature stop codons? How does MAKER handle genes containing repetitive elements (e.g. polyglutamine stretches), and are these genes affected by RepeatMasker? I am very sorry if I have overlooked this in the documentation!
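The count of melanogaster genes without a MAKER equivalent can be reproduced from tabular BLAST output. This sketch assumes the MAKER proteins were the queries and the melanogaster proteome the database, that the report is tab-separated with the subject ID in column 2 (as in NCBI BLAST+ -outfmt 6), and that subject IDs match the first word of the FASTA headers; unmatched_ids is a hypothetical name.

```shell
# Print IDs from the FASTA file $1 that never appear as a subject (column 2)
# in the tab-separated BLAST report $2. The ID is the first word of a header.
unmatched_ids() {
  awk '
    BEGIN { FS = "\t" }
    FILENAME == ARGV[1] { hit[$2] = 1; next }   # first file: collect subjects
    /^>/ {
      split(substr($0, 2), word, "[ \t]")
      if (!(word[1] in hit)) print word[1]
    }
  ' "$2" "$1"
}
```

Usage: unmatched_ids dmel-proteins.fasta maker_vs_dmel.tsv | wc -l (file names are placeholders). Every hit counts here, so filter the report by E-value or identity first if you want a stricter notion of "equivalent".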

Thank you very much and sorry for bothering you again,

Martin

On Aug 3, 2010, at 7:08 PM, Carson Holt wrote:

Hello Martin,
 
 To use maker_functional_fasta, you need a WU-BLAST report generated with the -mformat=2 option against the UniProt/Swiss-Prot database. You can then use maker_functional_fasta to add putative gene functions to all FASTA entries; maker_functional_gff adds the same information to GFF3 files as part of the ‘Note’ attribute. Right now this only works with WU-BLAST reports, primarily because I have not yet had time to update it to use NCBI BLAST reports, and because I am going to make this part of the overall MAKER pipeline soon rather than a separate script.
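The two steps could be wrapped as below. This is a sketch under assumptions: the WU-BLAST report (-mformat=2, MAKER proteins against UniProt/Swiss-Prot) already exists, and both scripts take the Swiss-Prot FASTA, the BLAST report and the MAKER file as arguments and write to standard output, which is how later MAKER documentation shows them; check the usage message of your version. add_putative_functions and the .putative_function output names are hypothetical.

```shell
# Hypothetical wrapper around maker_functional_fasta / maker_functional_gff.
# Assumed argument order: <Swiss-Prot FASTA> <BLAST report> <MAKER file>,
# with the annotated result written to standard output.
add_putative_functions() {
  local uniprot=$1 report=$2 proteins=$3 gff=$4
  maker_functional_fasta "$uniprot" "$report" "$proteins" \
    > "${proteins%.fasta}.putative_function.fasta" || return 1
  maker_functional_gff "$uniprot" "$report" "$gff" \
    > "${gff%.gff}.putative_function.gff" || return 1
}
```

Usage: add_putative_functions uniprot_sprot.fasta maker2uni.blastp genome.all.maker.proteins.fasta genome.all.gff (all four file names are placeholders).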
 
 With the newest version of RepeatMasker, its authors also created a version of NCBI BLAST called rmblast. I would recommend using that; it is much faster than cross_match.
 
 Thanks,
 Carson
 
 
 
 
 
 On 7/30/10 3:18 AM, "Martin Kapun" <capoony@...> wrote:
 
 
Dear Carson Holt,
 
 I was wondering whether you received my last email from the 24th of June 2010, so I am giving it a second try. In the meantime I found the script "maker_functional_fasta" in the maker folder and was wondering whether this script is suitable for adding protein names to the MAKER gene predictions. Furthermore, I wanted to ask whether it is possible to use NCBI BLAST or cross_match instead of WU-BLAST, as we do not have a license for the latter.
 
 Thank you very much, Martin
 
 Begin forwarded message:
 
 
From: Martin Kapun <capoony@...>
 Date: 24 June 2010 15:30:42 CEST
 To: carson.holt@...
 Subject: Re: Maker
 
Dear Carson Holt,
 
 I am a PhD student with Christian Schlötterer, and I am currently working on a reannotation of the D. simulans genome. First of all, I want to thank you for this very useful software and for the instructions on how to get it running under OS X 10.6.
 I wanted to ask whether there is an option in the settings to incorporate the names of the orthologs found by blastx and exonerate into the final gene models. This would be very useful for comparing annotations and orthologs between species. I am sorry if I have overlooked this part of the manual!
 
 Thank you, Martin Kapun
 
 On Jun 18, 2010, at 7:00 AM, Christian Schlötterer wrote:
 
 
Dear all,
 
 I received instructions from Carson Holt, the author of MAKER, on how to install exonerate on OS X 10.6.
 
 Please keep me posted on your success!
 
 c
 
 Begin forwarded message:
 
 
From: Carson Holt <carson.holt@...>
 Date: June 17, 2010 11:46:11 PM GMT+02:00
 To: Christian Schlötterer <christian.schloetterer@...>, MAKER <maker-devel@...>
 Subject: Re: Maker
 
 Here is a step-by-step guide to getting exonerate 2 to work on Mac OS X 10.6 (and only 10.6). If it seems too intimidating, just install exonerate 1.0; it still works, although it can sometimes have weird problems. You should be able to cut and paste one command line at a time. New Macs with OS X 10.6 are quirky because they have the new 64-bit Intel processors and a hybrid operating system (some parts are 32-bit and some parts are 64-bit). This has caused all kinds of problems for some Linux modules/libraries when installing on a new Mac with OS X 10.6.
  
  Follow these steps exactly in order to get exonerate installed.
  
  Delete any old version of fink:
  
 
rm -rf /sw
  
 

  Copy the file attached to this e-mail to your Desktop, then copy it into the folder ~/bin. Make sure you copy it to exactly ~/bin/gcc so the file ends up with the correct name; for some reason, downloading this file via e-mail almost always tacks a .pl extension onto the end:
  
 
mkdir -p ~/bin
cp ~/Desktop/gcc* ~/bin/gcc
chmod a+x ~/bin/gcc
  
 

  Add ~/bin to your PATH variable in .bash_profile or .profile (you may have to create this file if it doesn’t exist):
  
 
export PATH=$HOME/bin:$PATH
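Because the profile gets re-sourced, the line above can end up adding ~/bin to PATH more than once. An optional variant that only prepends the directory when it is missing looks like this; path_prepend is a hypothetical name, and this is a convenience, not part of the original instructions.

```shell
# Prepend a directory to PATH only if it is not already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                          # already present: leave PATH alone
    *) PATH="$1${PATH:+:$PATH}" ;;        # avoid a trailing ":" if PATH was empty
  esac
  export PATH
}
```

To use it, put the function in .bash_profile or .profile and call path_prepend "$HOME/bin" in place of the export line.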
  
 

  Reload whichever file you edited:
  
 
source ~/.bash_profile    # if you edited .bash_profile
source ~/.profile         # if you edited .profile
  
  
 
Now download and install fink from http://downloads.sourceforge.net/fink/fink-0.29.10.tar.gz (clicking the link should download it into your ~/Downloads folder):
  
 
cd ~/Downloads
tar -xvf fink-0.29.10.tar.gz
cd fink-0.29.10
./bootstrap /sw
/sw/bin/pathsetup.sh
fink selfupdate
  
  
 
Now install libgettext3-dev:
  
 
fink scanpackages
sudo apt-get update
sudo apt-get install libgettext3-dev=0.14.5-2
  
  
 
Now install glib and glib2:
  
 
fink install glib
fink install glib2-dev
  
  
 
Then install exonerate from the source code http://www.ebi.ac.uk/~guy/exonerate/exonerate-2.2.0.tar.gz (clicking the link should download it into your ~/Downloads folder).
  
 
cd ~/Downloads
tar -xvf exonerate-2.2.0.tar.gz
cd exonerate-2.2.0
./configure
make
make install
  
 

  Now everything should be ready to go. Just type exonerate to test. You can also remove the file ~/bin/gcc if you want. That file forces fink and any other packages to always build in 32-bit mode for compatibility; you will need to remove it if you ever want to compile in 64-bit. Many things you install via fink will not work in 64-bit on a Mac, and neither will exonerate, which is why I had you add the file.
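Instead of typing exonerate by hand, a small check can confirm that the freshly installed tools resolve on PATH and show which copy the shell will run. The function name check_tools is hypothetical, and the tool list in the usage line is only an example.

```shell
# Report where each named tool resolves on PATH; return 1 if any is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%-12s %s\n' "$tool" "$(command -v "$tool")"
    else
      printf '%-12s MISSING\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_tools exonerate fink gcc. If gcc resolves to ~/bin/gcc, the 32-bit wrapper is still active.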
  
  Thanks,
  Carson
  
 

  
 
--------------
 Christian Schlötterer
 Institut für Populationsgenetik
 Veterinärmedizinische Universität Wien
 Josef Baumann Gasse 1
 1210 Wien
 Austria/Europe
 
 phone: +43-1-25077-4300
 fax: +43-1-25077-4390
 http://i122server.vu-wien.ac.at/pop
 
 Vienna Graduate School of Population Genetics
 http://www.popgen-vienna.at
 
VetCore Illumina Sequencing Service
 http://i122server.vu-wien.ac.at/pop/seq/VetCore2.htm
 

  
 

 

 
______________________________
 Martin Kapun
 Institut für Populationsgenetik
 Veterinärmedizinische Universität Wien
 Veterinärplatz 1
 1210 Vienna
 Austria
 
 
 

 
 

  



_______________________________________________
maker-devel mailing list
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org