Re: Question regarding MAKER

Carson Holt
est2genome and protein2genome take BLAST hits, polish them with exonerate around splice sites, and then turn the alignment directly into a gene model. So if the alignment is partial because the EST or mRNA-seq data does not span the entire transcript, or the protein homology does not span the entire CDS, then the resulting model will be partial. But even a few hundred partial models are sufficient to train SNAP. After that I usually do just one round of bootstrap training (more than that and you get into the overtraining paradox).

So you can use just est2genome, just protein2genome, or both. You just need something to train SNAP with.
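
As a rough illustration of the above (a sketch only, not a complete control file), the relevant maker_opts.ctl settings for the two rounds might look like the fragment below. The file names transcripts.fasta and round1_snap.hmm are hypothetical placeholders, and leaving protein evidence out of round 1 is an assumption made here to match the est2genome-only case.

```
# Round 1: turn transcript alignments directly into gene models
est=transcripts.fasta        # hypothetical file name
est2genome=1                 # build models from EST/transcript alignments
protein2genome=0             # optional; set to 1 only if protein evidence is supplied

# Round 2 (a separate run, after training SNAP on the round 1 models)
snaphmm=round1_snap.hmm      # hypothetical name for the trained SNAP HMM
est2genome=0                 # switch off direct alignment-to-model inference
protein2genome=0
```

The two groups of settings are alternative edits to the same file, one per MAKER run, not lines to be combined in a single run.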


On Jul 11, 2017, at 3:37 PM, Ghosh, Arnab <[hidden email]> wrote:

Hi Carson,
My name is Arnab and I am from Texas Tech University.
I am using MAKER for gene annotation in a new genome assembly for a non-model organism. I have figured out most of this amazing piece of software but have two questions.
  1. Is it okay to set only est2genome=1 and leave protein2genome=0 in the first round of running MAKER? Will it hurt my prediction and eventual gene annotation if I don't use the protein2genome option alongside est2genome in the first round? I have a protein FASTA file for the same organism, but using both the transcript FASTA file (same organism) and the protein FASTA file for the whole genome (~2.2 GB in size) is just taking too long to finish.
  2. I will of course run SNAP in the second round, which leads to my second question: what do you consider an acceptable number of iterations for bootstrap training of SNAP with MAKER?
Thanks and regards

