est2genome and protein2genome take BLAST hits, polish them with exonerate around splice sites, and then turn the alignment directly into a gene model. So if the alignment is partial, because the EST or mRNA-seq read does not span the entire transcript or the protein homology does not span the entire CDS, then the resulting model will be partial.
But hundreds of models, even partial ones, are sufficient to train SNAP. After that I usually do just one round of bootstrap training (more than that and you get into the overtraining paradox).
So you can use just est2genome, just protein2genome, or both. You just need something to train SNAP with.
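As a rough sketch of an EST-only first round, the relevant lines of maker_opts.ctl might look like the fragment below. The FASTA file name is a placeholder, not taken from this thread, and all other options are assumed to stay at their defaults.

```
#-----EST Evidence
est=transcripts.fasta   #placeholder name for the same-organism transcript FASTA

#-----Protein Homology Evidence
protein=                #left empty for this round

#-----Gene Prediction
snaphmm=                #no SNAP HMM yet; it is trained from the round-one models
est2genome=1            #infer gene models directly from EST alignments
protein2genome=0        #do not infer gene models from protein alignments
```

For the second round, the usual pattern would be to point snaphmm at the trained HMM and set est2genome back to 0.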
My name is Arnab and I am from Texas Tech University.
I am using MAKER for gene annotation of a new genome assembly for a non-model organism. I have figured out most of this amazing piece of software but have two questions.
Is it okay to use only est2genome=1 and leave protein2genome=0 in the first round of running MAKER? Will it hurt my prediction and the eventual gene annotation if I don't use the protein2genome option ALONGSIDE est2genome in the first round?
I have a protein FASTA file for the same organism, but using the transcript FASTA file (same organism) AND the protein FASTA file for the whole genome (~2.2 GB in size) is just taking too long to finish.
I will of course run SNAP in the second round, which leads me to my second question: what, in your view, is an acceptable number of iterations for bootstrap training SNAP with MAKER?