I believe we are at the final stage of annotating a plant genome, having
previously trained SNAP over 3 rounds.
For the final annotation we have added new transcriptome data and generated
a species-specific repeat library, so we now mask with that library as well
as with a database of plant repeats and TE proteins.
In this final annotation run we set keep_preds=1, and we then plan to
screen the final GFF file, retaining gene models with AED <= 0.5 (or
thereabouts) as well as those that contain a Pfam domain.
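A minimal Python sketch of that screening logic, assuming MAKER-style GFF3 in which each mRNA line carries an `_AED=` attribute, and assuming a separately prepared set of mRNA IDs with Pfam hits (for example parsed from InterProScan output). The function names are hypothetical. A model is kept if its AED is at or below the threshold OR it has a Pfam hit.

```python
from typing import Dict, Iterable, Set


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string like 'ID=a;_AED=0.12' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def select_mrnas(gff_lines: Iterable[str], pfam_ids: Set[str],
                 max_aed: float = 0.5) -> Set[str]:
    """Return IDs of mRNA features with AED <= max_aed OR a Pfam hit.

    Assumption: MAKER-style GFF3 where each mRNA has an _AED attribute.
    pfam_ids (hypothetical input) holds mRNA IDs that have Pfam domains.
    """
    kept = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # merged MAKER GFF3 files may append sequences after this
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        mrna_id = attrs.get("ID")
        if mrna_id is None:
            continue
        # Treat a missing AED as the worst possible score (1.0).
        aed = float(attrs.get("_AED", "1"))
        if aed <= max_aed or mrna_id in pfam_ids:
            kept.add(mrna_id)
    return kept


# Usage on a toy example (made-up coordinates and IDs).
example = [
    "##gff-version 3\n",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.12;_eAED=0.12\n",
    "chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=0.80\n",
    "chr1\tmaker\tmRNA\t5000\t5900\t.\t+\t.\tID=g3-mRNA-1;Parent=g3;_AED=0.95\n",
]
print(sorted(select_mrnas(example, pfam_ids={"g3-mRNA-1"})))
# ['g1-mRNA-1', 'g3-mRNA-1']
```

The kept mRNA IDs could then be used to pull out the matching gene, exon and CDS features via their Parent attributes.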
I've compared some of the proteins from our previous MAKER round with those
from the latest one. As intended, the new masking appears to have removed some
that were TEs. A number of proteins differ slightly, probably because of
different intron/exon boundary calls, and some differ considerably in length.
In particular, some are now roughly twice the length they were in the
previous annotation, and online BLAST searches suggest that the previous
lengths were correct. It is this latter finding that concerns me: why has it
occurred?
I did set single_exon=1; could that be causing this effect?
Thanks and sorry for the long-winded email.
Dr. Joel S. Shore