Re-annotation of a genome

Gerrit
Dear all:

We would like to re-annotate a genome and are not sure how
to accomplish this with MAKER. We have one complete
MAKER-derived annotation, which we inferred using EST data
from an alternate organism, protein data, and SNAP, Augustus,
and GeneMark-ES as ab initio gene predictors. To train SNAP
and Augustus we used the gene models from the CEGMA
pipeline. Now, we would like to re-annotate the genome
with MAKER by applying HMMs for SNAP and Augustus that we
inferred using the first MAKER annotation.

Our questions:

1) Is it necessary to have MAKER start from scratch, or can
we specify options so that the time-consuming mapping of
the EST and protein evidence is adopted from the first
annotation and only the ab initio gene prediction
and annotation steps are done again? If the latter works,
are the ab initio gene predictions from the first MAKER
run explicitly excluded by MAKER from the annotation set
(as they should be), or do they have to be deleted
"manually"?

2) If it is possible to use a previous annotation as a
starting point, is it also possible to add additional
evidence? E.g. if we get EST data that we did not have
before, can we configure MAKER so that it takes the previous
annotation, maps the ESTs and then uses all previous data
(not the annotation!) plus the new EST data for a new
annotation? What if we would like to add additional
protein evidence? Would this also work, or is any previous
data for a given category automatically excluded from the
annotation when new data are specified within a given
category?

Our impression from the re-annotation options in the
maker_opts file is that the above procedures might be
supported by MAKER. However, we were unable to find them
described in the documentation.

Any advice would be greatly appreciated!

Best wishes,
Gerrit

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Re-annotation of a genome

Carson Hinton Holt
1) Is it necessary to have MAKER start from scratch, or can
we specify options so that the time-consuming mapping of
the EST and protein evidence is adopted from the first
annotation and only the ab initio gene prediction
and annotation steps are done again?

There are two ways to reuse old data. The first is to supply a combined MAKER GFF3 file to genome_gff and then select what to keep using the ???_pass options (protein_pass, for example). The second is just to change your maker_opts.ctl file and then run again in the same directory. MAKER will auto-detect which analyses must be re-run and will not remap things that have already been done. I tend to use the latter method more often than the former.

Note that in both cases, for MAKER to know that settings have changed, you must name things like the new HMM files differently than in your last run. MAKER needs to see a value change in the options to know there was a change, so just changing the contents of a file is not sufficient; you must also change the name. Also, if you choose the genome_gff method, do not supply the same protein file to protein: as the data you are passing in with protein_pass, otherwise they will just be duplicated.
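Since MAKER only notices a change when the option value itself changes, one way to keep track of retrained HMMs is to copy each one to a new file name before pointing maker_opts.ctl at it. A minimal Python sketch of that renaming step follows; the helper name, the tag, and the example file name are hypothetical and not part of MAKER.

```python
import shutil
from pathlib import Path


def copy_with_new_name(src, tag):
    """Copy a file to a sibling whose name carries an extra tag.

    Example: snap.hmm with tag "round2" becomes snap.round2.hmm.
    The original file is left untouched.
    """
    src = Path(src)
    dest = src.with_name(f"{src.stem}.{tag}{src.suffix}")
    if dest.exists():
        # Reusing a name would defeat the purpose: the option value
        # in maker_opts.ctl would not change between runs.
        raise FileExistsError(f"{dest} already exists; pick a different tag")
    shutil.copyfile(src, dest)
    return dest
```

The returned path is what would then go into the corresponding HMM option in maker_opts.ctl before re-running in the same directory.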

As a third option, you could parse the GFF3 output to split the evidence types into separate files, i.e. all proteins in one file, all models in another, etc. You would then provide these files to the options model_gff, protein_gff, etc. This is basically the same as the genome_gff option.
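The splitting step could be sketched roughly as below. This is only an illustration under stated assumptions: it groups feature lines by the GFF3 source column (column 2) and writes one file per source, skipping lines whose source is "." and anything after a ##FASTA directive. Which source values correspond to proteins, ESTs, or gene models depends on your own MAKER output, so check the resulting files before passing them to model_gff, protein_gff, etc. The function name and output file naming are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path


def split_gff3_by_source(gff_path, out_dir):
    """Write one GFF3 file per value of the source column (column 2).

    Returns a dict mapping each source value to the file written for it.
    """
    groups = defaultdict(list)
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence follows; no more features
            if line.startswith("#") or not line.strip():
                continue  # directives, comments, blank lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 9 or columns[1] == ".":
                continue  # not a feature line, or no source given
            groups[columns[1]].append(line.rstrip("\n") + "\n")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for source, lines in groups.items():
        # Keep file names simple even if the source contains odd characters.
        safe = re.sub(r"[^A-Za-z0-9_.-]", "_", source)
        out_path = out_dir / f"{safe}.gff"
        with open(out_path, "w") as out:
            out.write("##gff-version 3\n")
            out.writelines(lines)
        written[source] = out_path
    return written
```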

If the latter works,
are the ab initio gene predictions from the first MAKER
run explicitly excluded by MAKER from the annotation set
(as they should be), or do they have to be deleted
"manually"?

All ab initio gene predictions from the first run will be auto-deleted as long as the HMM file’s name has changed. MAKER will then generate new ones.

2) If it is possible to use a previous annotation as a
starting point, is it also possible to add additional
evidence? E.g. if we get EST data that we did not have
before, can we configure MAKER so that it takes the previous
annotation, maps the ESTs and then uses all previous data
(not the annotation!) plus the new EST data for a new
annotation? What if we would like to add additional
protein evidence? Would this also work, or is any previous
data for a given category automatically excluded from the
annotation when new data are specified within a given
category?

Currently, to add evidence you need to provide a new file, so MAKER will rerun EST/protein mapping if you provide a new EST/protein file. The next release of MAKER will support comma-separated lists to make it easy to add and remove subsets of evidence, but with the current version you must provide a new file made of the old file combined with the new FASTA entries. Alternatively, you can provide the old data using the GFF3 pass-through options I described, with only the completely new data provided in the file given to “protein:”.
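Building the combined file can be done with any tool that concatenates FASTA files. As a rough Python sketch (the function names and the choice to drop repeated IDs are assumptions made here, not MAKER behavior), the following writes all old records followed by the new records whose ID has not been seen yet. Write the result to a file name that differs from the old one, so that MAKER sees the option value change.

```python
def read_fasta(path):
    """Yield (header_line, sequence_lines) for each record in a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, seq
                header, seq = line, []
            elif header is not None and line:
                seq.append(line)
    if header is not None:
        yield header, seq


def merge_fasta(old_path, new_path, combined_path):
    """Write old records, then new records whose ID is not already present.

    The ID is taken as the first whitespace-delimited token after '>'.
    Returns the number of records written.
    """
    seen = set()
    count = 0
    with open(combined_path, "w") as out:
        for path in (old_path, new_path):
            for header, seq in read_fasta(path):
                parts = header[1:].split()
                seq_id = parts[0] if parts else ""
                if seq_id in seen:
                    continue  # skip entries that would be duplicated
                seen.add(seq_id)
                out.write(header + "\n")
                for chunk in seq:
                    out.write(chunk + "\n")
                count += 1
    return count
```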

Just let me know if there are other questions.  We could also set up a phone call if any of the options seem confusing or difficult to understand via e-mail.

Thanks,
Carson
