Statistics about genomes and annotation in 2019?

8 messages

Statistics about genomes and annotation in 2019?

Patrick Tran Van-2

Hi Carson and Maker dev team,


Based on this page: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018

I quote:


"As of January 2018, 8,955 Eukaryotic genome projects were at various stages of completion (4,683 were still being sequenced and 4,272 had at least a draft assembly, but not necessarily gene annotations). "


"There are an additional 82,859 prokaryotic genome projects with  various stages of completion with hundred of millions of additional  potential gene annotations. "

I have a few questions:

1) Where did these statistics come from?

2) Do you have updated statistics for 2019?

3) I am especially interested in genomes that have been assembled and annotated "independently" (i.e. NOT by Ensembl or NCBI). Do you have any numbers or documentation about these?

Thanks.


Patrick T

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Statistics about genomes and annotation in 2019?

Carson Holt-2
Those numbers came from GOLD, the Genomes OnLine Database (https://gold.jgi.doe.gov).

You can download their tables and filter the data by different attributes.
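A minimal sketch of that kind of filtering in POSIX shell, assuming the GOLD table has been saved as tab-separated text with a single header row. The helper names `filter_by_column` and `count_by_column` are hypothetical, and so are the column names in the usage line below; check the header of the file you actually download.

```sh
#!/bin/sh
# Hypothetical helpers for a tab-separated table whose first row holds column names.

# filter_by_column FILE COLUMN VALUE
# Print the header plus every row whose COLUMN equals VALUE.
filter_by_column() {
  awk -F'\t' -v col="$2" -v val="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    $idx == val
  ' "$1"
}

# count_by_column FILE COLUMN
# Count how many rows carry each distinct value of COLUMN (header excluded).
count_by_column() {
  awk -F'\t' -v col="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$idx]++ }
    END { for (v in counts) print v "\t" counts[v] }
  ' "$1" | sort
}
```

For example, `filter_by_column gold_projects.tsv Domain EUKARYAL | count_by_column - Status` would tally the status of the eukaryotic projects, provided the export really has `Domain` and `Status` columns (file name, column names and value are placeholders). Passing `-` makes awk read the filtered rows from standard input.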

—Carson



On May 1, 2019, at 4:25 AM, Patrick Tran Van <[hidden email]> wrote:


Fwd: Problem training AUGUSTUS

seoanezonjic
In reply to this post by Patrick Tran Van-2
Hi MAKER authors,
I have been using MAKER for many years, and recently I tried to train AUGUSTUS using the SNAP training files. To do this, I used the train_augustus.pl script as follows:

zff2genbank.pl export.ann export.dna > final_genes.gb
train_augustus.pl final_genes.gb MyOrg

For each of my gene models, I get the following error:

Constructing GenBank feature: Feature begins after it ends: 1006..1001,2051..1917,7791..7689,7993..7880,8485..8374,8775..8628,9050..8873,10467..10459,13315..11920,13598..13511,14971..14945,18637..18471,18898..18821,20558..20389,21067..20923,23249..23004,23549..23354,23647..23624
GBProcessor::getGeneList(): GBFeature constructor:Format error when reading genbank format.
Encountered error after reading 0 annotations.

The export files were generated with SNAP as described in your reference guides (two MAKER rounds). The issue seems to be related to the orientation (strand) of the gene model, which can be inspected here:
(export.ann file)
>MODEL236
Eterm   23624   23647   MODEL236
Exon    23354   23549   MODEL236
Exon    23004   23249   MODEL236
Exon    20923   21067   MODEL236
Exon    20389   20558   MODEL236
Exon    18821   18898   MODEL236
Exon    18471   18637   MODEL236
Exon    14945   14971   MODEL236
Exon    13511   13598   MODEL236
Exon    11920   13315   MODEL236
Exon    10459   10467   MODEL236
Exon    8873    9050    MODEL236
Exon    8628    8775    MODEL236
Exon    8374    8485    MODEL236
Exon    7880    7993    MODEL236
Exon    7689    7791    MODEL236
Exon    1917    2051    MODEL236
Einit   1001    1006    MODEL236

(GenBank file)
LOCUS       MODEL236               24647 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
     source          1..24647
     CDS             complement(join(1006..1001,2051..1917,7791..7689,
                     7993..7880,8485..8374,8775..8628,9050..8873,10467..10459,
                     13315..11920,13598..13511,14971..14945,18637..18471,
                     18898..18821,20558..20389,21067..20923,23249..23004,
                     23549..23354,23647..23624))

It seems that AUGUSTUS needs the gene model to be described in the forward (direct) orientation in order to read the GenBank file and perform the training. How could I fix this problem?
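For reference, a GenBank location should never contain a range written high..low: each range is written low..high, the ranges inside `join()` are listed in ascending order, and a reverse-strand feature is marked only by wrapping the whole location in `complement()`. A range such as `1006..1001` therefore triggers the "Feature begins after it ends" error shown above. The sketch below uses hypothetical helpers (`normalize_ranges`, `cds_location`) in POSIX sh plus awk and only illustrates the expected syntax; it is not how zff2augustus_gbk.pl works, and whether a model belongs on the complement strand at all depends on the ZFF input.

```sh
#!/bin/sh
# Hypothetical helpers illustrating GenBank location syntax.

# Rewrite a comma-separated list of ranges so that each one reads low..high,
# then sort the ranges by their start coordinate.
normalize_ranges() {
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F'[.][.]' '{ if ($1 + 0 > $2 + 0) print $2 ".." $1; else print $1 ".." $2 }' |
    sort -t. -k1,1n | paste -s -d, -
}

# cds_location STRAND RANGES   (STRAND is + or -)
# Use join() for several ranges and complement() for the reverse strand.
cds_location() {
  _ranges=$(normalize_ranges "$2")
  case $_ranges in
    *,*) _ranges="join($_ranges)" ;;
  esac
  if [ "$1" = "-" ]; then
    printf 'complement(%s)\n' "$_ranges"
  else
    printf '%s\n' "$_ranges"
  fi
}
```

Applied to the ranges from the CDS line above, every pair is swapped (`1006..1001` becomes `1001..1006`, `2051..1917` becomes `1917..2051`, and so on); once swapped, the list is already in ascending order.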
Thank you in advance
Pedro Seoane


Re: Fwd: Problem training AUGUSTUS

Xabier Vázquez-Campos
Hi Pedro,
I checked some of my files, and models listed in reverse order cause no problems; the generated GenBank files look fine.
You need to use zff2augustus_gbk.pl, not zff2genbank.pl. I don't remember the exact differences, but I know that zff2augustus_gbk.pl works.
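Before running etraining, it can help to scan the converted file for ranges written high..low, the pattern behind the error in the original message. A minimal sketch, assuming POSIX sh and awk; `check_gb_ranges` is a hypothetical helper that simply flags any `N..M` with N greater than M anywhere in the file, which in a GenBank file should only occur inside feature locations.

```sh
#!/bin/sh
# Hypothetical check: report every "N..M" range with N > M in one or more
# GenBank files; the exit status is non-zero if any such range is found.
check_gb_ranges() {
  awk '
    BEGIN { bad = 0 }
    {
      line = $0
      # Walk through every digits..digits token on the line.
      while (match(line, /[0-9]+[.][.][0-9]+/)) {
        r = substr(line, RSTART, RLENGTH)
        split(r, p, "[.][.]")
        if (p[1] + 0 > p[2] + 0) { print FILENAME ":" FNR ": " r; bad = 1 }
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { exit bad }
  ' "$@"
}
```

`check_gb_ranges final_genes.gb` prints each offending range with its file name and line number, so it can gate the training step in a script.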

Cheers,
Xabi

On Tue, 14 May 2019 at 18:12, p sz <[hidden email]> wrote:

--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA


Re: Fwd: Problem training AUGUSTUS

seoanezonjic
Hi Xavier
I switched from zff2genbank.pl to zff2augustus_gbk.pl, and the problem is fixed. I had used zff2genbank.pl because it is packaged in the MAKER suite. I think the MAKER authors should include zff2augustus_gbk.pl in the main suite; I didn't know about this genome-scripts repository.
By the way, here are my AUGUSTUS training steps. Could you tell me whether they are correct?

zff2augustus_gbk.pl export.ann export.dna > final_genes.gb
randomSplit.pl final_genes.gb 500
new_species.pl --species=Demo
etraining --species=Demo final_genes.gb
optimize_augustus.pl --species=Demo --onlytrain=final_genes.gb.train  final_genes.gb.test
etraining --species=Demo final_genes.gb

I took the training parameters (except the 500, i.e. the test-set size passed to randomSplit.pl) from the train_augustus.pl script included in the MAKER suite.
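The command sequence above can be wrapped in a small function so that the species name, input file and test-set size are set in one place. A sketch, assuming POSIX sh and the AUGUSTUS scripts on PATH; `train_species`, `run` and `DRY_RUN` are hypothetical names, and the steps are exactly the ones listed above, starting from the GenBank file produced by zff2augustus_gbk.pl.

```sh
#!/bin/sh
set -eu

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# train_species GENBANK_FILE SPECIES [N_TEST]
train_species() {
  gb=$1
  species=$2
  ntest=${3:-500}
  # Split into $gb.test (ntest genes) and $gb.train (the remaining genes).
  run randomSplit.pl "$gb" "$ntest"
  # Create the parameter files for the new species in the AUGUSTUS config path.
  run new_species.pl --species="$species"
  # Initial training on the full gene set, as in the steps above.
  run etraining --species="$species" "$gb"
  # Optimise meta-parameters; genes in $gb.train are used only for training.
  run optimize_augustus.pl --species="$species" --onlytrain="$gb.train" "$gb.test"
  # Retrain with the optimised meta-parameters.
  run etraining --species="$species" "$gb"
}
```

With `DRY_RUN=1 train_species final_genes.gb Demo` the function only prints the five commands. If you also want an accuracy estimate, the AUGUSTUS retraining instructions evaluate a model with `augustus --species=<name> <file>.test`; that estimate is only meaningful when the test genes were not used by etraining, which is not the case when etraining runs on the full file as above.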
Thank you in advance
Pedro Seoane

From: Xabier Vázquez-Campos <[hidden email]>
Sent: Thursday, 16 May 2019 4:42
To: p sz
Cc: [hidden email]
Subject: Re: [maker-devel] Fwd: Problem training AUGUSTUS
 

Re: Fwd: Problem training AUGUSTUS

djerroud samia
Hello, thank you for sharing. As I said, the GenBank format is quite important for me. What I really need is to get all of this information:

CDS: accession, protein product, protein sequence, start, end, locus_tag, gene, ...
My problem is that I don't know which pipeline I should follow to get all of this information.

Thank you,
Samia

On Tue, 21 May 2019 at 07:25, p sz <[hidden email]> wrote:

Re: Fwd: Problem training AUGUSTUS

Xabier Vázquez-Campos
In reply to this post by seoanezonjic
Hi Pedro,

This is what I did the last time I ran it. I had RNA-seq data, the genome was soft-masked, and the training set was the one produced by zff2augustus_gbk.pl. You only need one step with autoAug.pl, which should be included in your AUGUSTUS installation:

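AUG_TRAIN= #working directory
MYSP= #species name used for the gene model in the AUGUSTUS config folder
RNA_SEQ= #assembled RNA-seq data
GENOME= #soft-masked genome
BASE= #project base directory
TRAIN=${BASE}/snap1/pugra.train1.gb #GenBank training set from zff2augustus_gbk.pl
# PBS_NUM_PPN, used below, is set by PBS/Torque inside a batch job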

mkdir -p ${AUG_TRAIN}
cd ${AUG_TRAIN}

autoAug.pl --noninteractive -v -v -v \
--cpus=${PBS_NUM_PPN} \
--maxIntronLen=3000 \
--species=${MYSP} \
--genome=${GENOME} \
--cdna=${RNA_SEQ} \
--trainingset=${TRAIN} \
--optrounds=5 \
--workingdir=${AUG_TRAIN}
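If the script is reused, it may help to stop early when one of the placeholders above is still empty, and to fall back to a local CPU count outside a PBS/Torque job, where `PBS_NUM_PPN` is not defined. A sketch in POSIX sh; `check_inputs` and `CPUS` are hypothetical names, while the other variable names are the ones from the script above.

```sh
#!/bin/sh
set -eu

# Return non-zero (and report why) if a required variable is empty
# or an input file is missing or empty.
check_inputs() {
  status=0
  for var in AUG_TRAIN MYSP RNA_SEQ GENOME TRAIN; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "error: $var is not set" >&2
      status=1
    fi
  done
  for f in "${GENOME:-}" "${RNA_SEQ:-}" "${TRAIN:-}"; do
    if [ -n "$f" ] && [ ! -s "$f" ]; then
      echo "error: $f is missing or empty" >&2
      status=1
    fi
  done
  return "$status"
}

# PBS_NUM_PPN is only defined inside a PBS/Torque job; otherwise ask the system.
CPUS=${PBS_NUM_PPN:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
```

Once `check_inputs` succeeds, the same autoAug.pl call can be run with `--cpus=${CPUS}` in place of `--cpus=${PBS_NUM_PPN}`.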

On Tue, 21 May 2019 at 21:25, p sz <[hidden email]> wrote:

Re: Problem training AUGUSTUS

Carson Holt-2
In reply to this post by djerroud samia
If you are looking to submit to GenBank, tools like GAL (https://github.com/The-Sequence-Ontology/GAL) are a good option.

Things like the accession, which you listed, only exist inside GenBank (i.e. they are assigned after you submit). They are not produced by annotation pipelines or format converters.

—Carson



On May 21, 2019, at 6:11 AM, djerroud samia <[hidden email]> wrote:

Hello, thank your for the share, like I said the genbank format is quite important for me. What I really need is to get all this different informations 

cds: accession, protein product, protein sequence, start, end , locus_tag, gene, .....
My problem is I don't know the pipeline I should follow to get all this informaitions

thank you, Samia

Le mar. 21 mai 2019 à 07:25, p sz <[hidden email]> a écrit :
Hi Xavier
I've changed from zff2genbank.pl to zff2augustus_gbk.pl  and the the problem is fixed. I used zff2genbank.pl  because it is packaged into the maker suite. I think that MAKER authors should include zff2augustus_gbk.pl into the main suite, I didn't know about this genome-scripts repository.
By the way, I would like to show you my training steps with augustus, in order to know if they are correct:

zff2augustus_gbk.pl export.ann export.dna > final_genes.gb
randomSplit.pl final_genes.gb 500
new_species.pl --species=Demo
etraining --species=Demo final_genes.gb
optimize_augustus.pl --species=Demo --onlytrain=final_genes.gb.train  final_genes.gb.test
etraining --species=Demo final_genes.gb

I have taken the training parameters (excepting the 500 parameter) from the train_augustus.pl script included in MAKER suite.
Thank you in advance
Pedro Seoane

De: Xabier Vázquez-Campos <[hidden email]>
Enviado: jueves, 16 de mayo de 2019 4:42
Para: p sz
Cc: [hidden email]
Asunto: Re: [maker-devel] RV: Problem training agustus
 
Hi Pedro,
I checked some of my files and there is no issue with a model in the inverse order. And the gb files generated look fine.
You need to use zff2augustus_gbk.pl not zff2genbank.pl. I don't remember the differences but I know that zff2augustus_gbk.pl works for sure.

Cheers,
Xabi

On Tue, 14 May 2019 at 18:12, p sz <[hidden email]> wrote:
Hi Maker author
I have been using Maker for long years and recently, I've tried to train agustus using the snap training files. To do this, I have used the train_augustus.pl script as follows:

zff2genbank.pl export.ann export.dna > final_genes.gb
train_augustus.pl final_genes.gb MyOrg

For each of my gene models the error is the following:

Constructing GenBank feature: Feature begins after it ends: 1006..1001,2051..1917,7791..7689,7993..7880,8485..8374,8775..8628,9050..8873,10467..10459,13315..11920,13598..13511,14971..14945,18637..18471,18898..18821,20558..20389,21067..20923,23249..23004,23549..23354,23647..23624
GBProcessor::getGeneList(): GBFeature constructor:Format error when reading genbank format.
Encountered error after reading 0 annotations.

The export files are generated with SNAP as described by your reference guides (two maker rounds). The issue seems related with the sense of the gene model that can be inspected here:
(export.ann file)
>MODEL236
Eterm   23624   23647   MODEL236
Exon    23354   23549   MODEL236
Exon    23004   23249   MODEL236
Exon    20923   21067   MODEL236
Exon    20389   20558   MODEL236
Exon    18821   18898   MODEL236
Exon    18471   18637   MODEL236
Exon    14945   14971   MODEL236
Exon    13511   13598   MODEL236
Exon    11920   13315   MODEL236
Exon    10459   10467   MODEL236
Exon    8873    9050    MODEL236
Exon    8628    8775    MODEL236
Exon    8374    8485    MODEL236
Exon    7880    7993    MODEL236
Exon    7689    7791    MODEL236
Exon    1917    2051    MODEL236
Einit   1001    1006    MODEL236

(genbankfile)
LOCUS       MODEL236               24647 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
     source          1..24647
     CDS             complement(join(1006..1001,2051..1917,7791..7689,
                     7993..7880,8485..8374,8775..8628,9050..8873,10467..10459,
                     13315..11920,13598..13511,14971..14945,18637..18471,
                     18898..18821,20558..20389,21067..20923,23249..23004,
                     23549..23354,23647..23624))

It seems that augustus needs the direct sense description of the gene model in order to read the gb file and perform the training. How  could I fix the problem?
Thank you in advance
Pedro Seoane
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


--
Xabier Vázquez-Campos, PhD
Research Associate
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org