[Gmod-ajax] flatfile-to-json.pl error with GFF


[Gmod-ajax] flatfile-to-json.pl error with GFF

David Breimann
I am trying to set up my first genome after successfully playing with
the tutorial examples, and I have run into some problems.

I am using a FASTA file and a GFF file from NCBI:
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.fna
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Frankia_CcI3/NC_007777.gff

Setting up the sequence file seems to pass OK, but when I run
flatfile-to-json.pl with the GFF I get an error:


../../../jbrowse/bin/flatfile-to-json.pl --gff NC_007777.gff --tracklabel test -key test

working on seq gi|86738724|ref|NC_007777.1|
Use of uninitialized value in string eq at ../../../jbrowse/bin/flatfile-to-json.pl line 179, <GEN2> line 24.

What's wrong?

Thank you,
David

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax

Re: flatfile-to-json.pl error with GFF

Scott Cain
Hi David,

The NCBI GFF3 is notoriously bad and doesn't pass validation at the
GFF3 validator:

  http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

The most notable problems actually have to do with the relationships
between features.  For example, in the first few lines:

NC_007777.1     RefSeq  gene    35      1723    .       +       .       locus_tag=Francci3_0001;db_xref=GeneID:3902947
NC_007777.1     RefSeq  CDS     35      1720    .       +       0       locus_tag=Francci3_0001;transl_table=11;product=chromosomal replication initiator protein DnaA;protein_id=YP_479125.1;db_xref=GI:86738725;db_xref=InterPro:IPR001957;db_xref=InterPro:IPR003593;db_xref=InterPro:IPR013159;db_xref=InterPro:IPR013317;db_xref=GeneID:3902947;exon_number=1

While there is nothing technically wrong with these two lines,
there is what you might call a logic error: the CDS should have the
gene as its parent (a Parent attribute pointing at the gene's ID).
Without that information, a genome browser is going to have a
difficult time displaying the data appropriately. Feel free to
complain to the folks at NCBI that their GFF3 is really bad (I've
done that a few times, but I think they are ignoring me :-)
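To make the missing link concrete, here is a minimal Python sketch that adds it by hand. It assumes tab-separated GFF3 lines, exactly one gene per locus_tag, and that a shared locus_tag is enough to pair a CDS with its gene. The helper names are hypothetical, and this is a stopgap illustration rather than a substitute for a properly converted file.

```python
# Hypothetical helper: link NCBI-style CDS lines to their gene through a
# shared locus_tag by adding ID= to the gene and Parent= to the CDS.


def parse_attributes(column9):
    """Split a GFF3 attribute column into ordered (key, value) pairs.
    A list is used because NCBI files repeat keys such as db_xref."""
    pairs = []
    for item in column9.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            pairs.append((key, value))
    return pairs


def format_attributes(pairs):
    return ";".join(f"{key}={value}" for key, value in pairs)


def link_cds_to_genes(gff_lines):
    """Return new lines in which every gene carrying a locus_tag gets
    ID=<locus_tag> and every CDS with the same locus_tag gets
    Parent=<locus_tag>. Directives and non-9-column lines pass through."""
    parsed = []
    gene_tags = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            parsed.append(line)  # directives, FASTA, etc. stay untouched
            continue
        parsed.append(cols)
        if cols[2] == "gene":
            tag = dict(parse_attributes(cols[8])).get("locus_tag")
            if tag:
                gene_tags.add(tag)

    out = []
    for row in parsed:
        if isinstance(row, str):
            out.append(row)
            continue
        attrs = parse_attributes(row[8])
        keys = {key for key, _ in attrs}
        tag = dict(attrs).get("locus_tag")
        if row[2] == "gene" and tag and "ID" not in keys:
            attrs.insert(0, ("ID", tag))
        elif row[2] == "CDS" and tag in gene_tags and "Parent" not in keys:
            attrs.insert(0, ("Parent", tag))
        out.append("\t".join(row[:8] + [format_attributes(attrs)]))
    return out
```

For example, `link_cds_to_genes(open("NC_007777.gff").read().splitlines())` would return the patched lines for the file from David's message. The converter recommended below remains the cleaner route.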

So, the question is: what should you use? The best option I can
suggest is the genbank2gff3 script that comes with BioPerl,
bp_genbank2gff3.pl. If you get the development version from
GitHub, you can use a version of that script that has been fixed to
work properly with bacterial/circular genomes.

Scott


On Fri, Jul 23, 2010 at 10:54 AM, David Breimann
<[hidden email]> wrote:

> [...]



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: flatfile-to-json.pl error with GFF

Mitch Skinner
In reply to this post by David Breimann
This one is usually the result of a previous flatfile-to-json.pl or
biodb-to-json.pl run that didn't specify a track label.

Look in data/trackInfo.js for an entry that doesn't have a "label"
attribute.  Entries are supposed to look something like this:

    {
       "url" : "data/tracks/{refseq}/gene/trackData.json",
       "label" : "gene",
       "type" : "FeatureTrack",
       "key" : "Gene"
    },

but if "label" is missing then you'll see that error message.
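The check Mitch describes can be scripted. A minimal Python sketch, assuming data/trackInfo.js holds a single JSON array of track objects (optionally wrapped in a JavaScript assignment such as `trackInfo = [...]`) with no trailing commas; the function name is hypothetical:

```python
import json


def tracks_missing_label(trackinfo_text):
    """Return the track entries that have no (or an empty) "label".

    Assumes the text contains one JSON array of track objects, optionally
    wrapped in a JavaScript assignment such as `trackInfo = [ ... ];`.
    """
    start = trackinfo_text.index("[")      # first '[' opens the array
    end = trackinfo_text.rindex("]") + 1   # last ']' closes it
    tracks = json.loads(trackinfo_text[start:end])
    return [track for track in tracks if not track.get("label")]
```

Passing it the contents of data/trackInfo.js lists the entries Mitch suggests looking for. A strict JSON parser is used deliberately, so a file with JavaScript-only syntax (comments, trailing commas) will raise an error rather than be silently misread.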

I thought I added a check that would give a nicer error if the user
forgets to specify a label, but that might have been only on the
development branch.  I'll check again.

Mitch

On 07/23/2010 07:54 AM, David Breimann wrote:

> [...]



Re: flatfile-to-json.pl error with GFF

David Breimann
Thank you, Scott and Mitch!

On Fri, Jul 23, 2010 at 6:25 PM, Mitch Skinner
<[hidden email]> wrote:

> [...]


Re: flatfile-to-json.pl error with GFF

Scott Cain
In reply to this post by Scott Cain
Hi Dave,

Please keep your responses on the list so they can be archived.

I'm also cc'ing Nathan Liles, who did the work on the genbank2gff3
script to deal with bacterial genomes. Perhaps Nathan can take a look
at this GenBank entry and see what the problem is more quickly.

Thanks,
Scott




On Sun, Jul 25, 2010 at 8:26 AM, David Breimann
<[hidden email]> wrote:

> Scott,
>
> I cloned the latest version of BioPerl from GitHub. I'm not sure what you
> mean by the developers' version; I thought the dev branch was obsolete, so
> I took the version from bioperl-live.
> bp_genbank2gff3.pl fails precisely on features that span the origin, e.g.:
> "Ranges not in correct order. Strange ensembl genbank entry? Range:
> [207497,208369] [1,687]".
>
> Thanks,
> Dave
>
> On Fri, Jul 23, 2010 at 6:10 PM, Scott Cain <[hidden email]> wrote:
>> [...]




Re: flatfile-to-json.pl error with GFF

David Breimann
Hi again guys,

Here is an example:
NC_006578.gbk from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_thuringiensis_konkukian/NC_006578.gbk has a gene spanning join(77090..77112,1..586), and it produces an error:

$ bp_genbank2gff3.pl -y NC_006578.gbk
# Input: NC_006578.gbk
# working on region:NC_006578, Bacillus thuringiensis serovar konkukian str. 97-27, 23-JUL-2008, Bacillus thuringiensis serovar konkukian str. 97-27 plasmid pBT9727, complete sequence.
NC_006578 Unflattening error:
Details:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: PROBLEM, SEVERITY==2
Ranges not in correct order. Strange ensembl genbank entry? Range: [77090,77112] [1,586]
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:473
STACK: Bio::SeqFeature::Tools::Unflattener::problem /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:952
STACK: Bio::SeqFeature::Tools::Unflattener::_check_order_is_consistent /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2842
STACK: Bio::SeqFeature::Tools::Unflattener::infer_mRNA_from_CDS /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:2713
STACK: Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/share/perl/5.10.1/Bio/SeqFeature/Tools/Unflattener.pm:1532
STACK: main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:1023
STACK: /usr/local/bin/bp_genbank2gff3.pl:506
-----------------------------------------------------------

# Possible gene unflattening error withNC_006578: consult STDERR
# GFF3 saved to ./NC_006578.gff; DNA saved to ./NC_006578.fa

Another similar example: NC_008785.gbk from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Burkholderia_mallei_SAVP1/NC_008785.gbk.
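Origin-spanning locations like these can be spotted before running the converter. A minimal Python sketch, assuming simple join() strings whose pieces are listed in ascending order; it only compares the numbers and is not how the Unflattener itself performs its check. Function names are hypothetical.

```python
import re

# Matches "a..b", tolerating the partial-end markers "<" and ">".
RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def parse_join(location):
    """Extract (start, end) pairs from a simple GenBank location string,
    e.g. 'join(77090..77112,1..586)'. complement() wrappers are ignored."""
    return [(int(a), int(b)) for a, b in RANGE.findall(location)]


def wraps_origin(location):
    """True if some range starts before the previous one ends, i.e. the
    join() runs past the end of a circular sequence and restarts near 1."""
    ranges = parse_join(location)
    return any(nxt[0] < prev[1] for prev, nxt in zip(ranges, ranges[1:]))
```

For the gene above, `wraps_origin("join(77090..77112,1..586)")` is true, which matches the "Ranges not in correct order" complaint.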

Best,
Dave


On Mon, Jul 26, 2010 at 6:20 PM, Scott Cain <[hidden email]> wrote:
> [...]




Re: flatfile-to-json.pl error with GFF

Chris Mungall-3

I created a branch of bioperl-live called "circular" and have
committed a more lenient, circular-genome-aware version of the
Unflattener, which works on NC_006578. It's quite possible that more
needs to be done here to satisfy this part of the GFF3 spec:

>> For features that cross the origin of a circular feature (e.g. most  
>> bacterial genomes, plasmids, and some viral genomes), the  
>> requirement for start to be less than or equal to end is satisfied  
>> by making end = the position of the end + the length of the  
>> landmark feature.

The current behavior generates the following for NC_006578:

NC_006578       GenBank gene    77090   77112   .       -       1       ID=pBT9727_0001;Dbxref=GeneID:3200518;locus_tag=pBT9727_0001
NC_006578       GenBank gene    1       586     .       -       1       ID=pBT9727_0001;Dbxref=GeneID:3200518;locus_tag=pBT9727_0001

The smart thing to do would be to translate this into a single gene
feature spanning 77090..77698 (586 + 77112, where 77112 is the last
base of the plasmid).

But I'm not sure how conformant the full Bio*/GMOD toolchain is to
this part of the spec.
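The arithmetic behind that single feature can be sketched in Python. This illustrates the quoted GFF3 rule and is not the code in the "circular" branch; the function name is hypothetical, and the landmark length of 77112 bp is inferred from the numbers above (77698 - 586).

```python
def circular_span(ranges, landmark_length):
    """Collapse join() ranges into one GFF3 (start, end) pair.

    Applies the GFF3 rule quoted above: for a feature crossing the origin,
    end = position of the end + length of the landmark, so start <= end.
    `ranges` are (start, end) pairs in join() order.
    """
    start, prev_end = ranges[0]
    offset = 0
    for seg_start, seg_end in ranges[1:]:
        if seg_start + offset < prev_end:  # this piece restarts at the origin
            offset += landmark_length
        prev_end = seg_end + offset
    return start, prev_end
```

With the pBT9727 gene, `circular_span([(77090, 77112), (1, 586)], 77112)` returns `(77090, 77698)`, the single feature proposed above. Ranges that do not wrap are returned unchanged, e.g. `[(35, 100), (200, 300)]` gives `(35, 300)`.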

On Jul 26, 2010, at 11:14 PM, David Breimann wrote:

> [...]

