failing to upload bacterial genome GFF from NCBI

Adam Witney

Has anyone managed to get NCBI GFF files for bacterial genomes uploaded
into chado? I seem to be running into constant errors.

I have removed this header line:

##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=282458

I have removed the Is_circular=true tag.

But now I am getting these errors:

$ trunk/chado/load/bin/gmod_bulk_load_gff3.pl --organism "Staphylococcus aureus" --gfffile NC_002952.gff --recreate_cache
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...
Adjusting the primary key sequences (if necessary)...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Unable to find srcfeature NC_002952.2 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache'
option. at
/opt/perlbrew/perls/perl-5.14.2/lib/site_perl/5.14.2/Bio/GMOD/DB/Adapter.pm
line 4599, <GEN0> line 5.

Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x1a74f588)',
'Bio::SeqFeature::Annotated=HASH(0x1bcd5db0)') called at
/homedirs8/share/Tools/GMOD/trunk/chado/load/bin/gmod_bulk_load_gff3.pl
line 851

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache
won't be needed)...
Trying to remove the run lock (so that --remove_lock won't be needed)...
Exiting...

I am new to chado (although not to other gmod tools), but I can't seem
to find any GFF files that will upload.

Thanks for any help

Adam

------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: failing to upload bacterial genome GFF from NCBI

Scott Cain
Hi Adam,

The first two problems are not really the GFF loader's fault, because I don't think either the species directive or the Is_circular tag is part of the GFF spec yet (though BioPerl will have to be updated to accommodate them). You could convert Is_circular to is_circular so that the information would still get stored (though I can't think of any tool that would do anything with that information, except maybe Artemis).
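A minimal Python sketch of the first two edits, assuming the file is read as a list of lines: it drops any ##species directive and renames an Is_circular= attribute to is_circular= in column 9, leaving everything after a ##FASTA directive untouched. The function name is hypothetical; this is an illustration of the manual edits described above, not part of the GMOD tools.

```python
def clean_directives_and_tags(lines):
    """Drop ##species directives and rename Is_circular= to is_circular=.

    Lines after a ##FASTA directive are passed through unchanged.
    """
    out = []
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        if in_fasta:
            out.append(line)
            continue
        if line.startswith("##species"):
            continue  # skip the directive the loader could not handle
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and not line.startswith("#"):
            attrs = cols[8].split(";")
            # Lowercase the tag so the value is still stored (per the reply above).
            attrs = ["is" + a[2:] if a.startswith("Is_circular=") else a for a in attrs]
            cols[8] = ";".join(attrs)
            line = "\t".join(cols) + "\n"
        out.append(line)
    return out
```

For example, passing open("NC_002952.gff") to this function returns the edited lines, which can then be written to a new file for the loader.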

The problem that is holding you up, though, is technically a bug in the loader. Because of the way data is stored from GFF into chado, the loader wants the identifier for a reference sequence to be in that feature's (i.e., the chromosome's) ID tag, whereas the spec says it should be in the Name tag. The workaround is to edit the GFF file so that the ninth column of the chromosome line has both Name and ID attributes identical to the value in that line's first column. Sorry for the hassle.
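The workaround can be sketched in Python as follows. Assumptions, all hypothetical: the "chromosome line" is any feature line whose column 3 is region or chromosome; any existing ID and Name attributes on that line are replaced with the column-1 seqid (e.g. NC_002952.2); and no other line refers to the old ID through a Parent attribute, since the sketch does not rewrite such references.

```python
def set_reference_ids(lines, ref_types=("region", "chromosome")):
    """Give the reference-sequence line ID=<seqid>;Name=<seqid>.

    The seqid is taken from column 1; other attributes are kept as-is.
    """
    out = []
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        if in_fasta or line.startswith("#"):
            out.append(line)  # directives, comments and sequence pass through
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] in ref_types:
            seqid = cols[0]
            # Remove the old ID/Name so the new ones are not duplicated.
            kept = [a for a in cols[8].split(";") if a and not a.startswith(("ID=", "Name="))]
            cols[8] = ";".join([f"ID={seqid}", f"Name={seqid}"] + kept)
            line = "\t".join(cols) + "\n"
        out.append(line)
    return out
```

After this edit, the srcfeature that the loader looks up (NC_002952.2 in the error above) exists under that identifier.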

Scott

On Apr 10, 2013, at 5:17 PM, Adam Witney <[hidden email]> wrote:

>
> Has anyone managed to get NCBI GFF files for bacterial genomes uploaded
> into chado? I seem to be running into constant errors.
>
> I have removed this header line:
>
> ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=282458
>
> I have removed the Is_circular=true tag.
>
> But now I am getting these errors:
>
> $ trunk/chado/load/bin/gmod_bulk_load_gff3.pl --organism "Staphylococcus
> aureus" --gfffile NC_002952.gff --recreate_cache
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...
> Adjusting the primary key sequences (if necessary)...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NC_002952.2 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache'
> option. at
> /opt/perlbrew/perls/perl-5.14.2/lib/site_perl/5.14.2/Bio/GMOD/DB/Adapter.pm
> line 4599, <GEN0> line 5.
>
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x1a74f588)',
> 'Bio::SeqFeature::Annotated=HASH(0x1bcd5db0)') called at
> /homedirs8/share/Tools/GMOD/trunk/chado/load/bin/gmod_bulk_load_gff3.pl
> line 851
>
> Abnormal termination, trying to clean up...
>
> Attempting to clean up the loader temp table (so that --recreate_cache
> won't be needed)...
> Trying to remove the run lock (so that --remove_lock won't be needed)...
> Exiting...
>
> I am new to chado (although not to other gmod tools), but I can't seem
> to find any GFF files that will upload.
>
> Thanks for any help
>
> Adam
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: failing to upload bacterial genome GFF from NCBI

Adam Witney

Great, thanks Scott, yes that fixed it.

I am trying to use chado as the backend to Artemis (from Sanger). I also
noticed that the chromosome line in the GFF has to have "chromosome" in
the 3rd column, and that the FASTA sequence must also be in the GFF file.
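A Python sketch of the two extra requirements noted above. Assumptions, all hypothetical: the NCBI file types the whole-sequence line as region (changed here to chromosome), and the loader matches each FASTA record to a seqid by the first word of its ">" header. The function only reports seqids that have no matching FASTA record; it does not fetch or add sequence.

```python
def retype_and_check_fasta(lines, from_type="region", to_type="chromosome"):
    """Retype the reference line and list seqids lacking an embedded FASTA record."""
    out, seqids, fasta_ids = [], set(), set()
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
            out.append(line)
            continue
        if in_fasta:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    fasta_ids.add(header[0])  # first word of the FASTA header
            out.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and not line.startswith("#"):
            seqids.add(cols[0])
            if cols[2] == from_type:
                cols[2] = to_type
                line = "\t".join(cols) + "\n"
        out.append(line)
    missing = sorted(seqids - fasta_ids)
    return out, missing
```

An empty missing list means every column-1 seqid has sequence in the same file.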

However, now I need to get the feature data to show up in Artemis. I
suspect this is again a GFF format issue, so I have emailed the Artemis
developers for an example GFF that I can match mine to. Has anyone here
been successful in using chado behind Artemis?

Thanks again

Adam

On 10/04/2013 20:45, Scott Cain wrote:

> Hi Adam,
>
> The first two problems are not really the GFF loader's fault because I don't think the species directive or the Is_circular tag are part of the GFF spec yet (though bioperl will have to be updated to accommodate them). You could convert the Is_circular to is_circular so that the information would still get stored (though I can't think of any tool that would do anything with that information--maybe Aretmis).
>
> The problem that is holding you up though is technically a bug in the loader. Because of the way data is stored from GFF into chado, it wants the identifier for reference sequences to be in the feature's (ie, the chromosome's) ID tag, whereas the spec says it should be in the name tag. The work around is to edit the GFF file so that the ninth column has both Name and ID attributes that are identical to the value in the first column in the chromosome line. Sorry for the hassle.
>
> Scott
>
>
> Sent from my iPhone
>
> On Apr 10, 2013, at 5:17 PM, Adam Witney <[hidden email]> wrote:
>
>>
>> Has anyone managed to get NCBI GFF files for bacterial genomes uploaded
>> into chado? I seem to be running into constant errors.
>>
>> I have removed this header line:
>>
>> ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=282458
>>
>> I have removed the Is_circular=true tag.
>>
>> But now I am getting these errors:
>>
>> $ trunk/chado/load/bin/gmod_bulk_load_gff3.pl --organism "Staphylococcus
>> aureus" --gfffile NC_002952.gff --recreate_cache
>> (Re)creating the uniquename cache in the database...
>> Creating table...
>> Populating table...
>> Creating indexes...
>> Adjusting the primary key sequences (if necessary)...Done.
>> Preparing data for inserting into the chado database
>> (This may take a while ...)
>> Unable to find srcfeature NC_002952.2 in the database.
>> Perhaps you need to rerun your data load with the '--recreate_cache'
>> option. at
>> /opt/perlbrew/perls/perl-5.14.2/lib/site_perl/5.14.2/Bio/GMOD/DB/Adapter.pm
>> line 4599, <GEN0> line 5.
>>
>> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x1a74f588)',
>> 'Bio::SeqFeature::Annotated=HASH(0x1bcd5db0)') called at
>> /homedirs8/share/Tools/GMOD/trunk/chado/load/bin/gmod_bulk_load_gff3.pl
>> line 851
>>
>> Abnormal termination, trying to clean up...
>>
>> Attempting to clean up the loader temp table (so that --recreate_cache
>> won't be needed)...
>> Trying to remove the run lock (so that --remove_lock won't be needed)...
>> Exiting...
>>
>> I am new to chado (although not to other gmod tools), but I can't seem
>> to find any GFF files that will upload.
>>
>> Thanks for any help
>>
>> Adam
>>
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> _______________________________________________
>> Gmod-schema mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema