Re: a question of loading the gff3 file into chado database (2)

Scott Cain
Hi Zhuang,

There is a line in your GFF file that looks like this:

MBU59461 GenBank region 1 155060 . + . ID=U59461;Name=MBU59461;Dbxref=taxon:207830;....

See how the ID and the string in the first column (the seqid) don't
match?  That is the problem.  While matching isn't required by the GFF3
spec, it is required by the Chado loader.  The other messages about CDS
features without a parent are warnings: the GFF has CDS features that
don't belong to a gene or a transcript, and if such a CDS is spliced,
that might lead to loading failures.  Since you're working with
baculovirus, that shouldn't be a problem.

Scott


On Wed, May 26, 2010 at 6:25 AM, zhuang chao <[hidden email]> wrote:

> Hi all,
>
> When I was loading a GFF3 file into a Chado database using
> gmod_bulk_load_gff3.pl, I got errors like this:
>
>  ==================================================
>  Unable to find srcfeature MBU59461 in the database.
>  ===================================================
>
> I don't know why this happens or how to handle it.  Could you help
> me?  I am looking forward to your reply.  Thank you very much!
>
>
> The GFF file is attached (compressed with tar).  Here is a log of
> how I loaded the data:
>
>  =======================================================================
>
> root@debian:/home/zc/Downloads# perl /usr/local/bin/gmod_bulk_load_gff3.pl \
>     --gfffile sequences.gb.gff --organism 'cdab' --dbname four_viruses \
>     --dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432 \
>     --recreate_cache --noexon
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...Done.
> Preparing data for inserting into the four_viruses database
> (This may take a while ...)
>
>
> There is a CDS feature with no parent (ID:ODV-e28)  I think that is
> wrong!
>
>
> This GFF file has CDS and/or UTR features that do not belong to a
> 'central dogma' gene (ie, gene/transcript/CDS).  The features of
> this type are being stored in the database as is.
>
>
>
> There is a CDS feature with no parent (ID:ODV-e25)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:helicase)  I think that is
> wrong!
>
> Dropping cds temp tables...
> Creating cds temp tables...
> NOTICE:  CREATE TABLE will create implicit sequence
> "tmp_cds_handler_cds_row_id_seq" for serial column
> "tmp_cds_handler.cds_row_id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "tmp_cds_handler_pkey" for table "tmp_cds_handler"
> NOTICE:  CREATE TABLE will create implicit sequence
> "tmp_cds_handler_relationship_rel_row_id_seq" for serial column
> "tmp_cds_handler_relationship.rel_row_id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "tmp_cds_handler_relationship_pkey" for table
> "tmp_cds_handler_relationship"
>
>
> There is a CDS feature with no parent (ID:NP_613096.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613119.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613164.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613178.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613188.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613194.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613231.1)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:39k/pp31)  I think that is
> wrong!
>
>
>
> There is a CDS feature with no parent (ID:NP_613234.1)  I think that is
> wrong!
>
> Unable to find srcfeature MBU59461 in the database.
> Perhaps you need to rerun your data load with the '--recreate_cache'
> option. at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 4026
>
> Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x34d6e30)', 'Bio::SeqFeature::Annotated=HASH(0x2cea988)') called at /usr/local/bin/gmod_bulk_load_gff3.pl line 758
> Issuing rollback() due to DESTROY without explicit disconnect() of
> DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
> at /usr/share/perl/5.10/Carp.pm line 45.
> ===================================================================================
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: a question of loading the gff3 file into chado database (2)

zhuang chao
Hi Scott,

There are so many lines that I can't edit them all manually.  I want
to handle them automatically.  Could you help me?  I am looking forward
to your reply.  Thank you very much!


On Wed, 2010-05-26 at 10:28 -0400, Scott Cain wrote:

> [...]

Re: a question of loading the gff3 file into chado database (2)

simon rayner
Zhuang,

I think you can just do this with a simple script.

Simon

On Wed, May 26, 2010 at 9:52 PM, zhuang chao <[hidden email]> wrote:
> [...]

--
Simon Rayner

State Key Laboratory of Virology
Wuhan Institute of Virology
Chinese Academy of Sciences
Wuhan, Hubei 430071
P.R.China

+86 (27) 87199895 (office)
+86 15972923715 (cell)

Re: a question of loading the gff3 file into chado database (2)

Scott Cain
Hi Zhuang,

I agree with Simon: the conversion of GenBank-formatted files to GFF3
is frequently imperfect and requires some fixing afterward.  If this is
the only sort of problem in the file, writing a Perl script to fix it
should be fairly easy.
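Before writing the fix, a read-only check can confirm whether this is the only sort of problem. A sketch in Python (`preflight` is a hypothetical name): it lists seqids that no feature in the file declares as its ID, which appears to be the condition that triggers "Unable to find srcfeature", and CDS rows without a Parent, which produce the "CDS feature with no parent" warnings. It assumes the landmarks are defined in the same file, so a sequence loaded into the database earlier would be reported even though the loader could find it.

```python
import sys


def preflight(lines):
    """Return (seqids with no matching ID, [(line number, ID) of orphan CDS]).

    Assumption: every landmark is defined in this same file.
    """
    ids = set()
    seqids = {}        # insertion-ordered set of seqids seen in column 1
    orphan_cds = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break                     # the rest is sequence data
        if line.startswith("#") or not line.strip():
            continue                  # comments, directives, blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue                  # malformed; not checked here
        attrs = dict(a.split("=", 1) for a in cols[8].split(";") if "=" in a)
        if "ID" in attrs:
            ids.add(attrs["ID"])
        seqids.setdefault(cols[0], None)
        if cols[2] == "CDS" and "Parent" not in attrs:
            orphan_cds.append((lineno, attrs.get("ID", "?")))
    missing = [s for s in seqids if s not in ids]
    return missing, orphan_cds


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        missing, orphans = preflight(fh.readlines())
    for seqid in missing:
        print(f"no feature has ID={seqid} (used in column 1)")
    for lineno, fid in orphans:
        print(f"line {lineno}: CDS {fid} has no Parent")
```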

Scott


On Wed, May 26, 2010 at 10:09 PM, simon rayner
<[hidden email]> wrote:

> [...]

a question of loading data into chado

zhuang chao
Hi all,

When I was loading a GFF3 file into a Chado database using
gmod_bulk_load_gff3.pl, I got errors like this:

 ==================================================
 MSG: no cvterm for protein
 ===================================================

I don't know why this happens or how to handle it.  Could you help me?
I am looking forward to your reply.  Thank you very much!


The GFF file is attached (compressed with tar).  Here is a log of how I
loaded the data:

 =======================================================================


root@debian:/home/zc/Downloads# perl /usr/local/bin/gmod_bulk_load_gff3.pl \
    --gfffile sequences.gp.gff --organism 'abcd1' --dbname four_viruses \
    --dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432 --noexon

Preparing data for inserting into the four_viruses database
(This may take a while ...)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no cvterm for protein
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK:
Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4050
STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:755
-----------------------------------------------------------
Issuing rollback() due to DESTROY without explicit disconnect() of
DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
at /usr/share/perl5/Error.pm line 184.
===================================================================================

sequences.gp.gff.tar.gz (6M)

a question of loading data into chado (2)

zhuang chao
In reply to this post by Scott Cain
Hi all,

When I was loading a GFF3 file into a Chado database using
gmod_bulk_load_gff3.pl, I got errors like this:

 ==================================================
 MSG: no cvterm for protein
 ===================================================

I don't know why this happens or how to handle it.  Could you help me?
I am looking forward to your reply.  Thank you very much!

Here is a log of how I loaded the data:

 =======================================================================


root@debian:/home/zc/Downloads# perl /usr/local/bin/gmod_bulk_load_gff3.pl \
    --gfffile sequences.gp.gff --organism 'abcd1' --dbname four_viruses \
    --dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432 --noexon

Preparing data for inserting into the four_viruses database
(This may take a while ...)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no cvterm for protein
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK:
Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4050
STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:755
-----------------------------------------------------------
Issuing rollback() due to DESTROY without explicit disconnect() of
DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
at /usr/share/perl5/Error.pm line 184.
===================================================================================

Re: a question of loading data into chado (2)

simon rayner
Zhuang,

It could be that there is an extra or missing <TAB> character in the line for that entry in the GFF3 file.  Can you post that line from the file?
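One way to test this suspicion mechanically, sketched in Python with a hypothetical helper `bad_column_lines`: report every feature line (before any `##FASTA` section) that does not split into exactly nine tab-separated columns. It only reports; it changes nothing.

```python
import sys


def bad_column_lines(lines):
    """Return (line number, column count) for feature lines without 9 columns."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break                     # sequence data follows; stop checking
        if line.startswith("#") or not line.strip():
            continue                  # comments, directives, blank lines
        ncols = len(line.rstrip("\r\n").split("\t"))
        if ncols != 9:
            bad.append((lineno, ncols))
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for lineno, ncols in bad_column_lines(fh.readlines()):
            print(f"line {lineno}: {ncols} tab-separated columns, expected 9")
```

A trailing tab shows up as ten columns and a tab replaced by spaces as eight, so the count hints at which kind of mistake it is.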

simon

On Fri, May 28, 2010 at 3:40 AM, zhuang chao <[hidden email]> wrote:
> [...]

Re: a question of loading data into chado (2). .

Becksfort, Jared
In reply to this post by zhuang chao
Zhuang,

That error appears because the cvterm table does not have a row whose name is "protein".  You probably have one or more lines in your GFF file whose third column reads "protein".  I am not an expert on the Sequence Ontology, but I think you might be looking for "polypeptide", which should be in your cvterm table if you have loaded the Sequence Ontology.
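If "protein" appears only as the feature type in column 3, the substitution can be scripted. A minimal Python sketch, assuming that is the case; `rename_type` is a hypothetical helper, not part of GMOD. It changes column 3 only, so a Name or Note that merely contains the word "protein" is left alone, which a blanket search-and-replace would not guarantee.

```python
import sys


def rename_type(lines, old="protein", new="polypeptide"):
    """Change column 3 from `old` to `new`; everything else is copied as-is."""
    out = []
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True           # never touch sequence data
        cols = line.rstrip("\r\n").split("\t")
        if (not in_fasta and not line.startswith("#")
                and len(cols) == 9 and cols[2] == old):
            cols[2] = new
            line = "\t".join(cols) + "\n"
        out.append(line)
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python rename_type.py sequences.gp.gff > fixed.gp.gff
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(rename_type(fh.readlines()))
```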

Jared

-----Original Message-----
From: zhuang chao [mailto:[hidden email]]
Sent: Friday, May 28, 2010 2:40 AM
To: Scott Cain
Cc: zeroliu; gmod-schema; simon rayner
Subject: [Gmod-schema] a question of loading data into chado (2). .

[...]

Re: a question of loading data into chado (2)

Dave Clements, GMOD Help Desk
In reply to this post by simon rayner
Hi Zhuang,

Protein is not a first-class term in the Sequence Ontology.  It is a synonym of "polypeptide".  See http://www.sequenceontology.org/miso/current_release/term/SO:0000104.  Does the load work if you change all your "protein" features to "polypeptide"?  If it still doesn't work, then the Sequence Ontology may not be loaded in your database.
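To tell these two cases apart, you can query the database directly. A sketch under stated assumptions: the standard Chado `cv`, `cvterm` and `cvtermsynonym` tables, the Sequence Ontology loaded into the cv named `sequence`, and any Python DB-API connection (for PostgreSQL, e.g. psycopg2, whose parameter placeholder is `%s`). `so_term_status` is a hypothetical helper.

```python
def so_term_status(conn, term, placeholder="%s"):
    """Return "term", "synonym" or "missing" for `term` in the loaded SO.

    Assumptions: standard Chado tables (cv, cvterm, cvtermsynonym) and the
    Sequence Ontology loaded into the cv named 'sequence'. `placeholder`
    is the DB-API parameter marker: "%s" for psycopg2, "?" for sqlite3.
    """
    p = placeholder
    cur = conn.cursor()
    # Is it the name of an SO term?
    cur.execute(
        "SELECT 1 FROM cvterm JOIN cv ON cv.cv_id = cvterm.cv_id "
        f"WHERE cv.name = 'sequence' AND cvterm.name = {p}",
        (term,),
    )
    if cur.fetchone():
        return "term"
    # Otherwise, is it recorded only as a synonym of some SO term?
    cur.execute(
        "SELECT 1 FROM cvtermsynonym s "
        "JOIN cvterm t ON t.cvterm_id = s.cvterm_id "
        "JOIN cv ON cv.cv_id = t.cv_id "
        f"WHERE cv.name = 'sequence' AND s.synonym = {p}",
        (term,),
    )
    return "synonym" if cur.fetchone() else "missing"
```

For example, with psycopg2 and the database from this thread: `so_term_status(psycopg2.connect(dbname="four_viruses", user="zc", host="localhost", port=5432), "polypeptide")`. If that returns "missing", the ontology itself is probably not loaded.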

Dave C.

On Fri, May 28, 2010 at 1:09 AM, simon rayner <simon.rayner.cn@gmail.com> wrote:
Zhuang,

it could be there is an extra or missing <TAB> character in that line for that entry in the GFF3 file. Can you post the dump for the entry

simon


On Fri, May 28, 2010 at 3:40 AM, zhuang chao <[hidden email]> wrote:
Hi all,

When I was loading the GFF3 file into the Chado database using
gmod_bulk_load_gff3.pl, I got errors like this:

 ==================================================
 MSG: no cvterm for protein
 ===================================================

I don't know why this happens or how to handle it. Could you help me? I am looking
forward to your reply. Thank you very much!

Here is a history of how I loaded the data:

 =======================================================================


root@debian:/home/zc/Downloads#
perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
sequences.gp.gff  --organism 'abcd1' --dbname four_viruses  --dbuser zc
--dbpass 123456 --dbhost localhost --dbport 5432    --noexon

Preparing data for inserting into the four_viruses database
(This may take a while ...)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no cvterm for protein
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK:
Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4050
STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:755
-----------------------------------------------------------
Issuing rollback() due to DESTROY without explicit disconnect() of
DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
at /usr/share/perl5/Error.pm line 184.
===================================================================================






--
Simon Rayner

State Key Laboratory of Virology
Wuhan Institute of Virology
Chinese Academy of Sciences
Wuhan, Hubei 430071
P.R.China

+86 (27) 87199895 (office)
+86 15972923715 (cell)






--
===> PLEASE KEEP RESPONSES ON THE LIST <===
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/Calendar
http://gmod.org/wiki/Help_Desk_Feedback



Re: a question of loading data into chado (no attachment). .

zhuang chao
In reply to this post by Becksfort, Jared
Hi all,

When I was loading the GFF3 file into the Chado database using
gmod_bulk_load_gff3.pl, I got errors like this:

 ==================================================
 MSG: no cvterm for protein
 ===================================================

I replaced 'protein' with 'polypeptide' in the GFF file, and the Sequence
Ontology was loaded in the Chado database. But I got errors like this:

   ===============================
   MSG: no cvterm for polypeptide
   ================================

Could you help me? Thank you very much!

Here is a history of how I loaded the data:
 
 =======================================================================

 root@debian:/home/zc/Downloads#
perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
sequences.gp.gff  --organism 'abcd1' --dbname four_viruses  --dbuser zc
--dbpass 123456 --dbhost localhost --dbport 5432    --recreate_cache
--noexon
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the four_viruses database
(This may take a while ...)

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no cvterm for polypeptide
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK:
Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4050
STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:755
-----------------------------------------------------------
Issuing rollback() due to DESTROY without explicit disconnect() of
DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
at /usr/share/perl5/Error.pm line 184.

========================================================================


On Fri, 2010-05-28 at 08:55 -0500, Becksfort, Jared wrote:

> Zhuang,
>
> That error appears because the cvterm table does not have a row whose name corresponds to "protein".  You probably have one or more rows in your gff file whose third column reads "protein".  I am not an expert on Sequence Ontology, but I think you might be looking for "polypeptide" which should be in your cvterm table if you have loaded the Sequence Ontology.
>
> Jared
>
> -----Original Message-----
> From: zhuang chao [mailto:[hidden email]]
> Sent: Friday, May 28, 2010 2:40 AM
> To: Scott Cain
> Cc: zeroliu; gmod-schema; simon rayner
> Subject: [Gmod-schema] a question of loading data into chado (2). .
>
> hi , all :
>
>  When I was loading the gff3 file into chado database using
> gmod_bulk_load_gff3.pl, I got  the errors like this:
>
>  ==================================================
>  MSG: no cvterm for protein
>  ===================================================
>
>  I don't know why and how to handle. Could you help me? I am looking
>
>  forward to your reply.  Thank you very much !
>
>  Here  is   a  history  of how I  loaded  previous  data .
>
>  =======================================================================
>
>
> root@debian:/home/zc/Downloads#
> perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
> sequences.gp.gff  --organism 'abcd1' --dbname four_viruses  --dbuser zc
> --dbpass 123456 --dbhost localhost --dbport 5432    --noexon
>
> Preparing data for inserting into the four_viruses database
> (This may take a while ...)
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for protein
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK:
> Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4050
> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:755
> -----------------------------------------------------------
> Issuing rollback() due to DESTROY without explicit disconnect() of
> DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
> at /usr/share/perl5/Error.pm line 184.
> ===================================================================================




Re: a question of loading data into chado (an attachment). .

Scott Cain
In reply to this post by Becksfort, Jared
Hi Zhuang,

You don't have "polypeptide", you have "polypeptide " (notice the
trailing space).  I found this by looking through your gff file for
"\tpolypeptide\t" and didn't find anything.

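The same kind of search can be automated. A minimal sketch, assuming the GFF3 file is read as a list of lines: it flags any column value with leading or trailing whitespace, which would catch a type written as `polypeptide ` (with a trailing space). The function name is illustrative only.

```python
def whitespace_problems(gff_lines):
    """List (line_number, column_number, value) for values with stray spaces."""
    problems = []
    for lineno, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # sequence section; not tab-delimited
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\r\n").split("\t")
        for col_no, value in enumerate(columns, start=1):
            if value != value.strip():
                problems.append((lineno, col_no, value))
    return problems


if __name__ == "__main__":
    sample = ["seq1\tGenBank\tpolypeptide \t1\t300\t.\t+\t.\tID=P.p01\n"]
    for lineno, col_no, value in whitespace_problems(sample):
        print(f"line {lineno}, column {col_no}: {value!r}")
```

Printing the value with `!r` makes the invisible space show up inside the quotes.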
Scott


On Mon, May 31, 2010 at 6:14 AM, zhuang chao <[hidden email]> wrote:

> hi , all :
>
>   When I was loading the gff3 file into chado database using
>
> gmod_bulk_load_gff3.pl, I got  the errors like this:
>
>  ==================================================
>  MSG: no cvterm for protein
>  ===================================================
>
>   I  replaced  'protein' with 'polypeptide' in the gff
>
> file .  And the sequence ontology  was loaded in chado
>
> database.   But  I  got  the errors like this:
>
>   ===============================
>   MSG: no cvterm for polypeptide
>   ================================
>
>   Could you help me?  Thank you very much !
>
>
> The   gff file   is  in  the  attachment  .  It  was  compressed  by
>
> tar . Here  is   a  history  of how I  loaded  previous  data .
>
>  =======================================================================
>
>  root@debian:/home/zc/Downloads#
> perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
> sequences.gp.gff  --organism 'abcd1' --dbname four_viruses  --dbuser zc
> --dbpass 123456 --dbhost localhost --dbport 5432    --recreate_cache
> --noexon
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...Done.
> Preparing data for inserting into the four_viruses database
> (This may take a while ...)
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no cvterm for polypeptide
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK:
> Bio::GMOD::DB::Adapter::get_type /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:4050
> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:755
> -----------------------------------------------------------
> Issuing rollback() due to DESTROY without explicit disconnect() of
> DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
> at /usr/share/perl5/Error.pm line 184.
>
> ========================================================================



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


a question of loading the gff3 file into chado database (2_1)

zhuang chao
In reply to this post by Scott Cain
Hi all,

When I was loading the GFF3 file into the Chado database using
gmod_bulk_load_gff3.pl, I got this error message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Object Bio::Annotation::SimpleValue=HASH(0x9f7b420) was not valid
with key type. If you were adding new keys in, perhaps you want to make
use of the archetype method to allow registration to a more basic type
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
STACK:
Bio::Annotation::Collection::add_Annotation /usr/share/perl5/Bio/Annotation/Collection.pm:361
STACK:
Bio::SeqFeature::Annotated::add_Annotation /usr/share/perl5/Bio/SeqFeature/Annotated.pm:609
STACK:
Bio::FeatureIO::gff::_handle_non_reserved_tag /usr/share/perl5/Bio/FeatureIO/gff.pm:797
STACK:
Bio::FeatureIO::gff::_handle_feature /usr/share/perl5/Bio/FeatureIO/gff.pm:752
STACK:
Bio::FeatureIO::gff::next_feature /usr/share/perl5/Bio/FeatureIO/gff.pm:172
STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:694
-----------------------------------------------------------


I don't know why this happens or how to handle it. Could you help me?


The command line was:
=======================================================================
perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
sequences.gbwithparts.gff.1    --organism 'abcd5' --dbname four_viruses
--dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432
--recreate_cache --noexon
=======================================================================

I am looking forward to your reply. Thank you very much!





a question of loading the gff3 data file into chado (3)

zhuang chao
In reply to this post by Scott Cain
Hi all,

When I was loading the GFF3 file into the Chado database using
gmod_bulk_load_gff3.pl, I got this error message:
===========================================================

Skipping organism table since the load file is empty...

DBD::Pg::db commit failed: ERROR:  insert or update on table
"feature_relationship" violates foreign key constraint
"feature_relationship_subject_id_fkey"

DETAIL:  Key (subject_id)=(263947) is not present in table "feature".
at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 2651, <$fh>
line 3.

commit failed: ERROR:  insert or update on table "feature_relationship"
violates foreign key constraint "feature_relationship_subject_id_fkey"

DETAIL:  Key (subject_id)=(263947) is not present in table "feature".
at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 2651, <$fh>
line 3.

========================================================================


I don't know why this happens or how to handle it. Could you help me?


The command line was:
=======================================================================
perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
sequences.gbwithparts.gff.1    --organism 'abcd3' --dbname four_viruses
--dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432
--recreate_cache   --noexon
=======================================================================


I am looking forward to your reply. Thank you very much!




Re: a question of loading the gff3 data into chado

Scott Cain
In reply to this post by Scott Cain
Hello Zhuang,

Zhenyang Guo just wrote to the schema mailing list about exactly the same
problem over the last few days.  You have the same IDs (P, P.t01, and
P.p01) repeated over and over again in the same GFF3 file, which
violates the GFF3 specification.  The IDs must be unique within a GFF3
file.  Additionally, I don't think you want to load the data like this
anyway, even if you do make the IDs unique, since it appears to me
that these are separate strains of a virus where you are looking at
the same gene.  In that case, they should really be represented as
separate organisms.  Chado doesn't represent the concept of strain,
and I don't know anything about the scientific naming of viruses, but
each organism needs a unique genus and species combination.  I know
that for people working with bacteria, it is common to append the
strain information to the species name.
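A minimal sketch for finding repeated `ID` values, assuming simple `key=value;key=value` attributes without URL-escaped separators. Note that the GFF3 specification does let a single discontinuous feature (such as a CDS split across several lines) repeat its ID, so treat the output as a list of lines to inspect rather than a list of errors. The function name is hypothetical.

```python
from collections import defaultdict


def duplicate_ids(gff_lines):
    """Map each ID that occurs on more than one feature line to its line numbers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # sequence section; no attributes here
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) != 9:
            continue  # leave malformed lines to a column-count check
        for attribute in columns[8].split(";"):
            key, sep, value = attribute.partition("=")
            if sep and key.strip() == "ID":
                seen[value].append(lineno)
    return {fid: lines for fid, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    sample = [
        "GU207834\tGenBank\tgene\t1\t900\t.\t+\t.\tID=P;Name=P\n",
        "GU207834\tGenBank\tmRNA\t1\t900\t.\t+\t.\tID=P.t01;Parent=P\n",
        "GU207835\tGenBank\tgene\t1\t900\t.\t+\t.\tID=P;Name=P\n",
    ]
    for fid, lines in duplicate_ids(sample).items():
        print(f"ID={fid} appears on lines {lines}")
```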

Scott


On Fri, Jun 4, 2010 at 5:40 AM, zhuang chao <[hidden email]> wrote:

> hi all:
>
> When I was loading the gff3 file into chado database using
>
> gmod_bulk_load_gff3.pl, I got  the error message like this:
> ===========================================================
>
> Skipping organism table since the load file is empty...
>
> DBD::Pg::db commit failed: ERROR:  insert or update on table
> "feature_relationship" violates foreign key constraint
> "feature_relationship_subject_id_fkey"
>
> DETAIL:  Key (subject_id)=(263947) is not present in table "feature".
> at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 2651, <$fh>
> line 3.
>
> commit failed: ERROR:  insert or update on table "feature_relationship"
> violates foreign key constraint "feature_relationship_subject_id_fkey"
>
> DETAIL:  Key (subject_id)=(263947) is not present in table "feature".
> at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 2651, <$fh>
> line 3.
>
> ========================================================================
>
>
> I don't know why and how to handle. Could you help me?
>
>
> the command line was :
> =======================================================================
> perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
> sequences.gbwithparts.gff.1    --organism 'abcd3' --dbname four_viruses
> --dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432
> --recreate_cache   --noexon
> =======================================================================
>
>
> The   genbank and  gff3  file   is  in  the  attachment  .  It  was
> compressed  by  tar .
>
> I am looking forward to your reply.  Thank you very much !
>
>





a question of loading the gff3 data file into chado database

zhuang chao
Hi all,

When I was loading the GFF3 file into the Chado database using
gmod_bulk_load_gff3.pl, I got this error message:
===========================================================

(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the four_viruses database
(This may take a while ...)

no parent P;
you probably need to rerun the loader with the --recreate_cache option

Issuing rollback() due to DESTROY without explicit disconnect() of
DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 3882.

========================================================================

I don't know why this happens or how to handle it. Could you help me?

The command line was:
=======================================================================
perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
GU207834.gbk.gff    --organism 'Rabies virus' --dbname four_viruses
--dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432
--recreate_cache --noexon
=======================================================================

The GenBank and GFF3 files are attached.

I am looking forward to your reply. Thank you very much!




Attachments: GU207834.gbk (3K), GU207834.gbk.gff (1K)

Re: a question of loading the gff3 data file into chado database

Scott Cain
Hi Zhuang,

Thanks for sending a small testing data set :-)

I was able to load this GFF file just fine using this command:

gmod_bulk_load_gff3.pl --org yeast --noexon -g GU207834.gbk.gff

(I used yeast as the organism so I wouldn't need to create a new entry
in my organism table, but that shouldn't matter).  The only warning
message I got was this:

No feature found for P.p01, org_id:10 when trying to add sequence at
/Users/cain/cvs_stuff/schema/trunk/chado/lib/Bio/GMOD/DB/Adapter.pm
line 2527, <GEN0> line 14.

This happens because the CDS feature doesn't get loaded into the
database; instead, an implied polypeptide feature is created in its
place, and the loader doesn't know how to associate the FASTA sequence
for the CDS with the polypeptide feature.  This might be fixed by
processing the GenBank file with the genbank2gff3.pl script with the
--noCDS flag, which creates genes that are
gene->mRNA->exon,polypeptide rather than gene->mRNA->exon,CDS.  I
don't know for sure if that will do it, though, because people
generally don't save polypeptide sequences since they can be inferred
from other information in the database.

To further test your situation, I wondered if the problem was loading
subsequent data sets with exactly the same IDs used again (P, P.t01,
and P.p01), so I copied the file you sent me, but changed the accession
from GU207834 to GU207835 and reloaded the data with exactly the same
command line as above, and it loaded fine again, with the same warning
message about the CDS sequence.

So, I'm left wondering what is different about your situation compared
to mine.  Perhaps a previous load attempt has left the database in a
horribly corrupted state, such that the loader can't recover from it
and the load fails instead.  Could you try with a fresh database?  My
suggestion in this regard is to create a new database with whatever
ontologies you plan on using already loaded, and make a dump of that
with pg_dump and save it, so you can easily go back to that initial
state while you are testing.
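The snapshot-and-restore workflow above can be scripted. This sketch only builds the PostgreSQL client commands (`pg_dump` in custom format to save, then `dropdb`, `createdb` and `pg_restore` to go back). The database name `chado_test` and the dump file name are placeholders, and `createdb` options such as encoding, owner or template are left out on the assumption that the defaults suit your Chado setup. Nothing is executed unless `dry_run=False`.

```python
import subprocess


def snapshot_commands(dbname, dump_file, host="localhost", port=5432, user=None):
    """Return (save_cmd, restore_cmds) for a pristine-database snapshot."""
    conn = ["-h", host, "-p", str(port)]
    if user:
        conn += ["-U", user]
    # Custom-format dump (-Fc), which pg_restore can read back.
    save = ["pg_dump", "-Fc", *conn, "-f", dump_file, dbname]
    restore = [
        ["dropdb", *conn, dbname],    # throw away the test database
        ["createdb", *conn, dbname],  # recreate it empty
        ["pg_restore", *conn, "-d", dbname, dump_file],  # reload the snapshot
    ]
    return save, restore


def run_commands(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    save, restore = snapshot_commands("chado_test", "chado_with_ontologies.dump", user="zc")
    run_commands([save])   # once, right after loading the ontologies
    run_commands(restore)  # whenever you want to start over
```

`dropdb` will refuse to drop a database that still has open connections, so close any psql sessions before restoring.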

If that has the same result, then we have to look at more fundamental
issues, like perl libraries (notably BioPerl) and how the PostgreSQL
database is set up, and particularly how its encoding is configured,
but hopefully we won't have to go there.

Scott


On Mon, Jun 7, 2010 at 5:51 AM, zhuang chao <[hidden email]> wrote:

> hi all:
>
> When I was loading the gff3 file into chado database using
>
> gmod_bulk_load_gff3.pl, I got  the error message like this:
> ===========================================================
>
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...Done.
> Preparing data for inserting into the four_viruses database
> (This may take a while ...)
>
> no parent P;
> you probably need to rerun the loader with the --recreate_cache option
>
> Issuing rollback() due to DESTROY without explicit disconnect() of
> DBD::Pg::db handle dbname=four_viruses;port=5432;host=localhost
> at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 3882.
>
> ========================================================================
>
> I don't know why and how to handle. Could you help me?
>
> the command line was :
> =======================================================================
> perl /usr/local/bin/gmod_bulk_load_gff3.pl  --gfffile
> GU207834.gbk.gff    --organism 'Rabies virus' --dbname four_viruses
> --dbuser zc --dbpass 123456 --dbhost localhost --dbport 5432
> --recreate_cache --noexon
> =======================================================================
>
> The   genbank and  gff3  file   is  in  the  attachment  .
>
> I am looking forward to your reply.  Thank you very much !
>
>
>





a question of loading the gff3 data into chado database

zhuang chao
Hi all,


I have a question. Can I simultaneously run several
gmod_bulk_load_gff3.pl scripts to load different
GFF3 data files into the Chado database? If I do, will
the loads leave the database in a horribly corrupted
state?


I hope to hear from you. Thank you very much!




Re: a question of loading the gff3 data into chado database

Scott Cain
Hi Zhuang,

No you can't, and if you try, the loader should fail to run (unless
you supply the --remove_lock flag when running, but you wouldn't want
to do that if there is still another loader running).  From `perldoc
gmod_bulk_load_gff3.pl`:

The run lock
           The bulk loader is not a multiuser application.  If two separate
           bulk load processes try to load data into the database at the same
           time, at least one and possibly all loads will fail.  To keep this
           from happening, the bulk loader places a lock in the database to
           prevent other gmod_bulk_load_gff3.pl processes from running at the
           same time.  When the application exits normally, this lock will be
           removed, but if it crashes for some reason, the lock will not be
           removed.  To remove the lock from the command line, provide the
           flag --remove_lock.  Note that if the loader crashed necessitating
           the removal of the lock, you also may need to rebuild the
           uniquename cache (see the next section).

Scott


On Fri, Jun 11, 2010 at 3:48 AM, zhuang chao <[hidden email]> wrote:

> hi, all ,
>
>
> I have a question . Can  I  Simultaneously  run  several
>
> gmod_bulk_load_gff3.pl  scripts  to load  the  different
>
> gff3  data  files  into  chado  database ?  If I do , Does
>
> the  load  action  make  the  database  in a horribly corrupted
>
> state ?
>
>
> I hope your reply . Thank you very much !
>
>
>





Re: a question of loading the gff3 data into chado database

zhuang chao
Hi Scott,

Can I simultaneously run different gmod_bulk_load_gff3 processes for
different databases?


I hope to hear from you. Thank you very much!




On Fri, 2010-06-11 at 10:11 -0400, Scott Cain wrote:

> HI Zhuang,
>
> No you can't, and if you try, the loader should fail to run (unless
> you supply the --remove_lock flag when running, but you wouldn't want
> to do that if there is still another loader running).  From `perldoc
> gmod_bulk_load_gff3.pl`:
>
> The run lock
>            The bulk loader is not a multiuser application.  If two separate
>            bulk load processes try to load data into the database at the same
>            time, at least one and possibly all loads will fail.  To keep this
>            from happening, the bulk loader places a lock in the database to
>            prevent other gmod_bulk_load_gff3.pl processes from running at the
>            same time.  When the application exits normally, this lock will be
>            removed, but if it crashes for some reason, the lock will not be
>            removed.  To remove the lock from the command line, provide the
>            flag --remove_lock.  Note that if the loader crashed necessitating
>            the removal of the lock, you also may need to rebuild the
>            uniquename cache (see the next section).
>
> Scott
>



------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
lucky parental unit.  See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: a question of loading the gff3 data into chado database

Scott Cain
Yes, that should be fine (though I've never tried it).  The part that
can be corrupted resides inside the individual database, so if you are
loading separate databases, it should work.

Scott
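Scott's reasoning is that the lock lives inside each individual database, so loads into different databases do not contend for it. The hypothetical Python sketch below illustrates that reasoning only; like Scott, it does not claim the real loader has been tested this way. Each temporary SQLite file plays the role of one Chado database, and `loader_lock`, `try_lock`, `unlock`, `load` and `demo` are invented names. Two simulated loads into different files run at the same time and both succeed, while a second load into a file whose lock is already held is refused.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in: each SQLite file plays the role of one Chado
# database, and the loader_lock table is an invented per-database lock.


def try_lock(path, owner):
    """Return an open connection holding the lock in `path`, or None if taken."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS loader_lock "
                 "(name TEXT PRIMARY KEY, owner TEXT)")
    try:
        conn.execute("INSERT INTO loader_lock VALUES ('bulk_load', ?)", (owner,))
        conn.commit()
        return conn
    except sqlite3.IntegrityError:
        conn.close()  # closing discards the failed, uncommitted transaction
        return None


def unlock(conn):
    """Remove the lock row and close the connection."""
    conn.execute("DELETE FROM loader_lock WHERE name = 'bulk_load'")
    conn.commit()
    conn.close()


def load(path, owner):
    """Simulated bulk load: take the database's lock, do the work, release it."""
    conn = try_lock(path, owner)
    name = os.path.basename(path)
    if conn is None:
        return f"{owner}: refused, {name} is locked"
    time.sleep(0.1)  # placeholder for the actual GFF3 load
    unlock(conn)
    return f"{owner}: loaded into {name}"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db_a = os.path.join(tmp, "db_a.sqlite")
        db_b = os.path.join(tmp, "db_b.sqlite")
        # Different databases: the two loads overlap in time and both succeed.
        with ThreadPoolExecutor(max_workers=2) as pool:
            results = list(pool.map(load, [db_a, db_b], ["loader-1", "loader-2"]))
        # Same database: a second loader is refused while the lock is held.
        held = try_lock(db_a, "loader-3")
        refused = load(db_a, "loader-4")
        unlock(held)
    return results, refused


if __name__ == "__main__":
    results, refused = demo()
    print(*results, refused, sep="\n")
```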


On Fri, Jun 11, 2010 at 10:49 PM, zhuang chao <[hidden email]> wrote:

> Hi Scott,
>
> Can I simultaneously run different gmod_bulk_load_gff3 processes for
> different databases?
>
> I look forward to your reply. Thank you very much!



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
