Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Jonathan Leto
Howdy,

I have been attempting to load the ITAG GFF3 [0] files, which contain
##sequence-region directives, but I run into errors like this:

$ ./gmod_bulk_load_gff3.pl --gfffile
~/git/ITAG1_release/ITAG1_gene_models_sample.gff3 --organism tomato
--noexon --recreate_cache --analysis --remove_lock --save_tmpfiles
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...
Adjusting the primary key sequences (if necessary)...Done.

--------------------- WARNING ---------------------
MSG: '##feature-ontology' directive handling not yet implemented
---------------------------------------------------
Preparing data for inserting into the cxgn database
(This may take a while ...)
Loading data into feature table ...
        COPY feature (feature_id,organism_id,name,uniquename,type_id,is_analysis,seqlen,dbxref_id)
FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
line 3210.
Loading data into featureloc table ...
        COPY featureloc
(featureloc_id,feature_id,srcfeature_id,fmin,fmax,strand,phase,rank,locgroup)
FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
line 3210.
DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
CONTEXT:  COPY featureloc, line 1, column strand: "" at
/home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3222, <$fh>
line 3.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: calling endcopy for featureloc failed:
STACK: Error::throw
STACK: Bio::Root::Root::throw
/home/leto/local-lib/lib/perl5/Bio/Root/Root.pm:368
STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
/home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3222
STACK: Bio::GMOD::DB::Adapter::load_data
/home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3144
STACK: ./gmod_bulk_load_gff3.pl:1060
-----------------------------------------------------------

The salient information is that an empty string ("") is somehow being
inserted as the strand, which fails. Note that I have also uncommented
a warning statement that shows the SQL query that is being executed.

I have traced this issue to the sequence-region directive. When I
remove the line, the file loads fine. As another test, I created a
file with nothing but a sequence-region directive, and the same error
occurs. I have attached that file and the temp data file that
gmod_bulk_load_gff3.pl creates as well. The 6th column of that file is
the strand, and it has a value of "\N", which is the text
representation of NULL.

It seems to me that something is stringifying the NULL into "" and
then attempting to insert the empty string into strand, which has a
type of smallint. This is what causes the failure.
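
One way to check the saved temp file for this condition is sketched below in Python. It is only an illustration, not part of gmod_bulk_load_gff3.pl: the column order is copied from the COPY featureloc statement in the log, the tab delimiter and the \N marker are PostgreSQL's defaults for COPY text format, and the function name and sample rows are made up. The thread only confirms that strand is a smallint; the sketch simply reports any empty field.

```python
# Column order taken from the COPY featureloc statement in the log above.
FEATURELOC_COLUMNS = [
    "featureloc_id", "feature_id", "srcfeature_id",
    "fmin", "fmax", "strand", "phase", "rank", "locgroup",
]

# PostgreSQL's default NULL marker in COPY text format (backslash, capital N).
NULL_MARKER = r"\N"


def find_empty_fields(line, columns=FEATURELOC_COLUMNS):
    """Return the names of columns whose field is the empty string.

    An empty field is not NULL: NULL is written as the NULL marker,
    while an empty field is read as the string "" and is rejected by
    integer columns.
    """
    fields = line.rstrip("\n").split("\t")
    return [name for name, value in zip(columns, fields) if value == ""]


if __name__ == "__main__":
    # Hypothetical rows: one with a proper NULL strand, one with an empty strand.
    good = "\t".join(["1", "10", "2", "0", "100", NULL_MARKER, NULL_MARKER, "0", "0"])
    bad = "\t".join(["1", "10", "2", "0", "100", "", NULL_MARKER, "0", "0"])
    print(find_empty_fields(good))
    print(find_empty_fields(bad))
```

Running each line of the file kept by --save_tmpfiles through find_empty_fields would show whether the strand column really holds the NULL marker or an empty string.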

I would greatly appreciate any thoughts or comments on how to make the
bulk loading script support the sequence-region directive.

Thanks

[0] ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG1_release/

--
Jonathan "Duke" Leto
[hidden email]
http://leto.net

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Attachments: ITAG1_gene_models_sample.gff3 (212 bytes), chado-featureloc-nNS0.dat (64 bytes)

Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Dave Clements, GMOD Help Desk
Hi Jonathan,

I've created a bug report on this:

  http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291

This is interesting because the code says:

  This script does not use sequence-region directives for anything.
  If it represents a feature that needs to be inserted into the database,
  it should be represented with a full GFF line.

Dave C.

On Fri, Jul 16, 2010 at 1:31 PM, Jonathan Leto <[hidden email]> wrote:

--
===> PLEASE KEEP RESPONSES ON THE LIST <===
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/Calendar
http://gmod.org/wiki/Help_Desk_Feedback



Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Scott Cain
This is in fact a current bug; the easiest workaround is to get rid
of sequence-region directives.  Actually fixing the bug is a little
trickier, since it stems from the fact that Chado and BioPerl have
different ideas of what should happen.  While I could (probably)
modify BioPerl to do the right thing (from my perspective), I am
reluctant to do that at the moment since that section of BioPerl is
slated to be refactored.
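
A minimal sketch of that workaround, in Python rather than Perl and not part of the GMOD tooling: it copies a GFF3 file while dropping every line that starts with ##sequence-region. The function names and the sample lines are hypothetical.

```python
import io

DIRECTIVE = "##sequence-region"


def strip_sequence_region(lines):
    """Yield every line except ##sequence-region directives."""
    for line in lines:
        if not line.startswith(DIRECTIVE):
            yield line


def strip_file(src_path, dst_path):
    """Write a copy of src_path to dst_path without the directives."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_sequence_region(src))


if __name__ == "__main__":
    # Made-up three-line GFF3 sample; only the middle line is dropped.
    sample = io.StringIO(
        "##gff-version 3\n"
        "##sequence-region ctg123 1 1497228\n"
        "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001\n"
    )
    print("".join(strip_sequence_region(sample)), end="")
```

The cleaned copy, not the original file, would then be passed to the loader's --gfffile option.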

Scott


On Tue, Jul 20, 2010 at 6:55 PM, Dave Clements, GMOD Help Desk
<[hidden email]> wrote:

> Hi Jonathan,
> I've created a bug report on this:
>   http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291
> This is interesting because the code says:
>   This script does not use sequence-region directives for anything.
>   If it represents a feature that needs to be inserted into the database,
>   it should be represented with a full GFF line.
> Dave C.



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Jonathan Leto
Howdy,

Could you explain what exactly Chado and BioPerl are disagreeing on?
If modifying BioPerl does not make any BioPerl tests fail and allows the loading
of sequence-region directives, I think it should be done.

If the part of BioPerl that needs to be modified has no or few tests, I can add
some and ask the BioPerl people what they think.

Duke


On Fri, Jul 23, 2010 at 10:52 AM, Scott Cain <[hidden email]> wrote:

> This is in fact a current bug; the easiest work around is to get rid
> of sequence-region directives.  Actually fixing the bug is a little
> trickier since it is due to the fact the Chado and BioPerl have
> different ideas of what should happen.  While I could (probably)
> modify BioPerl to do the right thing (from my perspective), I am
> reluctant to do that at the moment since that section of BioPerl is
> slated to be refactored.
>
> Scott



--
Jonathan "Duke" Leto
[hidden email]
http://leto.net


Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Fields, Christopher J
I think the part of BioPerl Scott is referring to for significant refactoring is Bio::FeatureIO.  Scott, is that correct?  

Having some tests would really help.  I can always sync them over to the Bio-FeatureIO repo, which is separate from core ATM.  I did uncover some pretty significant bugs during my first round of FeatureIO work which are now fixed (skipping features and/or sequences was one).  Now just waiting on tuits...

chris

On Jul 27, 2010, at 6:39 PM, Jonathan Leto wrote:

> Howdy,
>
> Could you explain what exactly Chado and BioPerl are disagreeing on?
> If modifying BioPerl does not make any BioPerl tests fail and allows the loading
> of sequence-region directives, I think it should be done.
>
> If the part of BioPerl that needs to be modified has no or few tests, I can add
> some and ask the BioPerl people what they think.
>
> Duke



Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Scott Cain
An additional (though probably fairly easy to fix) issue is
Bio::FeatureIO's insistence that ##sequence-region directives get
turned into features.  These bits of data are not sufficient to create
the full-fledged feature that Chado requires, which is why the loader
should ignore them.  It can't, though, because it defers to
Bio::FeatureIO for file parsing.  If the constructor had a flag to
ignore those directives, that would make life a little better.  Even
better would be if Bio::FeatureIO could return a message
stating that a ##sequence-region directive was found but was being
ignored, so that message could be relayed to the user.
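
To make the idea concrete, here is a rough Python sketch of a parser with such a flag. It is not Bio::FeatureIO's API and is not Perl; parse_gff3, ignore_sequence_region, and the record tuples are all hypothetical names chosen for illustration.

```python
def parse_gff3(lines, ignore_sequence_region=True):
    """Return (records, messages) for an iterable of GFF3 lines.

    With ignore_sequence_region=True, ##sequence-region directives are
    skipped and a message is recorded for each one. With the flag off,
    they are returned as "sequence-region" records for the caller to handle.
    """
    records = []
    messages = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##sequence-region"):
            if ignore_sequence_region:
                messages.append(
                    f"line {lineno}: ##sequence-region directive found but ignored"
                )
            else:
                records.append(("sequence-region", line.split()[1:]))
            continue
        if line.startswith("#"):
            # Other directives and comments are outside the scope of this sketch.
            continue
        records.append(("feature", line.split("\t")))
    return records, messages


if __name__ == "__main__":
    sample = [
        "##gff-version 3\n",
        "##sequence-region ctg123 1 1497228\n",
        "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001\n",
    ]
    records, messages = parse_gff3(sample)
    for message in messages:
        print(message)
```

A loader built on something like this could print the returned messages to the user and load only the "feature" records.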

On the other hand, I was unaware of Bio::FeatureIO dropping features;
that's somewhat unpleasant.  I recall an issue with skipping
sequences, but I thought that was fixed already.

Scott


On Wed, Jul 28, 2010 at 12:53 AM, Chris Fields <[hidden email]> wrote:

> I think the part of BioPerl Scott is referring to for significant refactoring is Bio::FeatureIO.  Scott, is that correct?
>
> Having some tests would really help.  I can always sync them over to the Bio-FeatureIO repo, which is separate from core ATM.  I did uncover some pretty significant bugs during my first round of FeatureIO work which are now fixed (skipping features and/or sequences was one).  Now just waiting on tuits...
>
> chris
>
> On Jul 27, 2010, at 6:39 PM, Jonathan Leto wrote:
>
>> Howdy,
>>
>> Could you explain what exactly Chado and BioPerl are disagreeing on?
>> If modifying BioPerl does not make any BioPerl tests fail and allows the loading
>> of sequence-region directives, I think it should be done.
>>
>> If the part of BioPerl that needs to be modified has no or few tests, I can add
>> some and ask the BioPerl people what they think.
>>
>> Duke
>>
>>
>> On Fri, Jul 23, 2010 at 10:52 AM, Scott Cain <[hidden email]> wrote:
>>> This is in fact a current bug; the easiest work around is to get rid
>>> of sequence-region directives.  Actually fixing the bug is a little
>>> trickier since it is due to the fact the Chado and BioPerl have
>>> different ideas of what should happen.  While I could (probably)
>>> modify BioPerl to do the right thing (from my perspective), I am
>>> reluctant to do that at the moment since that section of BioPerl is
>>> slated to be refactored.
>>>
>>> Scott
>>>
>>>
>>> On Tue, Jul 20, 2010 at 6:55 PM, Dave Clements, GMOD Help Desk
>>> <[hidden email]> wrote:
>>>> Hi Jonathan,
>>>> I've created a bug report on this:
>>>>   http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291
>>>> This is interesting because the code says:
>>>>   This script does not use sequence-region directives for anything.
>>>>   If it represents a feature that needs to be inserted into the database,
>>>>   it should be represented with a full GFF line.
>>>> Dave C.



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Jonathan Leto
Howdy,

There is actually a flag called -ignore_seqregion in
Bio::DB::SeqFeature::Store::GFF3Loader.

It would be nice if gmod_bulk_load_gff3.pl could take that as a
command-line argument and do the right thing with it.
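
Until the script has such an option, a rough stand-in is to filter the
directives out of the file before loading it. This is only a sketch, not
something the loader provides: strip_seqregion and the sample file are
made up for illustration.

```shell
#!/bin/sh
# Sketch only: strip_seqregion is a hypothetical helper, not a loader option.
strip_seqregion() {
    # Print every line of the file except ##sequence-region directives.
    grep -v '^##sequence-region' "$1"
}

# Made-up three-line GFF3 file, written to a scratch directory.
workdir=$(mktemp -d)
printf '##gff-version 3\n##sequence-region ctg1 1 1000\nctg1\tITAG\tgene\t100\t500\t.\t+\t.\tID=gene1\n' \
    > "$workdir/sample.gff3"

strip_seqregion "$workdir/sample.gff3" > "$workdir/sample.noseqregion.gff3"

# The filtered file is what would then be handed to the loader, e.g.:
#   ./gmod_bulk_load_gff3.pl --gfffile "$workdir/sample.noseqregion.gff3" --organism tomato
```

The other directives and all feature lines pass through untouched, so the
filtered file should load the same way as one that never had the directive.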

Duke



On Wed, Jul 28, 2010 at 12:20 PM, Scott Cain <[hidden email]> wrote:

> An additional (though probably somewhat easy to fix) issue is
> Bio::FeatureIO's insistence that ##sequence-region directives get
> turned into features.  These bits of data are not sufficient to create
> a full-fledged feature that Chado requires, which is why the loader
> (should) ignore them.  Only it can't, because it defers to
> Bio::FeatureIO for file parsing.  If the constructor had a flag to
> ignore those directives, that would make life a little better.  Even
> better than that would be if Bio::FeatureIO could return a message
> stating that a ##sequence-region directive was found but was being
> ignored, so that message could be relayed to the user.
>
> On the other hand, I was unaware of Bio::FeatureIO dropping features;
> that's somewhat unpleasant.  I recall an issue with skipping
> sequences, but I thought that was fixed already.
>
> Scott



--
Jonathan "Duke" Leto
[hidden email]
http://leto.net


Re: Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Scott Cain
Hi Duke,

I made the changes in bioperl-live and the schema repositories so that
##sequence-region directives are always ignored by the GFF3 bulk
loader.
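
If you want to check which lines in a file that applies to, something
along these lines should do it. It is a rough sketch run outside the
loader; list_seqregion and the sample file are made up for illustration.

```shell
#!/bin/sh
# Sketch only: list_seqregion is a hypothetical helper, not part of the loader.
list_seqregion() {
    # Print each ##sequence-region directive with its line number;
    # "|| true" keeps the exit status zero when a file has none.
    grep -n '^##sequence-region' "$1" || true
}

# Made-up GFF3 file with two directives and one feature line.
workdir=$(mktemp -d)
printf '##gff-version 3\n##sequence-region ctg1 1 1000\n##sequence-region ctg2 1 2000\nctg1\tITAG\tgene\t100\t500\t.\t+\t.\tID=gene1\n' \
    > "$workdir/sample.gff3"

skipped=$(list_seqregion "$workdir/sample.gff3")
echo "$skipped"
```

Any region listed this way that also needs to exist as a feature in the
database still has to be written as a full GFF line.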

Scott


On Mon, Aug 9, 2010 at 2:06 PM, Jonathan Leto <[hidden email]> wrote:

> Howdy,
>
> There is actually a flag called -ignore_seqregion in
> Bio::DB::SeqFeature::Store::GFF3Loader.
>
> It would be nice if gmod_bulk_load_gff3.pl could take that as a
> command-line argument and do the right thing with it.
>
> Duke



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
