error loading sequence


error loading sequence

anja
CONTENTS DELETED
The author has deleted this message.

Re: error loading sequence

Scott Cain
Hi Anja,

Can you give us more information, like the rest of the error message
and a sample of the data that is causing the problem?  My initial
guess is there is a problem with the GFF.

Scott


On Fri, Sep 3, 2010 at 6:07 AM, Anja Friedrich
<[hidden email]> wrote:

> Hi all,
>
> I deleted my older post because I solved the problem, but now I have a new
> one with different data:
>
> MSG: calling endcopy for feature_relationship failed:
>
> How can I fix that?
>
> Regards,
> Anja
>
>
> ------------------------------------------------------------------------------
> This SF.net Dev2Dev email is sponsored by:
>
> Show off your parallel programming skills.
> Enter the Intel(R) Threading Challenge 2010.
> http://p.sf.net/sfu/intel-thread-sfd
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: error loading sequence

anja
CONTENTS DELETED
The author has deleted this message.

Re: error loading sequence

Scott Cain
Hi Anja,

There are multiple problems with this GFF that I spotted in just a few
seconds of looking.  The genbank2gff3.pl script is not foolproof
because there is considerable variability in the way that people
encode GenBank entries.  Here are a few of the problems:

1. Shared ID to indicate features with multiple locations, like in these lines:

NC_010109 GenBank CDS 103413 103438 . - . ID=rps12;Dbxref=GI:161784216,GeneID:5787510;Note=trans-splicing of 5'-rps12 (exon 1) and 3'-rps12 (exons 2 and 3) in IR;codon_start=1;gene=rps12;locus_tag=LemiCp046;product=ribosomal protein S12;protein_id=YP_001595487.1;trans_splicing=_no_value;transl_table=11;translation=length.123
NC_010109 GenBank CDS 103979 104210 . - . ID=rps12;Dbxref=GI:161784216,GeneID:5787510;Note=trans-splicing of 5'-rps12 (exon 1) and 3'-rps12 (exons 2 and 3) in IR;codon_start=1;gene=rps12;locus_tag=LemiCp046;product=ribosomal protein S12;protein_id=YP_001595487.1;trans_splicing=_no_value;transl_table=11;translation=length.123
NC_010109 GenBank CDS 74638 74751 . - . ID=rps12;Dbxref=GI:161784216,GeneID:5787510;Note=trans-splicing of 5'-rps12 (exon 1) and 3'-rps12 (exons 2 and 3) in IR;codon_start=1;gene=rps12;locus_tag=LemiCp046;product=ribosomal protein S12;protein_id=YP_001595487.1;trans_splicing=_no_value;transl_table=11;translation=length.123

While this is allowed by the GFF3 spec, it is not allowed by the Chado
GFF3 loader (see `perldoc gmod_bulk_load_gff3.pl` for more info).
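
To see how widespread this is in a file, a rough Python sketch along the following lines could list every ID that appears on more than one feature line. It assumes a tab-delimited GFF3 file with nine columns; the function names `parse_attributes` and `find_shared_ids` are made up for this illustration and are not part of any GMOD tool.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> raw value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            tag, value = field.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs


def find_shared_ids(lines):
    """Return {ID: [(type, start, end), ...]} for IDs used on more than one line."""
    seen = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. FASTA sequence)
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is not None:
            seen[feature_id].append((cols[2], int(cols[3]), int(cols[4])))
    return {fid: locs for fid, locs in seen.items() if len(locs) > 1}
```

Calling `find_shared_ids(open("my.gff3"))` (file name hypothetical) would return an entry for `rps12` with its three CDS locations if the lines above are in the file.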

2. Non-unique IDs, like these lines:

NC_010109 GenBank CDS 164556 164837 . + . ID=rpl23;Dbxref=GI:161784255,GeneID:5787585;codon_start=1;gene=rpl23;locus_tag=LemiCp089;product=ribosomal protein L23;protein_id=YP_001595571.1;transl_table=11;translation=length.93
NC_010109 GenBank gene 164556 164837 . + . ID=rpl23;Dbxref=GeneID:5787585;gene=rpl23;locus_tag=LemiCp089

Here it violates the GFF3 spec: these features share the same ID but
have different types, so the shared ID cannot be there to indicate one
feature with multiple locations.  In addition, I assume that the CDS is
part of the gene, and the Chado loader requires features to be sorted so
that the parent comes first.
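
Both conditions can be checked mechanically. The sketch below is only an illustration under the assumption of tab-delimited, nine-column GFF3 input; `check_ids_and_order` is a hypothetical helper name. It reports IDs that are used by features of more than one type, and children whose `Parent` has not yet appeared earlier in the file.

```python
def check_ids_and_order(lines):
    """Return (mixed_type_ids, late_parents) for an iterable of GFF3 lines.

    mixed_type_ids: sorted IDs used by features of more than one type.
    late_parents:   (child ID, parent ID) pairs where the parent had not
                    been seen before the child line.
    """
    types_by_id = {}
    defined = set()
    late_parents = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments, directives and non-feature lines
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        fid = attrs.get("ID")
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent and parent not in defined:
                late_parents.append((fid, parent))
        if fid:
            types_by_id.setdefault(fid, set()).add(cols[2])
            defined.add(fid)
    mixed = sorted(fid for fid, types in types_by_id.items() if len(types) > 1)
    return mixed, late_parents
```

For the two `rpl23` lines above, the first list would contain `rpl23`, because the same ID is used for a CDS and a gene.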

Scott


On Fri, Sep 3, 2010 at 9:03 AM, Anja Friedrich
<[hidden email]> wrote:

> Hi Scott,
>
> It's the same GFF I used before to load the data. And it was working fine...
> I attached it. It was created with
>
>  bp_genbank2gff3.pl from a genbank entry...
>
> Here is the full error message:
>
> Loading data into feature table ...
> Loading data into featureloc table ...
> Loading data into feature_relationship table ...
> DBD::Pg::db pg_endcopy failed: ERROR:  invalid syntax entry for integer: »«
> CONTEXT:  COPY feature_relationship, line 1, column type_id: »« at
> /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 3222, <$fh> line
> 55.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: calling endcopy for feature_relationship failed:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
> STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
> /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:3222
> STACK: Bio::GMOD::DB::Adapter::load_data
> /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm:3144
> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:1063
> -----------------------------------------------------------
>
> Regards,
> Anja

Re: error loading sequence

anja
CONTENTS DELETED
The author has deleted this message.

Re: error loading sequence

Scott Cain
Hi Anja,

The preprocessor is only going to work on an already valid GFF3 file.
My guess is that you're going to have to edit the GFF3 file by hand to
get it correct (or write a Perl script that will do the fixing for
you, if this is the sort of thing that will need to be done often).
The flaws I pointed out in my previous email would still have to be
fixed.  You will probably find it easier to edit the unsorted file.
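
A fixing script could take many forms. As one hedged sketch (written in Python here rather than Perl, and assuming tab-delimited, nine-column input), the function below keeps the first use of each ID and renames later uses by appending the feature type and an occurrence counter. The name `make_ids_unique` and the suffix scheme are inventions for this example. It does not add `Parent` attributes, reorder lines, or check that a generated ID is itself unused, and it turns each segment of a multi-location feature into a separate feature, which may or may not be what you want.

```python
import re
from collections import Counter

# Matches the ID attribute at the start of column 9 or after a ';'.
ID_FIELD = re.compile(r"(^|;)ID=([^;]*)")


def make_ids_unique(lines):
    """Return a new list of lines in which every repeated ID is renamed."""
    seen = Counter()
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # pass comments and non-feature lines through
            continue
        match = ID_FIELD.search(cols[8])
        if match:
            fid = match.group(2)
            seen[fid] += 1
            if seen[fid] > 1:
                # Hypothetical naming scheme: <ID>-<type>-<occurrence number>.
                new_id = f"{fid}-{cols[2]}-{seen[fid]}"
                cols[8] = ID_FIELD.sub(
                    lambda m: f"{m.group(1)}ID={new_id}", cols[8], count=1
                )
        out.append("\t".join(cols) + "\n")
    return out
```

Running this on the unsorted file and reviewing the renamed lines by hand is probably safer than trusting the output blindly.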

Scott


On Fri, Sep 3, 2010 at 10:35 AM, Anja Friedrich
<[hidden email]> wrote:

> Hi Scott,
>
> I used gmod_gff3_preprocessor.pl to sort the file.
> What else can I do?
>
> Anja

Re: error loading sequence

anja
CONTENTS DELETED
The author has deleted this message.

Re: error loading sequence

Scott Cain
Hi Anja,

I can't tell you why it was working fine before; the GFF you sent to
the list has multiple problems and shouldn't have worked (and makes me
wonder if what you think got loaded really did).

Scott


On Fri, Sep 3, 2010 at 10:43 AM, Anja Friedrich
<[hidden email]> wrote:

> Hi Scott,
>
> It was working fine before. How come there are so many errors now? I
> didn't change anything.
>
> Anja

Re: error loading sequence

anja
CONTENTS DELETED
The author has deleted this message.

Re: error loading sequence

Scott Cain
Hi Anja,

It is much more difficult to "debug" GFF that has been sorted by the
preprocessor, because when the GFF is generated, parent and child
features are usually close to each other, making it easier to spot
problems.  Looking at this file, there are still features that share
IDs in violation of both the spec and what the loader needs.  For
example, these repeat features:

NC_010109 Genbank repeat_region 134734 165955 . - . ID=inverted;note=repeat%20IRa
NC_010109 Genbank repeat_region 89908 121130 . + . ID=inverted;note=repeat%20IRb

and this tRNA and its parent gene:

NC_010109 Genbank gene 55641 56319 . - . ID=tRNA-Val%28UAC%29;locus_tag=LemiCt019;db_xref=GeneID%3A5787541
NC_010109 Genbank tRNA 55641 55677 . - . ID=tRNA-Val%28UAC%29
NC_010109 Genbank tRNA 55641 56319 . - . ID=tRNA-Val%28UAC%29;locus_tag=LemiCt019;db_xref=GeneID%3A5787541;product=tRNA-Val
NC_010109 Genbank tRNA 56282 56319 . - . ID=tRNA-Val%28UAC%29

There are lots of examples like this throughout your GFF file.
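
The IDs in these lines are percent-encoded, which makes them harder to compare by eye. A small sketch for reading them, assuming standard URL-style escapes as used in GFF3 column 9 (`readable_attributes` is a hypothetical helper name):

```python
from urllib.parse import unquote


def readable_attributes(column9):
    """Return (tag, value) pairs from column 9 with percent-escapes decoded."""
    pairs = []
    for field in column9.split(";"):
        if "=" in field:
            tag, value = field.split("=", 1)
            pairs.append((unquote(tag), unquote(value)))
    return pairs
```

With this, `ID=tRNA-Val%28UAC%29` reads as `tRNA-Val(UAC)`. The decoded form is for inspection only; the file itself should keep the escapes.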

Scott


On Fri, Sep 3, 2010 at 11:25 AM, Anja Friedrich
<[hidden email]> wrote:

> Hi Scott,
>
> I loaded and created it again. Does it look better now?
>
> Cheers,
> Anja