Re: [GMOD-devel] loading data into chado


Re: [GMOD-devel] loading data into chado

Scott Cain
Hi Claudia,

Questions about Chado are best sent to the schema mailing list, which
I cc'ed here.

The problem you are having is that the comma has special meaning in
column nine of a GFF3 file, indicating more than one value, so that
feature really has two names,
"30128.m008887#Guanosine-5'-triphosphate" and "3'-diphosphate", which
isn't allowed.  In order for that to be the name, the comma needs to
be URI escaped, which is to say, replaced with "%2C".
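
To make the comma rule concrete, here is a minimal Python sketch of how a
GFF3 reader might split column nine. It is not the Bio::FeatureIO code from
the stack trace; parse_attributes is a hypothetical helper written only for
illustration.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column into {tag: [values]} (illustrative only)."""
    attributes = {}
    for pair in column9.strip().strip(";").split(";"):
        if not pair:
            continue
        tag, _, raw = pair.partition("=")
        # An unescaped comma separates values; %2C is decoded back to a comma
        # only after the split, so it stays inside a single value.
        attributes[tag] = [unquote(value) for value in raw.split(",")]
    return attributes


raw = "ID=contig00562:hit:365;Name=30128.m008887#Guanosine-5'-triphosphate,3'-diphosphate;"
fixed = raw.replace("triphosphate,3", "triphosphate%2C3")

print(parse_attributes(raw)["Name"])    # two values: the loader rejects this
print(parse_attributes(fixed)["Name"])  # one value containing a literal comma
```

With the original line the Name tag yields two values, which is what triggers
"A feature may have at most one Name value"; with %2C it yields one.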

Scott


On Tue, Dec 7, 2010 at 2:54 PM, Dinatale C <[hidden email]> wrote:

> To whom it may concern,
>
>  I am attempting to load a preprocessed GFF3 file, merged from MAKER
> output, and I am getting the response below when I use the GMOD bulk
> load script. Could you shed some light on how to solve this problem?
>
> Thank you,
>
> Claudia DiNatale
>
> contig00562    blastx    protein_match    24    422    187    -    .
>  ID=contig00562:hit:365;Name=30128.m008887#Guanosine-5'-triphosphate,3'-diphosphate;
>
> A feature may have at most one Name value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::FeatureIO::gff::_handle_feature
> /usr/share/perl5/Bio/FeatureIO/gff.pm:729
> STACK: Bio::FeatureIO::gff::next_feature
> /usr/share/perl5/Bio/FeatureIO/gff.pm:172
> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:777
>
> _______________________________________________
> Gmod-devel mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: [GMOD-devel] loading data into chado

claudia
Hi, thank you for the quick reply.
  I realize I could manually edit the GFF3, but I have a database full
of files like this produced by MAKER. Is there a script available, or
a MAKER preference, to change this?

Claudia



Re: [GMOD-devel] loading data into chado

Scott Cain
Hi Claudia,

I doubt there is a MAKER preference to fix this, but I'll cc the
MAKER list just in case.

I also don't know of a script that will do this for you, though it
wouldn't be terribly hard to write a Perl script that does it (possibly
even in one line :-) ).  Anyone want to have a shot at Perl golf? :-)

Scott



Re: [GMOD-devel] loading data into chado

Fields, Christopher J
In reply to this post by claudia
This should probably be filed as a bug with the authors of MAKER:

http://malachite.genetics.utah.edu/projects/maker/report

I know MAKER uses BioPerl and a few other components, so it may be due to a bug in one of those (I can try to help with BioPerl issues), but Carson & company would probably be the best people to ask about this.  In the meantime, I agree with Scott that this could be fixed post-run. The tricky bit is deciding which commas need URI escaping and which are genuinely needed to separate multiple values for a given tag name.
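
One way to approach that decision, sketched in Python under a stated assumption: only the tags that the GFF3 specification describes as multi-valued (Parent, Alias, Note, Dbxref, Ontology_term) keep their commas, and commas in every other tag are treated as literal text. The function name escape_single_valued and the Parent value in the example are hypothetical, and any custom multi-valued tags in your files would need to be added to the set.

```python
# Assumption: these are the tags allowed to hold comma-separated lists.
# Extend the set for any custom multi-valued tags your files use.
MULTI_VALUED = {"Parent", "Alias", "Note", "Dbxref", "Ontology_term"}


def escape_single_valued(column9, multi_valued=MULTI_VALUED):
    """Escape commas in column nine, except in tags listed as multi-valued."""
    fixed = []
    for pair in column9.split(";"):
        tag, sep, value = pair.partition("=")
        if sep and tag.strip() not in multi_valued:
            # A comma here cannot be a separator, so make it literal.
            value = value.replace(",", "%2C")
        fixed.append(tag + sep + value)
    return ";".join(fixed)


# The Parent attribute is invented for illustration; the rest is the
# attribute string from the original report.
example = ("ID=contig00562:hit:365;"
           "Name=30128.m008887#Guanosine-5'-triphosphate,3'-diphosphate;"
           "Parent=mRNA1,mRNA2")
print(escape_single_valued(example))
```

The comma inside Name becomes %2C while the one separating the two Parent values is left alone. A Note value that legitimately contains a comma would still be ambiguous under this rule.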

chris


Re: [GMOD-devel] loading data into chado

Scott Cain
In reply to this post by Scott Cain
Hi Claudia,

As a quick fix, you can use the attached Perl file.  I think the only
assumption I made when writing this (which is perhaps a little more
verbose than I would typically have made it :-) is that the value of the
Name tag ends with a semicolon, as it did in your example line.  If
the Name value is the last thing on the line, the semicolon isn't
required, but it is not unusual for it to be there because of how the
file is constructed.  If it can't be counted on to be there, the
regular expression that finds the commas to replace would have to be
changed a little bit.

To use it, do this:

  perl comma-fix.pl problemfile.gff > new_gff_file.gff

which should hopefully do the trick.
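
The attached script is not reproduced in this thread, so the following
Python sketch only illustrates the semicolon assumption described above.
Both patterns are hypothetical and are not taken from comma-fix.pl.

```python
import re

# Hypothetical pattern that relies on the trailing semicolon.
NEEDS_SEMICOLON = re.compile(r"\bName=[^;\n]*;")
# Variant that also matches when Name is the last thing on the line.
SEMICOLON_OPTIONAL = re.compile(r"\bName=[^;\n]*")


def escape_name_commas(line, pattern):
    """Replace commas inside the matched Name attribute with %2C."""
    # The match covers "Name=" plus its value, and only the value can
    # contain a comma, so replacing within the whole match is safe.
    return pattern.sub(lambda m: m.group(0).replace(",", "%2C"), line)


with_semicolon = "ID=contig00562:hit:365;Name=30128.m008887#Guanosine-5'-triphosphate,3'-diphosphate;"
without_semicolon = with_semicolon.rstrip(";")

print(escape_name_commas(with_semicolon, NEEDS_SEMICOLON))        # comma escaped
print(escape_name_commas(without_semicolon, NEEDS_SEMICOLON))     # unchanged: no match
print(escape_name_commas(without_semicolon, SEMICOLON_OPTIONAL))  # comma escaped
```

The second call shows the failure mode: without the trailing semicolon the
first pattern never matches, so the line passes through unfixed.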

Scott



Attachment: comma-fix.pl (360 bytes)

Re: [maker-devel] [GMOD-devel] loading data into chado

Carson Hinton Holt
In reply to this post by Scott Cain
You can also do this with command-line Perl, like so:

cat file.gff | perl -MURI::Escape -ane '$_ =~ s/(Name|ID)=([^\;\n]+)/"$1=".uri_escape($2, ",\x27\#")/ge; print $_' > fixed_file.gff
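
For readers less used to Perl, the one-liner can be approximated in Python as follows. This is a sketch, not a drop-in replacement: it assumes the second argument of uri_escape acts as the set of characters to escape (comma, apostrophe and hash here), and fix_line and percent_encode are names made up for the example.

```python
import re

# Same character set as ",\x27\#" in the one-liner: comma, apostrophe, hash.
UNSAFE = re.compile(r"[,'#]")
# Same attribute pattern: the value runs up to the next semicolon or newline.
ATTRIBUTE = re.compile(r"(Name|ID)=([^;\n]+)")


def percent_encode(text):
    """Replace each unsafe character with % followed by its hex code."""
    return UNSAFE.sub(lambda m: "%{:02X}".format(ord(m.group())), text)


def fix_line(line):
    """Percent-encode the unsafe characters in Name and ID values only."""
    return ATTRIBUTE.sub(
        lambda m: m.group(1) + "=" + percent_encode(m.group(2)), line
    )


line = "ID=contig00562:hit:365;Name=30128.m008887#Guanosine-5'-triphosphate,3'-diphosphate;"
print(fix_line(line))
```

Because the percent sign itself is not in the escape set, running the fix twice on the same file leaves already-escaped values unchanged.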

I am surprised this is not already being escaped in MAKER.  Which version are you using?

Thanks,
Carson




_______________________________________________
maker-devel mailing list
maker-devel@...
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


