Re: [GMOD-devel] loading data into chado

Scott Cain
Hi Claudia,

I doubt there is a MAKER preference that fixes this, but I'll cc the
maker list just in case.

I also don't know of an existing script that does this for you, though
it wouldn't be terribly hard to write a Perl script for it (possibly
even in one line :-). Anyone want to have a shot at Perl golf? :-)

Scott


On Wed, Dec 8, 2010 at 9:47 AM, claudia <[hidden email]> wrote:

> Hi, thank you for the quick reply.
> I realize I could manually edit the GFF3, but I have a database full of
> files like this produced by MAKER. Is there a script available, or a
> MAKER preference, to change this?
>
> Claudia
>
>
> On 08/12/2010 9:18 AM, Scott Cain wrote:
>>
>> Hi Claudia,
>>
>> Questions about Chado are best sent to the schema mailing list, which
>> I cc'ed here.
>>
>> The problem you are having is that the comma has special meaning in
>> column nine of a GFF3 file, indicating more than one value, so that
>> feature really has two names,
>> "30128.m008887#Guanosine-5'-triphosphate" and "3'-diphosphate", which
>> isn't allowed.  In order for that to be the name, the comma needs to
>> be URI escaped, which is to say, replaced with "%2C".
>>
>> Scott
>>
>>
>> On Tue, Dec 7, 2010 at 2:54 PM, Dinatale C<[hidden email]>  wrote:
>>>
>>> To whom it may concern,
>>>
>>> I am attempting to load a preprocessed GFF3 file, merged from MAKER
>>> output, and I get the response below when I use the GMOD bulk load
>>> script. Could you shed some light on how to solve this problem?
>>>
>>> Thank you,
>>>
>>> Claudia DiNatale
>>>
>>> contig00562    blastx    protein_match    24    422    187    -    .    ID=contig00562:hit:365;Name=30128.m008887#Guanosine-5'-triphosphate,3'-diphosphate;
>>>
>>> A feature may have at most one Name value
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
>>> STACK: Bio::FeatureIO::gff::_handle_feature /usr/share/perl5/Bio/FeatureIO/gff.pm:729
>>> STACK: Bio::FeatureIO::gff::next_feature /usr/share/perl5/Bio/FeatureIO/gff.pm:172
>>> STACK: /usr/local/bin/gmod_bulk_load_gff3.pl:777
>>>
>>> _______________________________________________
>>> Gmod-devel mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/gmod-devel

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: [GMOD-devel] loading data into chado

Scott Cain
Hi Claudia,

As a quick fix, you can use the attached Perl file.  I think the only
assumption I made when writing it (it is perhaps a little more verbose
than I would typically write :-) is that the value of the Name tag ends
with a semicolon, as it did in your example line.  If the Name value is
the last thing on the line, the semicolon isn't required, but it is
often there anyway because of how the file is constructed.  If it can't
be counted on to be there, the regular expression that finds the commas
to replace would have to be changed a little bit.

To use it, do this:

  perl comma-fix.pl problemfile.gff > new_gff_file.gff

which should hopefully do the trick.

Scott



Attachment: comma-fix.pl (360 bytes)

Re: [GMOD-devel] loading data into chado

Carson Hinton Holt
You can also do this with command-line Perl, like so:

cat file.gff | perl -MURI::Escape -ane '$_ =~ s/(Name|ID)=([^\;\n]+)/"$1=".uri_escape($2, ",\x27\#")/ge; print $_' > fixed_file.gff

I am surprised this is not already being escaped in MAKER.  Which version are you using?

Thanks,
Carson


