[Gmod-ajax] flatfile-to-json can't find ID

Scott Cain
Hi All,

I'm trying to run flatfile-to-json on a largish GFF3 file (4M lines), and it fails with the error below. I'm wondering whether this is related to running out of memory, since the parent it says it can't find (ID=Gene:WBGene00007597) does exist in the file.

The ID it reports as causing the problem (ID=Transcript:C15A11.2) does not exist in the file, but Transcript:C15A11.2.2 and Transcript:C15A11.2.1 do, so it looks like flatfile-to-json is truncating that ID (perhaps only for display purposes, though that seems like a bad idea if so).

There are no ### directives in the file (though it would be nice if there were!). Changing --sortMem from 2,000,000,000 to 500,000,000 doesn't make a difference.
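For reference, the "exists / does not exist" check above can be done with an exact match on the ID attribute rather than a plain grep, which would also hit the longer Transcript:C15A11.2.1 and Transcript:C15A11.2.2 IDs. This is only a sketch: count_exact_id is a made-up helper name, and it assumes tab-separated GFF3 with attributes in column 9.

```sh
# Hypothetical helper: count feature lines whose ID attribute is exactly $2.
# Usage: count_exact_id FILE ID
count_exact_id() {
  awk -F'\t' -v want="$2" '
    /^#/ || NF < 9 { next }            # skip comments, directives, short lines
    {
      count = split($9, attrs, ";")    # column 9 holds key=value pairs
      for (i = 1; i <= count; i++)
        if (attrs[i] == "ID=" want) { hits++; break }
    }
    END { print hits + 0 }             # prints 0 when nothing matched
  ' "$1"
}

# Example (file name shortened here, not the real path):
# count_exact_id annotations.gff3 'Transcript:C15A11.2'
# count_exact_id annotations.gff3 'Gene:WBGene00007597'
```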

Any ideas?

Thanks,
Scott

Here's the command and output:

bin/flatfile-to-json.pl --gff ../c_elegans_gff/I.c_elegans.PRJNA13758.WS243.annotations.gff3.out.gff3 --out data/c_elegans --type gene:WormBase --trackLabel gene_from_gff --trackType CanvasFeatures --key genes_from_gff --sortMem 2000000000

And I get this:

GFF3 parse error: some features reference other features that do not exist in the file (or in the same '###' scope).  A list of them:

 ID                 |           Cannot Find
----------------------------------------------------------------------
Transcript:C15A11.2 | Parent=Gene:WBGene00007597


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Time is money. Stop wasting it! Get your web API in 5 minutes.
www.restlet.com/download
http://p.sf.net/sfu/restlet
_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax
Re: flatfile-to-json can't find ID

Colin
You might have to increase the --maxLookback parameter for flatfile-to-json. The fact that the parent is reported as missing suggests that the script isn't looking far enough back in the GFF3 file to find it, so increasing this parameter may help.
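As a rough way to pick a value, the sketch below estimates how many feature lines can sit between a parent and a later child that references it. It assumes the lookback is counted roughly in feature lines, which has not been checked against the parser. max_parent_gap is a made-up helper name, and the 1000000 in the commented command is only an illustrative value.

```sh
# Hypothetical helper: print the largest number of feature lines between
# the first line defining an ID and a later line naming it as Parent.
# Usage: max_parent_gap FILE
max_parent_gap() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }                  # skip comments and directives
    {
      n++                                    # running count of feature lines
      count = split($9, attrs, ";")
      for (i = 1; i <= count; i++)
        if (attrs[i] ~ /^ID=/) {
          id = substr(attrs[i], 4)
          if (!(id in seen)) seen[id] = n    # remember first definition
        }
      for (i = 1; i <= count; i++)
        if (attrs[i] ~ /^Parent=/) {
          np = split(substr(attrs[i], 8), parents, ",")
          for (j = 1; j <= np; j++)
            if (parents[j] in seen) {
              gap = n - seen[parents[j]]
              if (gap > max) max = gap
            }
        }
    }
    END { print max + 0 }
  ' "$1"
}

# Then rerun with a lookback comfortably above the reported gap, e.g.:
# bin/flatfile-to-json.pl --gff ../c_elegans_gff/I.c_elegans.PRJNA13758.WS243.annotations.gff3.out.gff3 \
#   --out data/c_elegans --type gene:WormBase --trackLabel gene_from_gff \
#   --trackType CanvasFeatures --key genes_from_gff --maxLookback 1000000
```

Parents that only appear after their children are not counted here, so this only covers the "parent is too far back" case.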

-Colin


On Fri, May 30, 2014 at 10:04 AM, Scott Cain <[hidden email]> wrote:
[...]


