[Gmod-ajax] JBrowse 1.2 problem generating JSON data from bed files?

Gregg Helt
With the updated JBrowse 1.2, I'm getting "Segmentation fault" errors when trying to process BED files with flatfile-to-json.pl.

With minimal parameters I just get the seg fault message:
bin/flatfile-to-json.pl --bed refGene.bed --tracklabel refGene
Segmentation fault

With a fuller set of parameters I'm getting an additional error message:

bin/flatfile-to-json.pl --bed all_mrna.bed --tracklabel all_mrna --key "Drosophila mRNAs" --cssClass transcript --subfeatureClasses '{"CDS": "transcript-CDS", "UTR": "transcript-UTR"}' --arrowheadClass transcript-arrowhead --getLabel --autoComplete label --getSubs

Can't store CODE items at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/_store_fd.al) line 304, <GEN1> line 33158, at /Users/gregg/projects/webapollo/jbrowse/bin/../lib/ExternalSorter.pm line 92
Segmentation fault

I tried whittling down the BED file to simplify, but even with a one-line BED file I still get the seg fault. I've attached the one-line BED file; can someone try to reproduce this problem to see whether it's specific to my setup?

These are standard BED files downloaded from UCSC. I'm using a fresh install of BioPerl, so to verify that the problem is specific to the new JBrowse release I went back to an earlier version of JBrowse (from September 2010) and tried processing the same BED files. The earlier version worked.

When processing normal-sized BED files there's a significant delay (proportional to the size of the BED file) before the seg fault. During the delay memory usage climbs linearly, and is 10x to 20x greater than with the earlier JBrowse version. I'm not hitting my overall memory limits, but I'm wondering whether something is causing a recursion that exceeds my stack size limit.

Gregg

_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax

Attachment: tiny.bed (238 bytes)

Re: JBrowse 1.2 problem generating JSON data from bed files?

Brenton Graveley
I am having issues with JBrowse 1.2 as well, though they seem different.  

First, when using bin/flatfile-to-json.pl on a BED file that worked previously, I get the following:

bin/flatfile-to-json.pl --bed FlyBase_FBtr_r5_12_annotations.nochr.bed --tracklabel FlyBase_5.12 --key "FlyBase_5.12" --getLabel --autocomplete label --cssclass transcript --subfeatureClasses '{"UTR": "transcript-UTR", "CDS": "transcript-CDS"}' --getSubs --arrowheadClass transcript-arrowhead
Operation "cmp": no method found,
left argument in overloaded package Bio::Annotation::SimpleValue,
right argument in overloaded package Bio::Annotation::SimpleValue at bin/flatfile-to-json.pl line 169, <GEN1> line 22309.

Second, when loading wiggle tracks with wig-to-json.pl, the process takes very long (4 hours so far) for a wiggle track that took about 20 minutes with the previous version.

Brent


Re: JBrowse 1.2 problem generating JSON data from bed files?

Mitch Skinner
Thanks for the reports; these are my top priority right now. I'll follow up to the list when they're fixed.

Mitch

On 02/18/2011 08:32 AM, Brenton Graveley wrote:
I am having issues with JBrowse 1.2 as well, though they seem different.  

First, when using bin/flatfile-to-json.pl on a BED file that worked previously, I get the following:

bin/flatfile-to-json.pl --bed FlyBase_FBtr_r5_12_annotations.nochr.bed --tracklabel FlyBase_5.12 --key "FlyBase_5.12" --getLabel --autocomplete label --cssclass transcript --subfeatureClasses '{"UTR": "transcript-UTR", "CDS": "transcript-CDS"}' --getSubs --arrowheadClass transcript-arrowhead
Operation "cmp": no method found,
left argument in overloaded package Bio::Annotation::SimpleValue,
right argument in overloaded package Bio::Annotation::SimpleValue at bin/flatfile-to-json.pl line 169, <GEN1> line 22309.

Second, when loading wiggle tracks with wig-to-json.pl, the process takes very long (4 hours so far) for a wiggle track that took about 20 minutes with the previous version.

Brent


Re: JBrowse 1.2 problem generating JSON data from bed files?

Mitch Skinner
In reply to this post by Gregg Helt
Okay, I think I've diagnosed both of these things.

They're both triggered, in this case, by BED files that have subfeatures (e.g., exons). The subfeatures that come out of the BED parser in JBrowse are represented using BioPerl Bio::SeqFeature::Generic objects. Those objects contain coderefs (in their '_root_cleanup_methods' property).

That has two effects, which correspond to your two error messages below.

The first one is that those coderefs tickle a bug in the Devel::Size CPAN module:

http://codenode.com/2010/04/21/devel-size-coderef-segfault-fix/

That's causing the segfault.

The second effect occurs when the BED file with subfeatures is large enough. In that case, the external sorting mechanism tries to write those BioPerl objects out to disk, and the serialization mechanism that does the writing (Storable, per the error message) doesn't like coderefs.
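As a rough illustration of that second effect, here is a minimal sketch in Python rather than Perl. It is only an analogy: JBrowse's sorter uses Perl's Storable, not pickle. The point it shows is general, though. A serializer that writes data structures to disk typically has no way to write out an anonymous function. The dictionary keys below (including "cleanup_methods") are hypothetical stand-ins for the fields of a feature object, not the real BioPerl layout.

```python
import pickle


def can_serialize(obj):
    """Return True if pickle can write obj out, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A feature made only of plain data (hypothetical field names).
plain_feature = {"start": 100, "end": 200, "cleanup_methods": []}

# The same feature, but holding an anonymous function, which plays the
# role of the coderef stored inside the real objects.
feature_with_code = {"start": 100, "end": 200, "cleanup_methods": [lambda: None]}

print(can_serialize(plain_feature))
print(can_serialize(feature_with_code))
```

The first call succeeds and the second is refused, which mirrors the "Can't store CODE items" message: nothing is wrong with the feature's coordinates, only with the code reference travelling along with them.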

I have some ideas about how to address these two issues, but the right approach is non-obvious IMO. One possibility is to rewrite JBrowse's BED handling to avoid BioPerl objects. That would fix the second issue and also work around the first, but it would take a bit of time to do. A quick fix for the first issue is for users to download the Devel::Size CPAN module:

http://search.cpan.org/dist/Devel-Size/

and patch it by hand and install it.  That fix is available now; however, it's more work for users, and it doesn't address the second issue that you reported.
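To make the "avoid BioPerl objects" option more concrete, here is a small sketch, written in Python for brevity and not taken from JBrowse's Perl code. It parses one BED line into plain dictionaries and lists, so there is nothing in the result that a serializer could object to. The function name parse_bed12_line and the output keys are hypothetical; the column meanings (blockSizes in column 11, blockStarts in column 12, both relative to chromStart) follow the UCSC BED format. The example line is made up and is not from the files discussed in this thread.

```python
import json


def parse_bed12_line(line):
    """Parse one BED line into plain dicts and lists (no objects, no callables)."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    feature = {
        "chrom": fields[0],
        "start": start,
        "end": end,
        "name": fields[3] if len(fields) > 3 else None,
        "strand": fields[5] if len(fields) > 5 else None,
        "subfeatures": [],
    }
    if len(fields) >= 12:
        # blockSizes and blockStarts are comma-separated, often with a trailing comma.
        sizes = [int(s) for s in fields[10].split(",") if s]
        offsets = [int(s) for s in fields[11].split(",") if s]
        for size, offset in zip(sizes, offsets):
            block_start = start + offset  # blockStarts are relative to chromStart
            feature["subfeatures"].append(
                {"start": block_start, "end": block_start + size}
            )
    return feature


# Made-up example line with two blocks (e.g., two exons).
example = "chr1\t1000\t5000\tgeneA\t0\t+\t1200\t4800\t0\t2\t500,1000,\t0,3000,"
feature = parse_bed12_line(example)

# Plain data survives a serialization round trip unchanged.
assert json.loads(json.dumps(feature)) == feature
print(feature["subfeatures"])
```

Because the subfeatures are just start/end pairs, the same structure could be sized, sorted, and written to disk without ever touching a code reference.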

I'm going to look at it some more and see if I can find a quick fix that addresses both of these issues.

Mitch

On 02/18/2011 08:23 AM, Gregg Helt wrote:
With the updated JBrowse 1.2, I'm getting "Segmentation fault" errors when trying to process BED files with flatfile-to-json.pl.

With minimal parameters I just get the seg fault message:
bin/flatfile-to-json.pl --bed refGene.bed --tracklabel refGene
Segmentation fault

With a fuller set of parameters I'm getting an additional error message:

bin/flatfile-to-json.pl --bed all_mrna.bed --tracklabel all_mrna --key "Drosophila mRNAs" --cssClass transcript --subfeatureClasses '{"CDS": "transcript-CDS", "UTR": "transcript-UTR"}' --arrowheadClass transcript-arrowhead --getLabel --autoComplete label --getSubs

Can't store CODE items at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/_store_fd.al) line 304, <GEN1> line 33158, at /Users/gregg/projects/webapollo/jbrowse/bin/../lib/ExternalSorter.pm line 92
Segmentation fault

I tried whittling down the BED file to simplify, but even with a one-line BED file I still get the seg fault. I've attached the one-line BED file; can someone try to reproduce this problem to see whether it's specific to my setup?

These are standard BED files downloaded from UCSC. I'm using a fresh install of BioPerl, so to verify that the problem is specific to the new JBrowse release I went back to an earlier version of JBrowse (from September 2010) and tried processing the same BED files. The earlier version worked.

When processing normal-sized BED files there's a significant delay (proportional to the size of the BED file) before the seg fault. During the delay memory usage climbs linearly, and is 10x to 20x greater than with the earlier JBrowse version. I'm not hitting my overall memory limits, but I'm wondering whether something is causing a recursion that exceeds my stack size limit.

Gregg


Re: [apollo-dev] Re: JBrowse 1.2 problem generating JSON data from bed files?

Mitch Skinner
On 02/22/2011 06:02 AM, Chris Childers wrote:
Hey all,

I have also noticed a huge slowdown in the biodb-to-json.pl script with the new release.  This was the same Bio::SeqFeature::Store database that I was using with the older version, and I believe it loaded in less than an hour before.  With this release, the job I started on Feb 18 is still running.

Is it swapping? How much RAM does the machine have, and how long are the refseqs? Memory usage used to be a function of the number of features, and now it's a function of the size of the largest refseq. That function could be tweaked, though.

Mitch

------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax
Reply | Threaded
Open this post in threaded view
|

Re: [apollo-dev] Re: JBrowse 1.2 problem generating JSON data from bed files?

Chris Childers
The loading is not using swap memory. The process is using ~11.2% of the system resources, up from ~10.5% around nine hours ago. The refseqs get very large: Monodelphis has eight chromosomes, chr1 is massive (~720 Mbp), and the other refseqs start at ~550 Mbp, 500 Mbp, ~450 Mbp and go down from there.

Chris

On Wed, Feb 23, 2011 at 2:45 AM, Mitch Skinner <[hidden email]> wrote:
Is it swapping? How much RAM does the machine have, and how long are the refseqs? Memory usage used to be a function of the number of features, and now it's a function of the size of the largest refseq. That function could be tweaked, though.

Mitch



--
Chris Childers
Postdoctoral Fellow
Elsik Computational Genomics Laboratory
Georgetown University
Department of Biology
406 Reiss Bldg
Washington, DC 20057
Phone 202-687-5855
Fax 202-687-5662

