gbrowse2 and bp_seqfeature_delete.pl slowness

Nikhil Joshi
Hi all,

I have been having problems trying to get gbrowse2 to run fast.  I
have been trying many different things for a few months now and
nothing seems to help.  I have about 10 million features total amongst
all the tracks on a soybean backbone; however, there are 20
chromosomes.  When I select a region larger than around 1Mbp on a
single chromosome, it takes a very long time to render.  I've upped
the cache time and turned off the hover balloons, but it still is
pretty slow.  Is there something I'm missing here?  I know that
soybase.org is pretty fast.  Do they use some kind of distributed
processing to render their gbrowse?  Is there some other tuning that I
can do?

Also, I am using the bp_seqfeature_delete.pl script to delete data
from the mysql database.  This script seems really slow, and it looks
like it's because it gets the entire result set to be deleted as
perl objects and then proceeds to call delete on each individually.
This takes an inordinate amount of time... sometimes it just hangs.  I
decided to just delete the features directly from the database, which
is much faster.  But I am worried that since there are no foreign keys
in the database, I am leaving in some feature-linking detritus.  So I
guess my question is:  Is there any way to make the delete script run
faster, and if not, is there a problem with deleting things directly
from the database?

- Nik.

--
Nikhil Joshi
Bioinformatics Programmer
UC Davis Genome Center
University of California, Davis
Davis, CA
http://bioinformatics.ucdavis.edu

_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: gbrowse2 and bp_seqfeature_delete.pl slowness

Jason Stajich-4


Nikhil Joshi wrote:

> Hi all,
>
> I have been having problems trying to get gbrowse2 to run fast.  I
> have been trying many different things for a few months now and
> nothing seems to help.  I have about 10 million features total amongst
> all the tracks on a soybean backbone; however, there are 20
> chromosomes.  When I select a region larger than around 1Mbp on a
> single chromosome, it takes a very long time to render.  I've upped
> the cache time and turned off the hover balloons, but it still is
> pretty slow.  Is there something I'm missing here?  I know that
> soybase.org is pretty fast.  Do they use some kind of distributed
> processing to render their gbrowse?  Is there some other tuning that I
> can do?
Do you have semantic zooming turned on? For example, so that it doesn't
try to render a whole gene structure but draws a simple arrow when the
zoom view is greater than 100kb or something?

If you monitor the processes on the server, is most of the time spent
running gbrowse when you make a request? Are you using gbrowse_slave
at all, even on this same machine rather than on a render farm?

--
Jason Stajich




Re: gbrowse2 and bp_seqfeature_delete.pl slowness

Keiran Raine
In reply to this post by Nikhil Joshi

On 20 Jan 2011, at 01:46, Nikhil Joshi wrote:

> [snip]
>
> Also, I am using the bp_seqfeature_delete.pl script to delete data
> from the mysql database.  This script seems really slow, and it looks
> like it's because it gets the entire result set to be deleted as
> perl objects and then proceeds to call delete on each individually.
> This takes an inordinate amount of time... sometimes it just hangs.  I
> decided to just delete the features directly from the database, which
> is much faster.  But I am worried that since there are no foreign keys
> in the database, I am leaving in some feature-linking detritus.  So I
> guess my question is:  Is there any way to make the delete script run
> faster, and if not, is there a problem with deleting things directly
> from the database?

I submitted a patch for bp_seqfeature_delete.pl for just this
purpose.  I don't think there has been a BioPerl release since then,
though:

18a19
 > my $FAST     = 0;
27a29
 >            'fast|f'          => \$FAST,
42a45
 >           -f --fast       Deletes each item instantly not atomic for full dataset (mainly for deleting massive datasets linked to a type)
110a114,127
 >   if($FAST) {
 >     my $del = 0;
 >     foreach my $feat(@features) {
 >       my @tmp_feat = ($feat);
 >       my $deleted = $store->delete(@tmp_feat);
 >       $del++ if($deleted);
 >       if ($VERBOSE && $deleted) {
 >         print 'Feature ',$del," successfully deleted.\n";
 >       } elsif (!$deleted) {
 >         die "An error occurred. Some or all of the indicated features could not be deleted.";
 >       }
 >     }
 >   }
 >   else {
113c130
< print scalar(@features)," features successfully deleted.\n";
---
 >       print scalar(@features)," features successfully deleted.\n";
115c132
< die "An error occurred. Some or all of the indicated features could not be deleted.";
---
 >       die "An error occurred. Some or all of the indicated features could not be deleted.";
116a134
 >   }





--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.


Re: gbrowse2 and bp_seqfeature_delete.pl slowness

Lincoln Stein
In reply to this post by Nikhil Joshi
Hi,

There are all sorts of reasons that it might be slow, and you need to look at the processes running on the server using "top" to see which one applies. There are two general classes of slowness:
  1. The database query is slow. If you are using MySQL, check whether MySQL is taking up most of the processing time during the slow track rendering step.
  2. The graphical rendering is slow. Check whether the GBrowse CGI script (or perl) is taking most of the CPU resources.
To solve the first class of problems, you can tune MySQL. MySQL ships with default settings suited to the desktop computers of a decade ago, with very limited RAM. Try the following options in /etc/mysql/my.cnf:

key_buffer              = 768M
myisam_sort_buffer_size = 64M
max_allowed_packet      = 16M
thread_stack            = 128K
thread_cache_size       = 80
max_connections         = 500
table_cache             = 512
thread_concurrency      = 8

(thread_concurrency should be twice the number of cores in your machine).
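
For anyone unsure where these settings belong, here is a minimal sketch, assuming a stock option-file layout in which server options sit under the [mysqld] group. The values are simply the ones listed above and are not tuned for any particular machine; the full variable names and the version notes in the comments are additions for illustration and should be checked against the MySQL version in use.

```ini
# Sketch of the [mysqld] group in my.cnf; values copied from the list above.
[mysqld]
key_buffer_size         = 768M   # written as key_buffer in the list above
myisam_sort_buffer_size = 64M
max_allowed_packet      = 16M
thread_stack            = 128K
thread_cache_size       = 80
max_connections         = 500
table_open_cache        = 512    # named table_cache before MySQL 5.1.3
# thread_concurrency    = 8      # removed in MySQL 5.7; only set it on older servers
```

The server has to be restarted before changes to this file take effect.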

If you are not using MySQL, then write back to this mailing list with information on which database adaptor you are using.

Rendering problems can be harder to solve. Typically there are two things you can do to help:
  1. Activate semantic zooming at the point at which rendering gets slow. If the slow track is a gene track, then switch from the "gene" glyph to the "generic" glyph which will avoid trying to draw each exon and intron. Or use "hide = 1" to hide the track completely.
  2. Activate "summary mode" in [TRACK DEFAULTS]: "show summary = 1000000". This will replace the track display with a heatmap showing the distribution of the features. For this to work, the seqfeature database needs to have been loaded with the --summary option. You can also run bp_seqfeature_load.pl with --summary on a previously-created database.
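
As a rough sketch of the two suggestions above, assuming a gene track in a GBrowse 2 configuration file: the track name "Genes", the feature type, the key, and the 100 kb threshold are placeholders chosen for illustration (only the 1000000 value comes from the text above), so adjust them to the point where your own tracks get slow.

```ini
[TRACK DEFAULTS]
# Summary mode: in regions larger than this many bp, replace the track with
# a feature-density display (needs a database loaded with --summary).
show summary = 1000000

# Hypothetical track; "Genes" and "feature = gene" are placeholders.
[Genes]
feature = gene
glyph   = gene
key     = Gene models

# Semantic zooming: this stanza overrides [Genes] once the displayed region
# is about as long as the number after the colon (100 kb is an assumption).
[Genes:100000]
glyph   = generic
```

With this layout a track is drawn in full detail up close, as simple boxes at intermediate zoom, and as a summary once the region passes 1 Mbp.
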
Lincoln

On Wed, Jan 19, 2011 at 8:46 PM, Nikhil Joshi <[hidden email]> wrote:
> [snip]

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>


Re: gbrowse2 and bp_seqfeature_delete.pl slowness

Weeks, Nathan
In reply to this post by Nikhil Joshi

Hi Nikhil,

We're actually still using GBrowse 1 for the SoyBase
GBrowse until I have time to migrate to GBrowse 2,
so there's no parallel rendering of tracks. I think
the two most important things contributing to the
performance are:

1. The backend MySQL database (MyISAM storage engine)
   is copied to and runs off a tmpfs, so it's entirely
   in-memory, as performance degradation may be noticeable
   even when small amounts of data are read from disk.
2. Semantic zooming is frequently used; e.g., for
   our gene models track, we have three levels:
   the "gene" glyph is used when the user views
   <= 1Mb; the "box" glyph is used when viewing
   a region > 1Mb and <= 2Mb (no need to spend time
   fetching & rendering exons when they aren't
   distinguishable at that zoom level anyway), and
   a gene density view when the user zooms out beyond
   2 Mb ("heat_map" glyph with a separately-generated
   track, though after migrating to GBrowse 2, we'll
   be able to use "summary mode").

--
Nathan Weeks
USDA-ARS
SoyBase http://soybase.org



Re: gbrowse2 and bp_seqfeature_delete.pl slowness

Nikhil Joshi
Hi all,

Thanks for all the replies.  I have tried tuning mysql as per one of
the replies and I have also put in the semantic zooming.  However,
neither of these options has significantly increased the speed of
gbrowse.  I have not yet tried putting the mysql database in a tmpfs,
but that seems to me to be a drastic option.  We are running mysql
on a virtual machine, but I don't see why that should make a big
difference.  Can anyone think of any other things I could try?  If I
can't get this to work properly I may have to switch to the UCSC
genome browser.

- Nik.

On Thu, Jan 20, 2011 at 10:04 AM, Weeks, Nathan
<[hidden email]> wrote:

> [snip]
