sleeping mysql processes w/Bio::DB::SeqFeature::Store

classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Lincoln Stein
Hi Folks,

With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?

Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.

Lincoln


On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email]> wrote:
I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.

I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.

Lincoln


On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email]> wrote:
Hi,

Lincoln Stein wrote:
> Hi Folks,
>
> With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>
>  1. The FastCGI executive module launches a fresh gbrowse instance.
>  2. GBrowse tries to load its default database into memory.
>  3. If the default database takes more than 3s to load, then FastCGI times it out.
>  4. FastCGI launches a new instance of GBrowse.
>  5. GBrowse tries to load its default database into memory.
>  6. FastCGI times the new instance out.
>  7. Repeat 4-6 indefinitely.
>
> Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>
> Here's what to do to fix the problem:
>
>   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>       o   *FcgidConnectTimeout 30*
>   * for mod_fastcgi, change *FastCGIConfig* to
>       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2

Hmm, I would call this a workaround instead. Why isn't there something like a lock
in Gbrowse so that multiple connections to mysql wouldn't be established? This
will definitely trick again some user or sysadmin. At least if gbrowse would test
for a timeout value and complain and exit if it would be too short.

In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
the code should be more careful about what user has configured, what is and what is not
available, and ... provide helpful error messages. I don't have the time to contribute
some code in this regard but anything leading to perl unitialized values should be
fixed. At least, I could fish some emails from the archives and emphasize the worst
examples. Or the other way around, jsut take a working setup and start to screw file/dir
permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
It will all generate all kinds of funny messages to apache logs and that should fairly
doable to put a couple of roadblocks along the way.

This would only be helpful to everybody asking here on this list for help with some weird
messages in apache logs.

Just my 2cents, ;-)
Martin



--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
<a href="tel:416%20673-8514" value="+14166738514" target="_blank">416 673-8514
Assistant: Renata Musa <[hidden email]>



--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Martin Mokrejs
Hi Lincoln,
  thank your work your answers and thorough work on this. I would still advice everybody facing those
many idle mysql connections to enable mysql logging (either slow plaintext mysql.log or binary logging
file). I believe that from the series of SQL commands which were executed through the later on "idle"
connection one could deduce what routine in gbrowse opened the connection and maybe even why it was
NOT closed. Or at least, it would clearly show that Apache is at fault.

Martin

Lincoln Stein wrote:

> Hi Folks,
>
> With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
>
>  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
>  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
>  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
>
> I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?
>
> Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.
>
> Lincoln
>
>
> On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email] <mailto:[hidden email]>> wrote:
>
>     I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.
>
>     I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.
>
>     Lincoln
>
>
>     On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]>> wrote:
>
>         Hi,
>
>         Lincoln Stein wrote:
>         > Hi Folks,
>         >
>         > With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>         >
>         >  1. The FastCGI executive module launches a fresh gbrowse instance.
>         >  2. GBrowse tries to load its default database into memory.
>         >  3. If the default database takes more than 3s to load, then FastCGI times it out.
>         >  4. FastCGI launches a new instance of GBrowse.
>         >  5. GBrowse tries to load its default database into memory.
>         >  6. FastCGI times the new instance out.
>         >  7. Repeat 4-6 indefinitely.
>         >
>         > Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>         >
>         > Here's what to do to fix the problem:
>         >
>         >   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>         >       o   *FcgidConnectTimeout 30*
>         >   * for mod_fastcgi, change *FastCGIConfig* to
>         >       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2
>
>         Hmm, I would call this a workaround instead. Why isn't there something like a lock
>         in Gbrowse so that multiple connections to mysql wouldn't be established? This
>         will definitely trick again some user or sysadmin. At least if gbrowse would test
>         for a timeout value and complain and exit if it would be too short.
>
>         In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
>         the code should be more careful about what user has configured, what is and what is not
>         available, and ... provide helpful error messages. I don't have the time to contribute
>         some code in this regard but anything leading to perl unitialized values should be
>         fixed. At least, I could fish some emails from the archives and emphasize the worst
>         examples. Or the other way around, jsut take a working setup and start to screw file/dir
>         permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
>         It will all generate all kinds of funny messages to apache logs and that should fairly
>         doable to put a couple of roadblocks along the way.
>
>         This would only be helpful to everybody asking here on this list for help with some weird
>         messages in apache logs.
>
>         Just my 2cents, ;-)
>         Martin
>
>
>
>
>     --
>     Lincoln D. Stein
>     Director, Informatics and Biocomputing Platform
>     Ontario Institute for Cancer Research
>     101 College St., Suite 800
>     Toronto, ON, Canada M5G0A3
>     416 673-8514 <tel:416%20673-8514>
>     Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>
>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Lincoln Stein
By design, GBrowse does not close mysql connections but keeps them open between requests. For any GBrowse process, there may be as many mysql connections as there are mysql databases associated with the data source. This avoids the performance hit of reopening the mysql connection each time a track needs to be rendered.

I can change this behavior if it is causing problems. I have not actually tested the latter to measure its impact on performance.

Lincoln


On Thu, Jul 11, 2013 at 10:50 AM, Martin Mokrejs <[hidden email]> wrote:
Hi Lincoln,
  thank your work your answers and thorough work on this. I would still advice everybody facing those
many idle mysql connections to enable mysql logging (either slow plaintext mysql.log or binary logging
file). I believe that from the series of SQL commands which were executed through the later on "idle"
connection one could deduce what routine in gbrowse opened the connection and maybe even why it was
NOT closed. Or at least, it would clearly show that Apache is at fault.

Martin

Lincoln Stein wrote:
> Hi Folks,
>
> With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
>
>  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
>  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
>  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
>
> I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?
>
> Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.
>
> Lincoln
>
>
> On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email] <mailto:[hidden email]>> wrote:
>
>     I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.
>
>     I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.
>
>     Lincoln
>
>
>     On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]>> wrote:
>
>         Hi,
>
>         Lincoln Stein wrote:
>         > Hi Folks,
>         >
>         > With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>         >
>         >  1. The FastCGI executive module launches a fresh gbrowse instance.
>         >  2. GBrowse tries to load its default database into memory.
>         >  3. If the default database takes more than 3s to load, then FastCGI times it out.
>         >  4. FastCGI launches a new instance of GBrowse.
>         >  5. GBrowse tries to load its default database into memory.
>         >  6. FastCGI times the new instance out.
>         >  7. Repeat 4-6 indefinitely.
>         >
>         > Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>         >
>         > Here's what to do to fix the problem:
>         >
>         >   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>         >       o   *FcgidConnectTimeout 30*
>         >   * for mod_fastcgi, change *FastCGIConfig* to
>         >       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2
>
>         Hmm, I would call this a workaround instead. Why isn't there something like a lock
>         in Gbrowse so that multiple connections to mysql wouldn't be established? This
>         will definitely trick again some user or sysadmin. At least if gbrowse would test
>         for a timeout value and complain and exit if it would be too short.
>
>         In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
>         the code should be more careful about what user has configured, what is and what is not
>         available, and ... provide helpful error messages. I don't have the time to contribute
>         some code in this regard but anything leading to perl unitialized values should be
>         fixed. At least, I could fish some emails from the archives and emphasize the worst
>         examples. Or the other way around, jsut take a working setup and start to screw file/dir
>         permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
>         It will all generate all kinds of funny messages to apache logs and that should fairly
>         doable to put a couple of roadblocks along the way.
>
>         This would only be helpful to everybody asking here on this list for help with some weird
>         messages in apache logs.
>
>         Just my 2cents, ;-)
>         Martin
>
>
>
>
>     --
>     Lincoln D. Stein
>     Director, Informatics and Biocomputing Platform
>     Ontario Institute for Cancer Research
>     101 College St., Suite 800
>     Toronto, ON, Canada M5G0A3
>     <a href="tel:416%20673-8514" value="+14166738514">416 673-8514 <tel:416%20673-8514>
>     Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>
>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> <a href="tel:416%20673-8514" value="+14166738514">416 673-8514
> Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>



--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Martin Mokrejs
Lincoln Stein wrote:
> By design, GBrowse does not close mysql connections but keeps them
> open between requests. For any GBrowse process, there may be as many
> mysql connections as there are mysql databases associated with the

Sure, connects to databases are expensive but I think one GB process
should re-use just a one, single connection. Mysql has a per-client
cache, buffers and this is I think causing too much unnecessary memory
pressure on the mysql/kernel. I would rather leave mysql have greater
query and result caches while having same overall memory footprint.

I haven't studied this, in theory it could be that mysql flushes/trashes
client caches when client switches between two different databases ...
("USE FOO;"). If that is true maybe this could turn to be a real problem
to flip back and forth between databases. Or maybe not, the client cache
is just to buffer data while sending to client ... which in case of
localhost installations is quick. So is probably empty most of the time
and easy to drop and recreate.

> data source. This avoids the performance hit of reopening the mysql
> connection each time a track needs to be rendered.

Yes, new connects to mysql are slow to establish but one could be enough.

> I can change this behavior if it is causing problems. I have not
> actually tested the latter to measure its impact on performance.

If it is relatively easy to implement then I would propose to try re-using
a single connection to mysql (each GB instance would have just one DB connection
to the same host:port). I think this will be helpful to at least small
servers which cannot handle too my clients simultaneously. With current
approach people can run out of the number of clients/databases open
too quickly. Sounds like having 20 genome databases with 10 tracks each
is asking for 200 connections if each genome db is accessed by just a single
user. I would expect more remote users ... I think this is a lot. Each
mysql process can serve hundreds to thousands of clients.

These are just untested ideas but I bet this will help.
Martin

>
> Lincoln
>
>
> On Thu, Jul 11, 2013 at 10:50 AM, Martin Mokrejs <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi Lincoln,
>       thank your work your answers and thorough work on this. I would still advice everybody facing those
>     many idle mysql connections to enable mysql logging (either slow plaintext mysql.log or binary logging
>     file). I believe that from the series of SQL commands which were executed through the later on "idle"
>     connection one could deduce what routine in gbrowse opened the connection and maybe even why it was
>     NOT closed. Or at least, it would clearly show that Apache is at fault.
>
>     Martin
>
>     Lincoln Stein wrote:
>     > Hi Folks,
>     >
>     > With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
>     >
>     >  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
>     >  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
>     >  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
>     >
>     > I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?
>     >
>     > Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.
>     >
>     > Lincoln
>     >
>     >
>     > On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>     >
>     >     I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.
>     >
>     >     I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.
>     >
>     >     Lincoln
>     >
>     >
>     >     On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>     >
>     >         Hi,
>     >
>     >         Lincoln Stein wrote:
>     >         > Hi Folks,
>     >         >
>     >         > With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>     >         >
>     >         >  1. The FastCGI executive module launches a fresh gbrowse instance.
>     >         >  2. GBrowse tries to load its default database into memory.
>     >         >  3. If the default database takes more than 3s to load, then FastCGI times it out.
>     >         >  4. FastCGI launches a new instance of GBrowse.
>     >         >  5. GBrowse tries to load its default database into memory.
>     >         >  6. FastCGI times the new instance out.
>     >         >  7. Repeat 4-6 indefinitely.
>     >         >
>     >         > Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>     >         >
>     >         > Here's what to do to fix the problem:
>     >         >
>     >         >   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>     >         >       o   *FcgidConnectTimeout 30*
>     >         >   * for mod_fastcgi, change *FastCGIConfig* to
>     >         >       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2
>     >
>     >         Hmm, I would call this a workaround instead. Why isn't there something like a lock
>     >         in Gbrowse so that multiple connections to mysql wouldn't be established? This
>     >         will definitely trick again some user or sysadmin. At least if gbrowse would test
>     >         for a timeout value and complain and exit if it would be too short.
>     >
>     >         In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
>     >         the code should be more careful about what user has configured, what is and what is not
>     >         available, and ... provide helpful error messages. I don't have the time to contribute
>     >         some code in this regard but anything leading to perl unitialized values should be
>     >         fixed. At least, I could fish some emails from the archives and emphasize the worst
>     >         examples. Or the other way around, jsut take a working setup and start to screw file/dir
>     >         permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
>     >         It will all generate all kinds of funny messages to apache logs and that should fairly
>     >         doable to put a couple of roadblocks along the way.
>     >
>     >         This would only be helpful to everybody asking here on this list for help with some weird
>     >         messages in apache logs.
>     >
>     >         Just my 2cents, ;-)
>     >         Martin
>     >
>     >
>     >
>     >
>     >     --
>     >     Lincoln D. Stein
>     >     Director, Informatics and Biocomputing Platform
>     >     Ontario Institute for Cancer Research
>     >     101 College St., Suite 800
>     >     Toronto, ON, Canada M5G0A3
>     >     416 673-8514 <tel:416%20673-8514> <tel:416%20673-8514>
>     >     Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>     >
>     >
>     >
>     >
>     > --
>     > Lincoln D. Stein
>     > Director, Informatics and Biocomputing Platform
>     > Ontario Institute for Cancer Research
>     > 101 College St., Suite 800
>     > Toronto, ON, Canada M5G0A3
>     > 416 673-8514 <tel:416%20673-8514>
>     > Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Walter, Mathias
Hi Martin,

2013/7/12 Martin Mokrejs <[hidden email]>:
> Hi Walter,

My first name is Mathias, as you can see in my tiny signature. ;-)

>   I think you have quite good points. Did you intentionally drop the
> gbrowse mailing list from recipients? ;-)

No, I didn't. I just did not learn to always hit 'Reply all' (opposed
to the Netiquette) in case of gmod posts. Opposed to any other mailing
list I know, the returning address is the author of the post rather
than the mailing list itself. That is still confusing to me.
Interestingly, I read comments regarding to the exclusion/inclusion of
the mailing list once or twice a week. I already had a discussion with
Scott Cain about this point. Anyway ...

> That is a good point, provided gbrowse is parallelized internally and queries
> are really taking long time.

There is at least one long runing query per track which subsequently
results in as many short running queries as elements are in the
queried region. BTW: This could be improved in the future.

And even a batch of short running queries could be distributed across
connections of a connection pool rather than executing them one by
one.

> My suspicion is that perl processing takes at
> least 2-3x longer than is the time spent by mysql, but I did not measure this.

Which is indeed an indicator to parallize GBrowse internally - which
requires a thread pool. ;-)

BTW: I've developed my own perl DBI MySQL adapter to access my
database schema (of quite a lot of databases [> 2000]) and Ive found
some bottlenecks.

--
Regards,
Mathias

>
> Walter, Mathias wrote:
>> Hi Martin,
>>
>> I think we talk about the same. I propose that one GBrowse process
>> should have as many connections to one host:port as it needs to
>> request as much as possible queries in parallel. Moreover, if a load
>
> But could same gbrowse instance really process multiple data streams in
> parallel, effectively?
>
>> balancer is used to distribute the queries to more than one MySQL node
>> transparently, you would have much less benefit if you use just one
>> single connection.
>
> That is a good point, provided gbrowse is parallelized internally and queries
> are really taking long time. My suspicion is that perl processing takes at
> least 2-3x longer than is the time spent by mysql, but I did not measure this.
> I could be well wrong here.
>
>>
>> Maybe I have to mention that my opinion/experience is based on the
>> assumption that one GBrowse instance is used to process the rendering
>> of a particular region to view and each track in this region is
>> rendered in parallel. At least that would make sense to me.
>
> I don't know how this is done internally in Gbrowse.
>
>>
>>
>> BTW: Having more than one MySQL process running at the same machine
>> but different ports is IMHO not a good idea (duplicated memory
>> footprint, more than one not fully filled buffer pool, etc.).
>
> I agree but maybe there are people having some small dbs on other ports.
> I just wanted to propose something rather general.
>
> Martin
>
>>
>> --
>> Regards,
>> Mathias
>>
>> 2013/7/12 Martin Mokrejs <[hidden email]>:
>>> Hi Walter,
>>>   do we talk about the same? I proposed that one gbrowse process has just one
>>> mysql connection to one host:port. So if you have databases on several machines
>>> or on same machine but different ports, there would be more database connections.
>>> I will probably go and lookup how many MB each client connection takes on the server
>>> side. This is not just a question of memory but also of switching between the
>>> processes/contexts/CPUs. That takes more overhead than just eating "some" memory, really.
>>> Martin
>>>
>>>
>>> Walter, Mathias wrote:
>>>> Hi Martin,
>>>>
>>>> I absolutely dislike the idea sharing just one single connection for
>>>> GBrowse because it avoids parallelism. Even if GBrowse do not support
>>>> parallel database requests yet (I don't know it), one single
>>>> connection would prevent this for the future.
>>>>
>>>> BTW: Nowadays (with common database server hardware), its is no issue
>>>> anymore, having 800 or thousands of simultaneous active connections
>>>> and hundreds of them sleeping.
>>>>
>>>> --
>>>> Kind regards,
>>>> Mathias
>>>>
>>>> 2013/7/11 Martin Mokrejs <[hidden email]>:
>>>>> Lincoln Stein wrote:
>>>>>> By design, GBrowse does not close mysql connections but keeps them
>>>>>> open between requests. For any GBrowse process, there may be as many
>>>>>> mysql connections as there are mysql databases associated with the
>>>>>
>>>>> Sure, connects to databases are expensive but I think one GB process
>>>>> should re-use just a one, single connection. Mysql has a per-client
>>>>> cache, buffers and this is I think causing too much unnecessary memory
>>>>> pressure on the mysql/kernel. I would rather leave mysql have greater
>>>>> query and result caches while having same overall memory footprint.
>>>>>
>>>>> I haven't studied this, in theory it could be that mysql flushes/trashes
>>>>> client caches when client switches between two different databases ...
>>>>> ("USE FOO;"). If that is true maybe this could turn to be a real problem
>>>>> to flip back and forth between databases. Or maybe not, the client cache
>>>>> is just to buffer data while sending to client ... which in case of
>>>>> localhost installations is quick. So is probably empty most of the time
>>>>> and easy to drop and recreate.
>>>>>
>>>>>> data source. This avoids the performance hit of reopening the mysql
>>>>>> connection each time a track needs to be rendered.
>>>>>
>>>>> Yes, new connects to mysql are slow to establish but one could be enough.
>>>>>
>>>>>> I can change this behavior if it is causing problems. I have not
>>>>>> actually tested the latter to measure its impact on performance.
>>>>>
>>>>> If it is relatively easy to implement then I would propose to try re-using
>>>>> a single connection to mysql (each GB instance would have just one DB connection
>>>>> to the same host:port). I think this will be helpful to at least small
>>>>> servers which cannot handle too my clients simultaneously. With current
>>>>> approach people can run out of the number of clients/databases open
>>>>> too quickly. Sounds like having 20 genome databases with 10 tracks each
>>>>> is asking for 200 connections if each genome db is accessed by just a single
>>>>> user. I would expect more remote users ... I think this is a lot. Each
>>>>> mysql process can serve hundreds to thousands of clients.
>>>>>
>>>>> These are just untested ideas but I bet this will help.
>>>>> Martin
>>>>>
>>>>>>
>>>>>> Lincoln
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 11, 2013 at 10:50 AM, Martin Mokrejs <[hidden email] <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>>     Hi Lincoln,
>>>>>>       thank your work your answers and thorough work on this. I would still advice everybody facing those
>>>>>>     many idle mysql connections to enable mysql logging (either slow plaintext mysql.log or binary logging
>>>>>>     file). I believe that from the series of SQL commands which were executed through the later on "idle"
>>>>>>     connection one could deduce what routine in gbrowse opened the connection and maybe even why it was
>>>>>>     NOT closed. Or at least, it would clearly show that Apache is at fault.
>>>>>>
>>>>>>     Martin
>>>>>>
>>>>>>     Lincoln Stein wrote:
>>>>>>     > Hi Folks,
>>>>>>     >
>>>>>>     > With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
>>>>>>     >
>>>>>>     >  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
>>>>>>     >  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
>>>>>>     >  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
>>>>>>     >
>>>>>>     > I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?
>>>>>>     >
>>>>>>     > Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.
>>>>>>     >
>>>>>>     > Lincoln
>>>>>>     >
>>>>>>     >
>>>>>>     > On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>>>>>>     >
>>>>>>     >     I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.
>>>>>>     >
>>>>>>     >     I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.
>>>>>>     >
>>>>>>     >     Lincoln
>>>>>>     >
>>>>>>     >
>>>>>>     >     On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>>>>>>     >
>>>>>>     >         Hi,
>>>>>>     >
>>>>>>     >         Lincoln Stein wrote:
>>>>>>     >         > Hi Folks,
>>>>>>     >         >
>>>>>>     >         > With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>>>>>>     >         >
>>>>>>     >         >  1. The FastCGI executive module launches a fresh gbrowse instance.
>>>>>>     >         >  2. GBrowse tries to load its default database into memory.
>>>>>>     >         >  3. If the default database takes more than 3s to load, then FastCGI times it out.
>>>>>>     >         >  4. FastCGI launches a new instance of GBrowse.
>>>>>>     >         >  5. GBrowse tries to load its default database into memory.
>>>>>>     >         >  6. FastCGI times the new instance out.
>>>>>>     >         >  7. Repeat 4-6 indefinitely.
>>>>>>     >         >
>>>>>>     >         > Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>>>>>>     >         >
>>>>>>     >         > Here's what to do to fix the problem:
>>>>>>     >         >
>>>>>>     >         >   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>>>>>>     >         >       o   *FcgidConnectTimeout 30*
>>>>>>     >         >   * for mod_fastcgi, change *FastCGIConfig* to
>>>>>>     >         >       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2
>>>>>>     >
>>>>>>     >         Hmm, I would call this a workaround instead. Why isn't there something like a lock
>>>>>>     >         in Gbrowse so that multiple connections to mysql wouldn't be established? This
>>>>>>     >         will definitely trick again some user or sysadmin. At least if gbrowse would test
>>>>>>     >         for a timeout value and complain and exit if it would be too short.
>>>>>>     >
>>>>>>     >         In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
>>>>>>     >         the code should be more careful about what user has configured, what is and what is not
>>>>>>     >         available, and ... provide helpful error messages. I don't have the time to contribute
>>>>>>     >         some code in this regard but anything leading to perl unitialized values should be
>>>>>>     >         fixed. At least, I could fish some emails from the archives and emphasize the worst
>>>>>>     >         examples. Or the other way around, jsut take a working setup and start to screw file/dir
>>>>>>     >         permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
>>>>>>     >         It will all generate all kinds of funny messages to apache logs and that should fairly
>>>>>>     >         doable to put a couple of roadblocks along the way.
>>>>>>     >
>>>>>>     >         This would only be helpful to everybody asking here on this list for help with some weird
>>>>>>     >         messages in apache logs.
>>>>>>     >
>>>>>>     >         Just my 2cents, ;-)
>>>>>>     >         Martin
>>>>>>     >
>>>>>>     >
>>>>>>     >
>>>>>>     >
>>>>>>     >     --
>>>>>>     >     Lincoln D. Stein
>>>>>>     >     Director, Informatics and Biocomputing Platform
>>>>>>     >     Ontario Institute for Cancer Research
>>>>>>     >     101 College St., Suite 800
>>>>>>     >     Toronto, ON, Canada M5G0A3
>>>>>>     >     416 673-8514 <tel:416%20673-8514> <tel:416%20673-8514>
>>>>>>     >     Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>>>>>>     >
>>>>>>     >
>>>>>>     >
>>>>>>     >
>>>>>>     > --
>>>>>>     > Lincoln D. Stein
>>>>>>     > Director, Informatics and Biocomputing Platform
>>>>>>     > Ontario Institute for Cancer Research
>>>>>>     > 101 College St., Suite 800
>>>>>>     > Toronto, ON, Canada M5G0A3
>>>>>>     > 416 673-8514 <tel:416%20673-8514>
>>>>>>     > Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Lincoln D. Stein
>>>>>> Director, Informatics and Biocomputing Platform
>>>>>> Ontario Institute for Cancer Research
>>>>>> 101 College St., Suite 800
>>>>>> Toronto, ON, Canada M5G0A3
>>>>>> 416 673-8514
>>>>>> Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> See everything from the browser to the database with AppDynamics
>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>> Start your free trial of AppDynamics Pro today!
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> [hidden email]
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>
>>>>
>>
>>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Lincoln Stein
In reply to this post by Martin Mokrejs
I'm trying to figure out how to do what you propose, which sounds like a great idea, but I don't know how it will work with authentication. Right now db handles are cached at the dsn level, which ensures that authentication is performed at database connect time. If several tracks use the same DSN, then they get the same database handle. To change this so that handles are cached at the server:port level, you'd have to place "use datasource" in front of each query sequence. But if we have two databases, one accessible by user "foo" password "blah" and the other by user "bar", password "none", how do I switch users back and forth?

Lincoln


On Thu, Jul 11, 2013 at 4:07 PM, Martin Mokrejs <[hidden email]> wrote:
Lincoln Stein wrote:
> By design, GBrowse does not close mysql connections but keeps them
> open between requests. For any GBrowse process, there may be as many
> mysql connections as there are mysql databases associated with the

Sure, connects to databases are expensive but I think one GB process
should re-use just a one, single connection. Mysql has a per-client
cache, buffers and this is I think causing too much unnecessary memory
pressure on the mysql/kernel. I would rather leave mysql have greater
query and result caches while having same overall memory footprint.

I haven't studied this, in theory it could be that mysql flushes/trashes
client caches when client switches between two different databases ...
("USE FOO;"). If that is true maybe this could turn to be a real problem
to flip back and forth between databases. Or maybe not, the client cache
is just to buffer data while sending to client ... which in case of
localhost installations is quick. So is probably empty most of the time
and easy to drop and recreate.

> data source. This avoids the performance hit of reopening the mysql
> connection each time a track needs to be rendered.

Yes, new connects to mysql are slow to establish but one could be enough.

> I can change this behavior if it is causing problems. I have not
> actually tested the latter to measure its impact on performance.

If it is relatively easy to implement then I would propose to try re-using
a single connection to mysql (each GB instance would have just one DB connection
to the same host:port). I think this will be helpful to at least small
servers which cannot handle too my clients simultaneously. With current
approach people can run out of the number of clients/databases open
too quickly. Sounds like having 20 genome databases with 10 tracks each
is asking for 200 connections if each genome db is accessed by just a single
user. I would expect more remote users ... I think this is a lot. Each
mysql process can serve hundreds to thousands of clients.

These are just untested ideas but I bet this will help.
Martin

>
> Lincoln
>
>
> On Thu, Jul 11, 2013 at 10:50 AM, Martin Mokrejs <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi Lincoln,
>       thank your work your answers and thorough work on this. I would still advice everybody facing those
>     many idle mysql connections to enable mysql logging (either slow plaintext mysql.log or binary logging
>     file). I believe that from the series of SQL commands which were executed through the later on "idle"
>     connection one could deduce what routine in gbrowse opened the connection and maybe even why it was
>     NOT closed. Or at least, it would clearly show that Apache is at fault.
>
>     Martin
>
>     Lincoln Stein wrote:
>     > Hi Folks,
>     >
>     > With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
>     >
>     >  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
>     >  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
>     >  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
>     >
>     > I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?
>     >
>     > Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.
>     >
>     > Lincoln
>     >
>     >
>     > On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>     >
>     >     I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.
>     >
>     >     I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.
>     >
>     >     Lincoln
>     >
>     >
>     >     On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>     >
>     >         Hi,
>     >
>     >         Lincoln Stein wrote:
>     >         > Hi Folks,
>     >         >
>     >         > With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>     >         >
>     >         >  1. The FastCGI executive module launches a fresh gbrowse instance.
>     >         >  2. GBrowse tries to load its default database into memory.
>     >         >  3. If the default database takes more than 3s to load, then FastCGI times it out.
>     >         >  4. FastCGI launches a new instance of GBrowse.
>     >         >  5. GBrowse tries to load its default database into memory.
>     >         >  6. FastCGI times the new instance out.
>     >         >  7. Repeat 4-6 indefinitely.
>     >         >
>     >         > Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>     >         >
>     >         > Here's what to do to fix the problem:
>     >         >
>     >         >   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>     >         >       o   *FcgidConnectTimeout 30*
>     >         >   * for mod_fastcgi, change *FastCGIConfig* to
>     >         >       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2
>     >
>     >         Hmm, I would call this a workaround instead. Why isn't there something like a lock
>     >         in Gbrowse so that multiple connections to mysql wouldn't be established? This
>     >         will definitely trick again some user or sysadmin. At least if gbrowse would test
>     >         for a timeout value and complain and exit if it would be too short.
>     >
>     >         In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
>     >         the code should be more careful about what user has configured, what is and what is not
>     >         available, and ... provide helpful error messages. I don't have the time to contribute
>     >         some code in this regard but anything leading to perl unitialized values should be
>     >         fixed. At least, I could fish some emails from the archives and emphasize the worst
>     >         examples. Or the other way around, jsut take a working setup and start to screw file/dir
>     >         permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
>     >         It will all generate all kinds of funny messages to apache logs and that should fairly
>     >         doable to put a couple of roadblocks along the way.
>     >
>     >         This would only be helpful to everybody asking here on this list for help with some weird
>     >         messages in apache logs.
>     >
>     >         Just my 2cents, ;-)
>     >         Martin
>     >
>     >
>     >
>     >
>     >     --
>     >     Lincoln D. Stein
>     >     Director, Informatics and Biocomputing Platform
>     >     Ontario Institute for Cancer Research
>     >     101 College St., Suite 800
>     >     Toronto, ON, Canada M5G0A3
>     >     <a href="tel:416%20673-8514" value="+14166738514">416 673-8514 <tel:416%20673-8514> <tel:416%20673-8514>
>     >     Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>     >
>     >
>     >
>     >
>     > --
>     > Lincoln D. Stein
>     > Director, Informatics and Biocomputing Platform
>     > Ontario Institute for Cancer Research
>     > 101 College St., Suite 800
>     > Toronto, ON, Canada M5G0A3
>     > <a href="tel:416%20673-8514" value="+14166738514">416 673-8514 <tel:416%20673-8514>
>     > Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> <a href="tel:416%20673-8514" value="+14166738514">416 673-8514
> Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>



--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: sleeping mysql processes w/Bio::DB::SeqFeature::Store

Martin Mokrejs
Lincoln Stein wrote:
> I'm trying to figure out how to do what you propose, which sounds
> like a great idea, but I don't know how it will work with
> authentication. Right now db handles are cached at the dsn level,
> which ensures that authentication is performed at database connect
> time. If several tracks use the same DSN, then they get the same
> database handle. To change this so that handles are cached at the
> server:port level, you'd have to place "use datasource" in front of

Yes, that what I meant with "USE FOO;" command. This could appear to be
a real stopper if that is too expensive but lets see.

> each query sequence. But if we have two databases, one accessible by
> user "foo" password "blah" and the other by user "bar", password
> "none", how do I switch users back and forth?

I think you cannot. I just did not think of this scenario. For this
you need both connection handles as far as I know.

I thought about general scenario of just a single server:port with many
dbs but accessible under same user:pass. How many such databases are
around which have different users:pass combinations? I would tune gbrowse
a generic use-case, so a read-only access to most stuff.

BTW, aren't those connections with increased (non-default) privs/dbs the
cause of cumulating idle connections?

Martin

>
> Lincoln
>
>
> On Thu, Jul 11, 2013 at 4:07 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Lincoln Stein wrote:
>     > By design, GBrowse does not close mysql connections but keeps them
>     > open between requests. For any GBrowse process, there may be as many
>     > mysql connections as there are mysql databases associated with the
>
>     Sure, connects to databases are expensive but I think one GB process
>     should re-use just a one, single connection. Mysql has a per-client
>     cache, buffers and this is I think causing too much unnecessary memory
>     pressure on the mysql/kernel. I would rather leave mysql have greater
>     query and result caches while having same overall memory footprint.
>
>     I haven't studied this, in theory it could be that mysql flushes/trashes
>     client caches when client switches between two different databases ...
>     ("USE FOO;"). If that is true maybe this could turn to be a real problem
>     to flip back and forth between databases. Or maybe not, the client cache
>     is just to buffer data while sending to client ... which in case of
>     localhost installations is quick. So is probably empty most of the time
>     and easy to drop and recreate.
>
>     > data source. This avoids the performance hit of reopening the mysql
>     > connection each time a track needs to be rendered.
>
>     Yes, new connects to mysql are slow to establish but one could be enough.
>
>     > I can change this behavior if it is causing problems. I have not
>     > actually tested the latter to measure its impact on performance.
>
>     If it is relatively easy to implement then I would propose to try re-using
>     a single connection to mysql (each GB instance would have just one DB connection
>     to the same host:port). I think this will be helpful to at least small
>     servers which cannot handle too my clients simultaneously. With current
>     approach people can run out of the number of clients/databases open
>     too quickly. Sounds like having 20 genome databases with 10 tracks each
>     is asking for 200 connections if each genome db is accessed by just a single
>     user. I would expect more remote users ... I think this is a lot. Each
>     mysql process can serve hundreds to thousands of clients.
>
>     These are just untested ideas but I bet this will help.
>     Martin
>
>     >
>     > Lincoln
>     >
>     >
>     > On Thu, Jul 11, 2013 at 10:50 AM, Martin Mokrejs <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:
>     >
>     >     Hi Lincoln,
>     >       thank your work your answers and thorough work on this. I would still advice everybody facing those
>     >     many idle mysql connections to enable mysql logging (either slow plaintext mysql.log or binary logging
>     >     file). I believe that from the series of SQL commands which were executed through the later on "idle"
>     >     connection one could deduce what routine in gbrowse opened the connection and maybe even why it was
>     >     NOT closed. Or at least, it would clearly show that Apache is at fault.
>     >
>     >     Martin
>     >
>     >     Lincoln Stein wrote:
>     >     > Hi Folks,
>     >     >
>     >     > With respect to the sleeping mysql process issue, here is what I observe in testing with the current development version. The scenarios described were tested with both gbrowse and gbrowse_img (gbrowse_details testing is pending).
>     >     >
>     >     >  1. When run as a CGI script, no sleeping mysql processes are left open after gbrowse finishes its thing.
>     >     >  2. When run as a FastCGI script (either mod_fcgid or mod_fastcgi), running under the threaded version of Apache creates one persistent sleeping mysql process that is reused with each subsequent request. Other processes are transiently created and destroyed while actively rendering tracks.
>     >     >  3. When run as a FastCGI script with the prefork version of Apache, then there will be one persistent sleeping mysql per preforked Apache. In addition, under high load, more mysql processes are created transiently during active rendering.
>     >     >
>     >     > I could not recreate a scenario of hundreds of sleeping mysql processes. My understanding of how mysql works is that if there are hundreds of sleeping mysql processes, then there must be an equal number of waiting Apache or GBrowse processes. Can someone who is having the problem please try a "ps auxwww | grep -i gbrowse" on their server machine at the same time that mysql processlist is indicating lots of sleeping processes?
>     >     >
>     >     > Finally, although this was not the question that initiated the thread, in my testing it looks like highest performance is obtained using mod_fastcgi and the preforked version of Apache.
>     >     >
>     >     > Lincoln
>     >     >
>     >     >
>     >     > On Wed, Jul 10, 2013 at 10:58 AM, Lincoln Stein <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:
>     >     >
>     >     >     I have just uploaded GBrowse version 2.55 to CPAN. It contains a bioperl patch that fixes slow loading of multiple GFF3 files, and changes the timeout values for mod_fastcgi and mod_fcgid. I have also noticed that mod_fastcgi now seems to be faster than mod_fcgid when the timeout issue is fixed.
>     >     >
>     >     >     I don't think that the FastCGI issues are related to the sleeping mysql process issue, actually, because this was first reported in the context of running GBrowse in CGI mode. I am starting work on CGI mode now.
>     >     >
>     >     >     Lincoln
>     >     >
>     >     >
>     >     >     On Tue, Jul 9, 2013 at 6:49 PM, Martin Mokrejs <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>> wrote:
>     >     >
>     >     >         Hi,
>     >     >
>     >     >         Lincoln Stein wrote:
>     >     >         > Hi Folks,
>     >     >         >
>     >     >         > With respect to the FastCGI errors, I have tracked the problem down to the following sequence of events:
>     >     >         >
>     >     >         >  1. The FastCGI executive module launches a fresh gbrowse instance.
>     >     >         >  2. GBrowse tries to load its default database into memory.
>     >     >         >  3. If the default database takes more than 3s to load, then FastCGI times it out.
>     >     >         >  4. FastCGI launches a new instance of GBrowse.
>     >     >         >  5. GBrowse tries to load its default database into memory.
>     >     >         >  6. FastCGI times the new instance out.
>     >     >         >  7. Repeat 4-6 indefinitely.
>     >     >         >
>     >     >         > Note that this only happens for databases that are slow to load, typically in-memory databases. For example, the full tutorial database takes ~8s to load on my machine (unreasonably slow for reasons that are unclear to me: I am going to start debugging bioperl next). Once the database is loaded, however, all subsequent accesses are fast.
>     >     >         >
>     >     >         > Here's what to do to fix the problem:
>     >     >         >
>     >     >         >   * for mod_fcgid, add the following directive to /etc/apache2/conf.d/gbrowse2 (in the same section as *FcgidIOTimeout*)
>     >     >         >       o   *FcgidConnectTimeout 30*
>     >     >         >   * for mod_fastcgi, change *FastCGIConfig* to
>     >     >         >       o *FastCGIConfig *-startDelay 30 -appConnTimeout 30 -idle-timeout 600 -maxClassProcesses 20  -initial-env GBROWSE_CONF=/etc/gbrowse2
>     >     >
>     >     >         Hmm, I would call this a workaround instead. Why isn't there something like a lock
>     >     >         in Gbrowse so that multiple connections to mysql wouldn't be established? This
>     >     >         will definitely trick again some user or sysadmin. At least if gbrowse would test
>     >     >         for a timeout value and complain and exit if it would be too short.
>     >     >
>     >     >         In my eyes gbrowse is full of such tricks and reading this lists only ensures me that
>     >     >         the code should be more careful about what user has configured, what is and what is not
>     >     >         available, and ... provide helpful error messages. I don't have the time to contribute
>     >     >         some code in this regard but anything leading to perl unitialized values should be
>     >     >         fixed. At least, I could fish some emails from the archives and emphasize the worst
>     >     >         examples. Or the other way around, jsut take a working setup and start to screw file/dir
>     >     >         permission, place bad filenames/dbnames/dirnames into config, set screw variable names.
>     >     >         It will all generate all kinds of funny messages to apache logs and that should fairly
>     >     >         doable to put a couple of roadblocks along the way.
>     >     >
>     >     >         This would only be helpful to everybody asking here on this list for help with some weird
>     >     >         messages in apache logs.
>     >     >
>     >     >         Just my 2cents, ;-)
>     >     >         Martin
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >     --
>     >     >     Lincoln D. Stein
>     >     >     Director, Informatics and Biocomputing Platform
>     >     >     Ontario Institute for Cancer Research
>     >     >     101 College St., Suite 800
>     >     >     Toronto, ON, Canada M5G0A3
>     >     >     416 673-8514 <tel:416%20673-8514> <tel:416%20673-8514> <tel:416%20673-8514>
>     >     >     Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > --
>     >     > Lincoln D. Stein
>     >     > Director, Informatics and Biocomputing Platform
>     >     > Ontario Institute for Cancer Research
>     >     > 101 College St., Suite 800
>     >     > Toronto, ON, Canada M5G0A3
>     >     > 416 673-8514 <tel:416%20673-8514> <tel:416%20673-8514>
>     >     > Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>> <mailto:[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>>
>     >
>     >
>     >
>     >
>     > --
>     > Lincoln D. Stein
>     > Director, Informatics and Biocomputing Platform
>     > Ontario Institute for Cancer Research
>     > 101 College St., Suite 800
>     > Toronto, ON, Canada M5G0A3
>     > 416 673-8514 <tel:416%20673-8514>
>     > Assistant: Renata Musa <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>>
>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <[hidden email] <mailto:[hidden email]>>

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
12