non-responsive web site?

joe carlson
Hi Julie et al,

I was wondering if you’ve seen anything like the behavior we’re seeing the last couple of days.

We have someone who wants to pull down very large datasets through a custom query: basically all 1.7 million protein sequences with some header lines. The query is not so bad - just a lot of data - and the log files indicate it seems to run OK. I believe I’ve upped the query limits from where you have set them.

But we also have a service monitor script which every couple of minutes does a couple of simple things to make sure the web site is still up, and does a reboot when needed. We had put this in when we had a problem with running out of memory. In the past couple of days this monitoring script has been restarting the web site when our user is attempting to pull the big data set.

I am not sure if the web site is really locked up when this monitor script says it is. I’ve run the big query here a couple of times and sometimes I can get results without a restart and other times with a restart. But I don’t see the web pages as being unresponsive in a browser. I also cannot find anything in the logs that indicates a tomcat error.

What sort of tools do you use to check the health of your web server? Is it anything standard, or a home brew? I am at a loss trying to find out what to look at.

Thanks

Joe


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev

Re: non-responsive web site?

Sergio Contrino
hi joe,
would it be possible to monitor the cpu usage for the database server,
and maybe use this data to influence the behaviour of the service
monitor script (assuming that the web site could be slow to answer
because it is waiting for the database)?
thanks
sergio
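
One way to picture this suggestion, as a minimal sketch rather than anything taken from InterMine or from the actual monitor script: the monitor restarts the site only when it is unresponsive and the database host is not busy. The function names, the 0.8 threshold, and the use of load average per core as the "CPU usage" signal are all assumptions made for illustration. Note that os.getloadavg() reads the load of the local machine only (Unix), so a database server on another host would need its own way of reporting load.

```python
import os


def load_per_core(load_1min: float, cores: int) -> float:
    """Normalise a 1-minute load average by the number of CPU cores."""
    return load_1min / max(cores, 1)


def local_load_per_core() -> float:
    """Load per core of the machine running this script (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_per_core(load_1min, os.cpu_count() or 1)


def should_restart(site_responsive: bool,
                   db_load_per_core: float,
                   busy_threshold: float = 0.8) -> bool:
    """Decide whether the monitor should restart the web site.

    busy_threshold is a hypothetical cut-off: above it, the database is
    treated as busy and a slow site is assumed to be waiting on a query.
    """
    if site_responsive:
        return False
    # Site did not answer in time. If the database is busy, hold off and
    # let the next monitoring run decide; otherwise restart.
    return db_load_per_core < busy_threshold
```

With this rule a slow answer during a heavy query is tolerated, while a slow answer on an idle database still triggers a restart.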


In reply to Joe Carlson's post of 13/09/15 01:07

--
sergio contrino                  InterMine, University of Cambridge
https://sergiocontrino.github.io           http://www.intermine.org


Re: non-responsive web site?

Justin Clark-Casey
Out of interest, Joe, what actions does your monitor script take?  I presume one of them is to make a request to the site?

Also, have you seen [1]?  Looks like a lot of general performance tuning information on that page.  In particular, it makes the point that like a lot of server software, Tomcat is tuned for out-of-the-box ease-of-use and not for larger loads.  It might be worth trying to increase the number of threads available to connectors though I imagine the default of 5 (I think it's 5) is probably still plenty unless an InterMine installation is really popular.  But you may well have already gone through all this :)

And all that said, the caveat here is that I haven't had to do any such investigation myself (yet), though I have done similar work on different types of systems.

Regards,

--
Justin Clark-Casey, Synbiomine/InterMine Software Developer
http://synbiomine.org
http://twitter.com/justincc

In reply to Sergio Contrino's post of 14/09/15 14:03


Re: non-responsive web site?

Colin
This is sort of random but have you experimented with the "NIO" connector in tomcat? We deployed this recently for tomcat and it seems to have fixed some problems that we were having.


I originally reported the problem I saw as a "spikey CPU" issue a couple of months back, but it really has been "acting up" more and more lately: http://gmod.827538.n3.nabble.com/Tomcat-CPU-spikes-increase-memory-td4048635.html


When I dug into some Google searches, the NIO connector came up as a possible fix, with some reports that the web site can even hang without it. Could be a possible fix.




-Colin

In reply to Justin Clark-Casey's post of Thu, Sep 17, 2015 at 7:58 AM

Re: non-responsive web site?

joe carlson
In reply to this post by Justin Clark-Casey
Hi Justin and Colin,

Early on, we were seeing problems with running out of memory in tomcat
and the web site locking up. We put in place a cron job that made 2
requests every few minutes to check the health status. Both requests
were wget calls with a 15-second timeout. The first request is for a bit
of genomic data; the second is for the homepage.

The genomic slice request is
http://phytozome.jgi.doe.gov/phytomine/service/regions/sequence?query=%7B%22regions%22:[%22Chr19%5Ct10692566%5Ct10700215%22],%22organism%22:%22251%22%7D'
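
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual cron job: it swaps wget for Python's urllib, and it adds one assumption that is not in the original script, namely that a restart is only signalled after several consecutive failed runs (threshold of 3, a made-up value), so a single slow answer does not restart the site. The names probe and update_failures are hypothetical, and the failure count would have to be stored somewhere (for example in a small file) between cron runs.

```python
import http.client
import urllib.error
import urllib.request
from typing import Tuple


def probe(url: str, timeout: float = 15.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status.

    The timeout applies to each blocking socket operation, which is
    close to, but not exactly, a limit on the total request time.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Timeouts, refused connections, HTTP errors and malformed URLs
        # all count as a failed probe.
        return False


def update_failures(previous_failures: int, all_ok: bool,
                    threshold: int = 3) -> Tuple[int, bool]:
    """Return (new consecutive-failure count, whether to restart)."""
    if all_ok:
        return 0, False
    failures = previous_failures + 1
    return failures, failures >= threshold
```

A monitoring run would call probe once for the genomic slice URL and once for the homepage, then pass `all_ok = both probes succeeded` to update_failures together with the count saved by the previous run.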

We decided that the cause of the memory problems was
org.intermine.bio.web.export.GenomicRegionSequenceExporter. This code
caches the chromosome sequence. While this is OK for a (puny) fly
sequence, it's not so good for our 50+ plant genomes. Over time that
just consumed too much memory. We considered coding some freeing of the
cache with time, but in the end just decided to take it out. That solved
the webserver memory problems and we went for months without a restart.

Then they started happening again. It coincided with a user who wanted
to make a query with a huge download. The query cost estimate probably
exceeds the default value of the web service request - I had bumped up
the number - but I wanted to see if we could do it. The user said it did
not work a couple of times, but more recently said it was successful.

I did some experimenting to see how long the health status call to the
web service took while these big requests were running. This was on a
test server so I'm certain there was no other activity. I had 2 big
downloads happening and then plotted the time wget reported. Most times
were essentially instantaneous, but ~ 5 times when both big requests
were active the health status check took more than 15 seconds to respond.
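
A measurement like this one can be reproduced with a few lines of Python. The sketch below is an assumption about how one might do it, not the procedure actually used: time_call times any zero-argument check (for example a health status request), and summarize reports how many samples exceeded the 15-second limit. Both names are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def time_call(check: Callable[[], object]) -> float:
    """Run one health check and return how long it took, in seconds."""
    start = time.perf_counter()
    check()
    return time.perf_counter() - start


def summarize(latencies: Sequence[float],
              limit: float = 15.0) -> Dict[str, float]:
    """Summarise recorded latencies against the monitor's timeout."""
    if not latencies:
        raise ValueError("no latencies recorded")
    return {
        "samples": len(latencies),
        "over_limit": sum(1 for t in latencies if t > limit),
        "median": statistics.median(latencies),
        "worst": max(latencies),
    }


# Made-up values purely to show the output shape, not measured data.
example = summarize([0.1, 0.2, 16.0, 0.1, 20.5])
```

Comparing the timestamps of the over-limit samples with CPU or GC activity on the server is then a matter of logging the time of each call alongside its latency.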

It seemed that the slower responses coincided with CPU spikes. I have
been assuming this is from garbage collection. David found a post from
Oracle about troubleshooting long GC pauses that I'm going to look into.

We had been using psi-probe as a tomcat monitor. But after some
rearrangement here I saw that this was not working. I need to get that
back in action. One of the things I am hoping to learn soon is which
tools people use when diagnosing issues like this.

I'll check out your write ups, Colin. By the way, what is your "[1]"
referring to?

Thanks,

Joe





Re: non-responsive web site?

Justin Clark-Casey
Sorry Joe, I keep saying to myself that I need to write some reference checker for my e-mails.  It was [1].

I agree, spike in CPU does sound like it could be GC related.  Depending on the GC implementation, it could also be pausing threads across the board though I'd be quite surprised if that was happening in a modern JVM.

Love the write-up Colin did.  I'll be very interested to hear if that resolves your problems.

[1] https://www.mulesoft.com/tcat/tomcat-performance

--
Justin Clark-Casey, Synbiomine/InterMine Software Developer
http://synbiomine.org
http://twitter.com/justincc

In reply to Joe Carlson's post of 17/09/15 18:01


Re: non-responsive web site?

Colin
Unfortunately, my idea for using NIO didn't actually fix all our problems :(

It seemed that after the Java memory limit was hit (we had set it to 10GB), it just went back to being unstable, with spikey CPU.


I will try to look into some of the other issues that are probably related to memory. Maybe we also have the problem that Joe described with the genome sequence exporter...

-Colin

In reply to Justin Clark-Casey's post of Thu, Sep 17, 2015 at 12:22 PM

Re: non-responsive web site?

Julie Sullivan
Hi Colin,

Which version of InterMine are you on? We've had some issues but they've
been addressed:

        https://github.com/intermine/intermine/issues/679
        https://github.com/intermine/intermine/issues/860

ha! I see you commented on the last ticket.

Let me know if you are up to date. If so, maybe make a ticket describing
what you find, and we can do some investigating as well.

Julie

On 21/09/15 19:32, Colin wrote:

> Unfortunately, my idea for using NIO didn't actually fix all our problems :(
>
> It seemed that after the java memory limits were hit (which we had set
> as 10GB), it seems to just go back to being unstable and causing spikey
> cpu.
>
>
> I will try and look into of the other issues that are probably related
> to memory. Maybe we also have the problem that Joe described with the
> genome sequence exporter...
>
> -Colin
>
> On Thu, Sep 17, 2015 at 12:22 PM, Justin Clark-Casey <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Sorry Joe, I keep saying to myself that I need to write some
>     reference checker for my e-mails.  It was [1].
>
>     I agree, spike in CPU does sound like it could be GC related.
>     Depending on the GC implementation, it could also be pausing threads
>     across the board though I'd be quite surprised if that was happening
>     in a modern JVM.
>
>     Love the write-up Colin did.  I'll be very interested to hear if
>     that resolves your problems.
>
>     [1] https://www.mulesoft.com/tcat/tomcat-performance
>
>     --
>     Justin Clark-Casey, Synbiomine/InterMine Software Developer
>     http://synbiomine.org
>     http://twitter.com/justincc
>
>     On 17/09/15 18:01, Joe Carlson wrote:
>
>         Hi Justin and Colin,
>
>         Early on, we were seeing problems with running out of memory in
>         tomcat and the web site locking up. We put in place a cron job
>         that made 2 requests every few
>         minutes to check the health status. Both requests were wget's
>         with a 15 second timeout. The first request for a bit of genomic
>         data, the second is for the homepage
>
>         The genomic slice request is
>         http://phytozome.jgi.doe.gov/phytomine/service/regions/sequence?query=%7B%22regions%22:[%22Chr19%5Ct10692566%5Ct10700215%22],%22organism%22:%22251%22%7D'
>
>         We decided that the cause of the memory problems was
>         org.intermine.bio.web.export.GenomicRegionSequenceExporter. This
>         code caches the chromosome sequence. While
>         this is OK for a (puny) fly sequence, it's not so good for our
>         50+ plant genomes. Over time that just consumed too much memory.
>         We considered coding some
>         freeing of the cache with time, but in the end just decided to
>         take it out. That solved the webserver memory problems and we
>         went for months without a restart.
>
>         Then they started happening again. It coincided with a user who
>         wanted to make a query with a huge download. The query cost
>         estimate probably exceeds the
>         default value of the web service request - I had bumped up the
>         number - but I wanted to see if we could do it. The user said it
>         did not work a couple of times,
>         but more recently said it was successful.
>
>         I did some experimenting to see how long the health status
>         call to the web service took while these big requests were
>         running. This was on a test server, so
>         I'm certain there was no other activity. I had 2 big downloads
>         running and plotted the time wget reported. Most responses
>         were essentially instantaneous, but
>         ~5 times, when both big requests were active, the health status
>         check took more than 15 seconds to respond.
>
>         It seemed that the slower responses coincided with CPU spikes. I
>         have been assuming this is from garbage collection. David found
>         a post from Oracle about
>         troubleshooting long GC pauses that I'm going to look into.
>
>         We had been using psi-probe as a tomcat monitor. But after some
>         rearrangement here I saw that this was not working. I need to
>         get that back in action. One of
>         the things I am hoping to learn soon is which tools people
>         use when diagnosing issues like this.
>
>         I'll check out your write ups, Colin. By the way, what is your
>         "[1]" referring to?
>
>         Thanks,
>
>         Joe
>
>
>         On 09/17/2015 05:58 AM, Justin Clark-Casey wrote:
>
>             Out of interest, Joe, what actions does your monitor script
>             take?  I presume one of them is to make a request to the site?
>
>             Also, have you seen [1]?  Looks like a lot of general
>             performance tuning information on that page.  In particular,
>             it makes the point that like a lot of
>             server software, Tomcat is tuned for out-of-the-box
>             ease-of-use and not for larger loads.  It might be worth
>             trying to increase the number of threads
>             available to connectors though I imagine the default of 5 (I
>             think it's 5) is probably still plenty unless an InterMine
>             installation is really popular.  But
>             you may well have already gone through all this :)
>
>             And all that said, the caveat here is that I haven't had to
>             do any such investigation myself (yet), though I have done
>             similar work on different types of
>             systems.
>
>             Regards,
>
>             --
>             Justin Clark-Casey, Synbiomine/InterMine Software Developer
>             http://synbiomine.org
>             http://twitter.com/justincc
>
>

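Joe's measurements in the quoted thread (a health check that is usually instantaneous but occasionally exceeds the 15 second timeout while large downloads are running) suggest that a single slow response may not mean the site is locked up. One possible approach, sketched below in Python, is to restart only after several consecutive failed monitoring rounds. This is a minimal sketch under stated assumptions, not the script used at Phytozome: the names `probe`, `run_round` and `RestartPolicy`, and the threshold of 3, are hypothetical choices for illustration.

```python
import http.client
import time
import urllib.request
from typing import Callable, Iterable, Tuple


def probe(url: str, timeout: float = 15.0) -> Tuple[bool, float]:
    """Fetch url once and return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def run_round(urls: Iterable[str],
              check: Callable[[str], Tuple[bool, float]] = probe) -> bool:
    """One monitoring round: every URL must respond in time."""
    return all(check(url)[0] for url in urls)


class RestartPolicy:
    """Signal a restart only after `threshold` consecutive failed rounds."""

    def __init__(self, threshold: int = 3) -> None:
        # threshold=3 is an assumed value, not one taken from the thread.
        self.threshold = threshold
        self.failures = 0

    def record(self, round_ok: bool) -> bool:
        """Record one round; return True when a restart is due."""
        if round_ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # start counting afresh after the restart
            return True
        return False
```

A long-running monitor would call `run_round` with the two URLs (the genomic region request and the homepage) every few minutes and pass the result to `RestartPolicy.record`. If the check runs from cron, each invocation is a fresh process, so the failure count would need to be kept somewhere persistent, such as a small state file. Logging the elapsed time returned by `probe` would also make it easier to compare slow responses against CPU spikes or GC activity.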