chado performance/denormalising (nat-div or otherwise)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

chado performance/denormalising (nat-div or otherwise)

Bob MacCallum
I see from some old-ish pages found by googling that flybase and others have denormalised chado or pre-built HTML pages for performance reasons.  Also there is plenty about gbrowse being faster with denormalised data stores (anything but Chado it seems).

My personal (temporary) solution to make Bio::Chado::Schema powered queries usable for web service provision has been to use memcached to cache the responses at quite a high level (effectively the URLs requested from our JS web client).  It remains to be seen how well this will scale in the future...

Does anyone have recent experience with Chado denormalisation?
I can't see any obvious magic plugins for DBIx::Class.

It's also possible that we missed out a few indices when we made the natural diversity module - so I could try some tools to identify slow SQL queries (anyone recommend one to save me googling?)

I would give you the link to our site (you can find it if you really want to) but the webservice is on the opposite side of the Atlantic to the main web server so performance is currently massively hobbled by network latency.

We can live with the memcached solution for now, but I was just wondering what the options are further down the line (I am writing project renewal text...)



------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent
caught up. So what steps can you take to put your SQL databases under
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: chado performance/denormalising (nat-div or otherwise)

Siddhartha Basu
Hi Bob,
I am not sure if you might have got better responses, but here are few
of my thoughts ...

On Wed, 31 Jul 2013, Bob MacCallum wrote:

>    I see from some old-ish pages found by googling that flybase and others
>    have denormalised chado or pre-built HTML pages for performance reasons.
>    Also there is plenty about gbrowse being faster with denormalised data
>    stores (anything but Chado it seems).
>
>    My personal (temporary) solution to make Bio::Chado::Schema powered
>    queries usable for web service provision has been to use memcached to
>    cache the responses at quite a high level (effectively the URLs requested
>    from our JS web client).  It remains to be seen how well this will scale
>    in the future...

This is more or less the solution we currently have installed at
dictyBase for our public gene page. In our case the cache is implemented as a plugin to
Mojolicious webservice and it uses the CHI module to abstarct the
backend API.
* For our old gene page installation we are using File backend.
* For the gene page with newer genomes we are using memcache.
* The primary idea was to get our API(JSON response) to be around or
  less than 500ms
We follow few rule of thumbs to maintain the cache...
* Run a batchwise weekly cache warmer(weekend nights) that expires after 7
  days.
* Any updates on the gene page (by curators) invalidates the cache. The
  cache gets refreshed either by weekly run or if anybody visit the gene
  page next time. We probably need to automate this part.
* Only data content(current json) is cached, no static
  assets(js/image/css) is cached.

These more or less works reasonably well with us. Memcache offer more
scalability and CHI module offers flexibility. With CHI, it's extremely easy to
distribute/load balance/mirror cache to multiple backend without
changing a single line of code.



>
>    Does anyone have recent experience with Chado denormalisation?
>    I can't see any obvious magic plugins for DBIx::Class.
>
>    It's also possible that we missed out a few indices when we made the
>    natural diversity module - so I could try some tools to identify slow SQL
>    queries (anyone recommend one to save me googling?)

One idea would be turn on slow query logging(if not activated already) in postgresql.conf in your
server. Do play around with the minimal query
time(log_min_duration_statment in one of them) setup to tease out
the slow queries.


>
>    I would give you the link to our site (you can find it if you really want
>    to) but the webservice is on the opposite side of the Atlantic to the main
>    web server so performance is currently massively hobbled by network
>    latency.
>
>    We can live with the memcached solution for now, but I was just wondering
>    what the options are further down the line (I am writing project renewal
>    text...)

Here are few general suggestions...
* For faster query, delegate your data to intermine, should not be very difficult to do
  chado->intermine data transfer periodically. Then you could very easily use
  the intermine backend API for all of your searches. Intermine API is
  available in almost for most of the popular languages.

* Use a dedicated search engine such as ElasticSearch. Its extremely
  easy to build a search index from chado. ElasticSearch by default
  provide plenty of scalable options. And the perl client library is one
  of the best i have seen.

* For Pg backend, if possible, try out the streaming replication(>9.0). Have a
  separate and dedicated select and insert server(master/slave) and let
  pg take care of data syncing. For load balancing and connection pooling
  pgpoolII and pgbouncer are established and popular options.

I have tried these setups(except intermine) and would probably put them
in our staging server by the end of this fall/around winter. If things
works out as expected we should have it in our prod by next spring.

Hope this helps,
thanks,
-siddhartha


> ------------------------------------------------------------------------------
> Get your SQL database under version control now!
> Version control is standard for application code, but databases havent
> caught up. So what steps can you take to put your SQL databases under
> version control? Why should you start doing it? Read more to find out.
> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk

> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema