Postgres replication of Chado?

classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Postgres replication of Chado?

Marc Farnum Rendino
Folks -

I'm setting up a community annotation system, using much of GMOD
including Chado (with Postgres), and one of the requirements is to
replicate it to another site, for failover purposes.

There seem to be a few possibilities of technologies / packages for
replication, however no clear choices. For instance "Warm Standby"
looks interesting
(<http://www.travishegner.com/2009/06/postgresql-83-warm-stand-by-replication.html>)
though requires too much manual intervention (for this project) to
fail back.

It's also not clear to me what the Postgres version requirements /
recommendations are for Chado (or other parts of GMOD), which would
play a large part in determining viable replication options. For
example, if Postgres 9 is OK, that opens the door for its new
streaming replication. Though the most recent mention I've found is
that 8.3 is probably OK.

BTW: I already have MySQL replication working for another part of the
project; is it still the recommendation to use Postgres vs. MySQL?

Might anyone be able to share some guidance?

Thanks,

Marc

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Daniel Quest
Hey Marc,

As far as I know, the recommended database is still postgres.  Some
members of the community also use oracle in production.  The version
currently on the GMOD image (and therefore the most tested) is
PostgreSQL 8.4.

I don't have a lot of practical experience with this problem as we
chose to solve it via hardware (e.g. we have a network attached
storage device that has the data for the database and can also serve
to multiple servers should one go down).  We have not planned for
earthquakes and other natural disasters at the millisecond level...
meaning if someone where to change the Chado instance and then an
earthquake hit destroying everything on the network attached storage,
we would have to go to our off-site backup solution and may loose
those edits.  One must always balance consistency, availability and
partition tolerance... our solution favors consistency and
availability.  I think many of the MODs are in this camp.  I see no
reason why you could not stream writes across servers in the way you
suggest in theory.  Hopefully someone else on the list has practical
experience with this.

Hope this is helpful
-Daniel


On Tue, Mar 22, 2011 at 8:15 AM, Marc Farnum Rendino <[hidden email]> wrote:

> Folks -
>
> I'm setting up a community annotation system, using much of GMOD
> including Chado (with Postgres), and one of the requirements is to
> replicate it to another site, for failover purposes.
>
> There seem to be a few possibilities of technologies / packages for
> replication, however no clear choices. For instance "Warm Standby"
> looks interesting
> (<http://www.travishegner.com/2009/06/postgresql-83-warm-stand-by-replication.html>)
> though requires too much manual intervention (for this project) to
> fail back.
>
> It's also not clear to me what the Postgres version requirements /
> recommendations are for Chado (or other parts of GMOD), which would
> play a large part in determining viable replication options. For
> example, if Postgres 9 is OK, that opens the door for its new
> streaming replication. Though the most recent mention I've found is
> that 8.3 is probably OK.
>
> BTW: I already have MySQL replication working for another part of the
> project; is it still the recommendation to use Postgres vs. MySQL?
>
> Might anyone be able to share some guidance?
>
> Thanks,
>
> Marc
>
> ------------------------------------------------------------------------------
> Enable your software for Intel(R) Active Management Technology to meet the
> growing manageability and security demands of your customers. Businesses
> are taking advantage of Intel(R) vPro (TM) technology - will your software
> be a part of the solution? Download the Intel(R) Manageability Checker
> today! http://p.sf.net/sfu/intel-dev2devmar
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Stephen Ficklin-2
In reply to this post by Marc Farnum Rendino
Hi Marc,

You may have already considered this, but one other thing to know is if
other tools you are using would be compatible with MySQL.  Could the
distributed annotation system you plan to implement use either database?

In the case of Tripal, we have not tested it with MySQL so I'm not sure how
it would perform.  Perhaps other tools have similar issues.

Stephen

-----Original Message-----
From: Marc Farnum Rendino [mailto:[hidden email]]
Sent: Tuesday, March 22, 2011 8:15 AM
To: [hidden email]
Subject: [Gmod-schema] Postgres replication of Chado?

Folks -

I'm setting up a community annotation system, using much of GMOD
including Chado (with Postgres), and one of the requirements is to
replicate it to another site, for failover purposes.

There seem to be a few possibilities of technologies / packages for
replication, however no clear choices. For instance "Warm Standby"
looks interesting
(<http://www.travishegner.com/2009/06/postgresql-83-warm-stand-by-replicatio
n.html>)
though requires too much manual intervention (for this project) to
fail back.

It's also not clear to me what the Postgres version requirements /
recommendations are for Chado (or other parts of GMOD), which would
play a large part in determining viable replication options. For
example, if Postgres 9 is OK, that opens the door for its new
streaming replication. Though the most recent mention I've found is
that 8.3 is probably OK.

BTW: I already have MySQL replication working for another part of the
project; is it still the recommendation to use Postgres vs. MySQL?

Might anyone be able to share some guidance?

Thanks,

Marc

----------------------------------------------------------------------------
--
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Josh Goodman
In reply to this post by Marc Farnum Rendino
Hi Marc,

FlyBase runs 8.4.x in production and has had no problems in terms of Chado.  I don't know enough
about PostgreSQL 9 and the guts of Chado (triggers and such) to say whether Chado will work on it.
I would definitely favor giving it a try over MySQL replication.  Moving Chado to MySQL will cause
problems for just about every application that you use with Chado.  It isn't impossible but I would
not recommend it unless you are prepared to do some significant internal code edits and maintain
those edits over time.

Alternatively, if you run into problems with 9.x you can always give Slony a try (http://slony.info/).

Another package you might want to check out is called HAProxy (http://haproxy.1wt.eu/).  It is a
TCP/HTTP software load balancer that can proxy any kind of TCP traffic such as database connections.
 I use it to proxy Chado database traffic and you could easily configure it to handle a master/slave
database setup with automatic failover.

Cheers,
Josh

On 03/22/2011 08:15 AM, Marc Farnum Rendino wrote:

> Folks -
>
> I'm setting up a community annotation system, using much of GMOD
> including Chado (with Postgres), and one of the requirements is to
> replicate it to another site, for failover purposes.
>
> There seem to be a few possibilities of technologies / packages for
> replication, however no clear choices. For instance "Warm Standby"
> looks interesting
> (<http://www.travishegner.com/2009/06/postgresql-83-warm-stand-by-replication.html>)
> though requires too much manual intervention (for this project) to
> fail back.
>
> It's also not clear to me what the Postgres version requirements /
> recommendations are for Chado (or other parts of GMOD), which would
> play a large part in determining viable replication options. For
> example, if Postgres 9 is OK, that opens the door for its new
> streaming replication. Though the most recent mention I've found is
> that 8.3 is probably OK.
>
> BTW: I already have MySQL replication working for another part of the
> project; is it still the recommendation to use Postgres vs. MySQL?
>
> Might anyone be able to share some guidance?
>
> Thanks,
>
> Marc
>
> ------------------------------------------------------------------------------
> Enable your software for Intel(R) Active Management Technology to meet the
> growing manageability and security demands of your customers. Businesses
> are taking advantage of Intel(R) vPro (TM) technology - will your software
> be a part of the solution? Download the Intel(R) Manageability Checker
> today! http://p.sf.net/sfu/intel-dev2devmar
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Jonathan "Duke" Leto
In reply to this post by Marc Farnum Rendino
Howdy,

> It's also not clear to me what the Postgres version requirements /
> recommendations are for Chado (or other parts of GMOD), which would
> play a large part in determining viable replication options. For
> example, if Postgres 9 is OK, that opens the door for its new
> streaming replication. Though the most recent mention I've found is
> that 8.3 is probably OK.
>
> BTW: I already have MySQL replication working for another part of the
> project; is it still the recommendation to use Postgres vs. MySQL?
>
> Might anyone be able to share some guidance?

>From what I hear, replication in PostgreSQL 8.x was suboptimal, and
it has been redesigned in 9.x and people have good things to say.

I would personally use Postgres 9.x as a solution. If you run into
any issues, people on this mailing list will help resolve them.

One thing to keep in mind is that if you build your software on an older
version of Postgres, it will become end-of-lifed sooner than newer versions and
you will no longer get security patches/updates/etc...


Duke


--
Jonathan "Duke" Leto
[hidden email]
http://leto.net

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Daniel Quest
On Tue, Mar 22, 2011 at 9:47 AM, Daniel Quest <[hidden email]> wrote:
> As far as I know, the recommended database is still postgres.  Some
> members of the community also use oracle in production.  The version
> currently on the GMOD image (and therefore the most tested) is
> PostgreSQL 8.4.

Ah; good data point!

> I don't have a lot of practical experience with this problem as we
> chose to solve it via hardware

My *preferred* solution would be AWS. :)

Though that's not an option right now, so replication it is.

> I see no
> reason why you could not stream writes across servers in the way you
> suggest in theory.  Hopefully someone else on the list has practical
> experience with this.
>
> Hope this is helpful
> -Daniel

Indeed; thanks!

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Stephen Ficklin-2
On Tue, Mar 22, 2011 at 10:10 AM, Stephen Ficklin <[hidden email]> wrote:
> You may have already considered this, but one other thing to know is if
> other tools you are using would be compatible with MySQL.  Could the
> distributed annotation system you plan to implement use either database?
>
> In the case of Tripal, we have not tested it with MySQL so I'm not sure how
> it would perform.  Perhaps other tools have similar issues.

Indeed; this project has no requirements to use any particular DB
system - we have the luxury of building, from "scratch", a
free-standing system (as much as that's possible) - so I'd like to
start with what's best supported by each component and build from
there.

Thanks,

Marc

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Jonathan "Duke" Leto
On Tue, Mar 22, 2011 at 4:32 PM, Jonathan "Duke" Leto <[hidden email]> wrote:
> From what I hear, replication in PostgreSQL 8.x was suboptimal

Hmm; it seems a rather robust area - lots of add-on projects that have
been around for awhile, getting good real-world use.

> it has been redesigned in 9.x and people have good things to say.

Though, yes, the built-in replication abilities have certainly
expanded in Postgres 9.

> I would personally use Postgres 9.x as a solution. If you run into
> any issues, people on this mailing list will help resolve them.

Is anyone using Postgres 9 (with Chado) yet?

>From my research so far, it seems a bit "bleeding edge" for me. :)

> One thing to keep in mind is that if you build your software on an older
> version of Postgres, it will become end-of-lifed sooner than newer versions and
> you will no longer get security patches/updates/etc...

True - though this project is intended two serve two purposes: 1) the
annotation itself; 2) building up such capabilities for the group, for
other such projects. So we'll learn as we go, and makes
adjustments/updates.

Thanks,

Marc

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Josh Goodman
On Tue, Mar 22, 2011 at 10:11 AM, Josh Goodman <[hidden email]> wrote:
> Hi Marc,
>
> FlyBase runs 8.4.x in production

Ah; good to know.

> I would definitely favor giving it a try over MySQL replication...

Absolutely - I see moving to MySQL as a path only if other
requirements mandate it. (Which luckily is not the case for this
project.)

> Alternatively, if you run into problems with 9.x you can always give Slony a try (http://slony.info/).

I have looked at Slony, though for my design/requirements, I'm
favoring Bucardo right now; possibly rubyrep.

Does anyone have any recommendations about specific Postgres
replication tools such as the above?

> Another package you might want to check out is called HAProxy (http://haproxy.1wt.eu/).  It is a
> TCP/HTTP software load balancer that can proxy any kind of TCP traffic such as database connections.
>  I use it to proxy Chado database traffic and you could easily configure it to handle a master/slave
> database setup with automatic failover.

Interesting; I'll take a look - thanks!

- Marc

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
Wow; HAProxy does indeed look good - bookmarked for future reference.

It unfortunately doesn't fit this project though, since it'd be a
single point of failure - and we don't really have the luxury of
adding another machine/instance of the level of performance it'd
require (since all usage would flow through it).

FWIW, here's a bit more on our design:

A server (running GBrowse, Tripal, file services, etc.) at each of two
geographically-disparate institutions in our group (also spread out
geographically). The servers are configured in a failover/fail-back
config; if one becomes unavailable (crash, network outage, whatever)
the other takes over.

This is "easy" to do via a lightweight (micro) instance at AWS that
monitors both physical servers and simply updates DNS as needed; users
never notice which server they're using. DNS does introduce a small
potential for some lag when a switch occurs, though it's infrequent
and, in the worst case, 60 seconds. In this case, AWS is also a single
point of failure - though not only is it more highly-available than
any of us could build, an AWS failure only becomes an issue for our
system if the authoritative server also becomes unavailable at the
same time; overall an extremely minimal risk.

When both systems are available, they're replicating from the
"authoritative" server (the one that users are hitting) to the other.
There's also some potential lag here, but both servers are on
high-bandwidth, fairly reliable connections, so this too is a minimal
concern.

This is also the "not easy" part; even though it's simplified by (in
our design) by the fact that only one server is authoritative at a
time, the replication adds a good deal of complexity (thus increasing
risk).

Running the whole system via AWS would reduce many of these concerns -
and the added complexities to mitigate them - to near zero, though
that's unfortunately not an option at the moment.

On Wed, Mar 23, 2011 at 8:54 AM, Marc Farnum Rendino <[hidden email]> wrote:
> On Tue, Mar 22, 2011 at 10:11 AM, Josh Goodman <[hidden email]> wrote:
>> Another package you might want to check out is called HAProxy (http://haproxy.1wt.eu/)...
>
> Interesting; I'll take a look - thanks!

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Josh Goodman
Hi Marc,

I'm curious how the worst case of lag for a DNS update is 60 seconds?  Are all your users using the
same DNS server that is being updated?  FlyBase just did this a few weeks ago for our main server
and it took ~30 hours for DNS changes to be fully propagated globally.  I understood this to be the
norm.

To get around the single point of failure with HAProxy I use two machines and the VRRP features of
keepalived (http://keepalived.org/).  The two machines are setup in a master/slave configuration and
share a single public IP.  The master is initially assigned the IP but if the slave detects that the
master has failed it assumes the master role, issues a request for the IP, and automatically assigns
it to one of its existing network devices.

The system footprint of HAProxy is extremely small.  I rarely see it use more than 0.3% CPU and 2m
of memory.  That is with it handling all our HTTP, FTP, psql, and internal svn/wiki traffic.

Josh


On 03/23/2011 09:29 AM, Marc Farnum Rendino wrote:

> Wow; HAProxy does indeed look good - bookmarked for future reference.
>
> It unfortunately doesn't fit this project though, since it'd be a
> single point of failure - and we don't really have the luxury of
> adding another machine/instance of the level of performance it'd
> require (since all usage would flow through it).
>
> FWIW, here's a bit more on our design:
>
> A server (running GBrowse, Tripal, file services, etc.) at each of two
> geographically-disparate institutions in our group (also spread out
> geographically). The servers are configured in a failover/fail-back
> config; if one becomes unavailable (crash, network outage, whatever)
> the other takes over.
>
> This is "easy" to do via a lightweight (micro) instance at AWS that
> monitors both physical servers and simply updates DNS as needed; users
> never notice which server they're using. DNS does introduce a small
> potential for some lag when a switch occurs, though it's infrequent
> and, in the worst case, 60 seconds. In this case, AWS is also a single
> point of failure - though not only is it more highly-available than
> any of us could build, an AWS failure only becomes an issue for our
> system if the authoritative server also becomes unavailable at the
> same time; overall an extremely minimal risk.
>
> When both systems are available, they're replicating from the
> "authoritative" server (the one that users are hitting) to the other.
> There's also some potential lag here, but both servers are on
> high-bandwidth, fairly reliable connections, so this too is a minimal
> concern.
>
> This is also the "not easy" part; even though it's simplified by (in
> our design) by the fact that only one server is authoritative at a
> time, the replication adds a good deal of complexity (thus increasing
> risk).
>
> Running the whole system via AWS would reduce many of these concerns -
> and the added complexities to mitigate them - to near zero, though
> that's unfortunately not an option at the moment.
>
> On Wed, Mar 23, 2011 at 8:54 AM, Marc Farnum Rendino <[hidden email]> wrote:
>> On Tue, Mar 22, 2011 at 10:11 AM, Josh Goodman <[hidden email]> wrote:
>>> Another package you might want to check out is called HAProxy (http://haproxy.1wt.eu/)...
>>
>> Interesting; I'll take a look - thanks!

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
On Wed, Mar 23, 2011 at 11:23 AM, Josh Goodman <[hidden email]> wrote:
> I'm curious how the worst case of lag for a DNS update is 60 seconds?

Ah; I could have been more specific: We're using DNS to switch between
servers; when one server becomes unavailable, DNS is updated (just the
A record) so that requests for our domain name now resolve to the IP
address of the server that is available. The potential lag comes from
the fact that the shortest DNS TTL (time to live) is 60 seconds.

Example scenario:
- User makes a request of our service & it's handled by server A.
- The IP address for our domain name is then cached on their machine -
for 60 seconds.
- Server A becomes unavailable.
- Our monitoring system notices and switches the IP address to server B.
- User makes another request immediately after the first.
- However, since less than 60 seconds have gone by, the cached IP
address is used.
- 60 seconds elapse & their request now nets them the new IP address
(and a working server :)

Also: A short TTL (60 seconds) is, as with everything, a tradeoff - it
means that changes propagate as quickly as possible (unless a server
somewhere is forcing a longer TTL; possible) AND it means a higher
load on DNS servers plus a small (ms) delay for the user, every 60
seconds, as it must re-resolve the domain name to an IP address again.

So some small caveats, though the cost/benefit is reasonable (for us anyway).

> Are all your users using the
> same DNS server that is being updated?  FlyBase just did this a few weeks ago for our main server
> and it took ~30 hours for DNS changes to be fully propagated globally.  I understood this to be the
> norm.

If I understand correctly, that amount of time would be due to either:
a) really long TTLs, or b) making more involved changes *at the
registrar*, like where the name servers live.

> To get around the single point of failure with HAProxy I use two machines and the VRRP features of
> keepalived...
>
> The system footprint of HAProxy is extremely small.  I rarely see it use more than 0.3% CPU and 2m
> of memory.  That is with it handling all our HTTP, FTP, psql, and internal svn/wiki traffic.

Ah; thanks for the further detail - very interesting.

Are you doing any TLS/SSL - such as for authentication/authorization?

Thanks,

Marc

------------------------------------------------------------------------------
Enable your software for Intel(R) Active Management Technology to meet the
growing manageability and security demands of your customers. Businesses
are taking advantage of Intel(R) vPro (TM) technology - will your software
be a part of the solution? Download the Intel(R) Manageability Checker
today! http://p.sf.net/sfu/intel-dev2devmar
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Marc Farnum Rendino
On Wed, Mar 23, 2011 at 8:54 AM, Marc Farnum Rendino <[hidden email]> wrote:
> I have looked at Slony, though for my design/requirements, I'm
> favoring Bucardo right now; possibly rubyrep.

Hmm; I got down to the nitty-gritty with Bucardo, which does look good - except:

"fullcopy is a type of Bucardo sync that copies an entire table...
This is the only sync that can be used for tables that have no primary
key and no unique index."

And it appears that there are indeed tables in Chado that lack both a
primary key and a unique index - correct?

Could I perhaps add a column (ex: a simple increment of a unique
index), that Postgres would maintain without getting in the way of
Chado/GBrowse/etc.?

Any advice appreciated!

Thanks,

Marc

------------------------------------------------------------------------------
Forrester Wave Report - Recovery time is now measured in hours and minutes
not days. Key insights are discussed in the 2010 Forrester Wave Report as
part of an in-depth evaluation of disaster recovery service providers.
Forrester found the best-in-class provider in terms of services and vision.
Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Robert Buels
Hi Marc,

Could you provide an example of a table that has both no primary key and
no unique index?

Rob


------------------------------------------------------------------------------
Forrester Wave Report - Recovery time is now measured in hours and minutes
not days. Key insights are discussed in the 2010 Forrester Wave Report as
part of an in-depth evaluation of disaster recovery service providers.
Forrester found the best-in-class provider in terms of services and vision.
Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
Sorry; I should have said - it looks like the following tables have
neither a private key nor a unique index:

gencode_codon_aa
gencode_startcodon
gff_meta
tmp_gff_load_cache

Please tell me if I'm reading the table definition incorrectly though.

Thanks,

Marc

On Tue, Apr 12, 2011 at 8:01 PM, Robert Buels <[hidden email]> wrote:
> Hi Marc,
>
> Could you provide an example of a table that has both no primary key and
> no unique index?
>
> Rob

------------------------------------------------------------------------------
Forrester Wave Report - Recovery time is now measured in hours and minutes
not days. Key insights are discussed in the 2010 Forrester Wave Report as
part of an in-depth evaluation of disaster recovery service providers.
Forrester found the best-in-class provider in terms of services and vision.
Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Robert Buels
On 04/13/2011 06:22 AM, Marc Farnum Rendino wrote:
> gencode_codon_aa

I added a unique constraint, thanks for pointing that out!  Table DDL is
now:

CREATE TABLE gencode_codon_aa (
         gencode_id      INTEGER NOT NULL REFERENCES gencode(gencode_id),
         codon           CHAR(3) NOT NULL,
         aa              CHAR(1) NOT NULL,
         CONSTRAINT gencode_codon_unique UNIQUE( gencode_id, codon )
);

> gencode_startcodon

Ditto.  New table DDL:

CREATE TABLE gencode_startcodon (
         gencode_id      INTEGER NOT NULL REFERENCES gencode(gencode_id),
         codon           CHAR(3),
         CONSTRAINT gencode_startcodon_unique UNIQUE( gencode_id, codon )
);


> gff_meta
> tmp_gff_load_cache

I'm pretty sure these are caching and locking tables used by Scott's
GFF3 bulk loader, but Scott will have to confirm.  I doubt you need to
keep them in the replication though.

Rob

------------------------------------------------------------------------------
Forrester Wave Report - Recovery time is now measured in hours and minutes
not days. Key insights are discussed in the 2010 Forrester Wave Report as
part of an in-depth evaluation of disaster recovery service providers.
Forrester found the best-in-class provider in terms of services and vision.
Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Scott Cain
Hi Rob and Marc,

gff_meta is a locking table to prevent more than one instance of the
GFF3 bulk loader from loading data at the same time, and probably
doesn't have to be moved (hopefully, you won't be loading the slave
and master separately at the same time!)

tmp_gff_load_cache is a little more permanent than the name implies,
unfortunately.  It keeps track of all of the feature names and ids
that have been loaded, to allow the GFF3 bulk loader to make sure that
it's not violating a unique constraint when generating the bulk
loading files.  While it can be recreated by the bulk loader, when the
database gets large, recreating it can take a long time, so keeping it
around is not a bad idea.  I'll look at adding an index; this table
gets created by the bulk loader and is not part of the "official"
chado schema.

Scott


On Wed, Apr 13, 2011 at 12:00 PM, Robert Buels <[hidden email]> wrote:

> On 04/13/2011 06:22 AM, Marc Farnum Rendino wrote:
>> gencode_codon_aa
>
> I added a unique constraint, thanks for pointing that out!  Table DDL is
> now:
>
> CREATE TABLE gencode_codon_aa (
>         gencode_id      INTEGER NOT NULL REFERENCES gencode(gencode_id),
>         codon           CHAR(3) NOT NULL,
>         aa              CHAR(1) NOT NULL,
>         CONSTRAINT gencode_codon_unique UNIQUE( gencode_id, codon )
> );
>
>> gencode_startcodon
>
> Ditto.  New table DDL:
>
> CREATE TABLE gencode_startcodon (
>         gencode_id      INTEGER NOT NULL REFERENCES gencode(gencode_id),
>         codon           CHAR(3),
>         CONSTRAINT gencode_startcodon_unique UNIQUE( gencode_id, codon )
> );
>
>
>> gff_meta
>> tmp_gff_load_cache
>
> I'm pretty sure these are caching and locking tables used by Scott's
> GFF3 bulk loader, but Scott will have to confirm.  I doubt you need to
> keep them in the replication though.
>
> Rob
>
> ------------------------------------------------------------------------------
> Forrester Wave Report - Recovery time is now measured in hours and minutes
> not days. Key insights are discussed in the 2010 Forrester Wave Report as
> part of an in-depth evaluation of disaster recovery service providers.
> Forrester found the best-in-class provider in terms of services and vision.
> Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Forrester Wave Report - Recovery time is now measured in hours and minutes
not days. Key insights are discussed in the 2010 Forrester Wave Report as
part of an in-depth evaluation of disaster recovery service providers.
Forrester found the best-in-class provider in terms of services and vision.
Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Scott Cain
I don't really have time right now to add a auto incrementing primary
key and properly test the changes, so I created a bug report for it so
I don't forget:

  https://sourceforge.net/tracker/?group_id=27707&atid=391291

Scott


On Wed, Apr 13, 2011 at 12:56 PM, Scott Cain <[hidden email]> wrote:

> Hi Rob and Marc,
>
> gff_meta is a locking table to prevent more than one instance of the
> GFF3 bulk loader from loading data at the same time, and probably
> doesn't have to be moved (hopefully, you won't be loading the slave
> and master separately at the same time!)
>
> tmp_gff_load_cache is a little more permanent than the name implies,
> unfortunately.  It keeps track of all of the feature names and ids
> that have been loaded, to allow the GFF3 bulk loader to make sure that
> it's not violating a unique constraint when generating the bulk
> loading files.  While it can be recreated by the bulk loader, when the
> database gets large, recreating it can take a long time, so keeping it
> around is not a bad idea.  I'll look at adding an index; this table
> gets created by the bulk loader and is not part of the "official"
> chado schema.
>
> Scott
>
>
> On Wed, Apr 13, 2011 at 12:00 PM, Robert Buels <[hidden email]> wrote:
>> On 04/13/2011 06:22 AM, Marc Farnum Rendino wrote:
>>> gencode_codon_aa
>>
>> I added a unique constraint, thanks for pointing that out!  Table DDL is
>> now:
>>
>> CREATE TABLE gencode_codon_aa (
>>         gencode_id      INTEGER NOT NULL REFERENCES gencode(gencode_id),
>>         codon           CHAR(3) NOT NULL,
>>         aa              CHAR(1) NOT NULL,
>>         CONSTRAINT gencode_codon_unique UNIQUE( gencode_id, codon )
>> );
>>
>>> gencode_startcodon
>>
>> Ditto.  New table DDL:
>>
>> CREATE TABLE gencode_startcodon (
>>         gencode_id      INTEGER NOT NULL REFERENCES gencode(gencode_id),
>>         codon           CHAR(3),
>>         CONSTRAINT gencode_startcodon_unique UNIQUE( gencode_id, codon )
>> );
>>
>>
>>> gff_meta
>>> tmp_gff_load_cache
>>
>> I'm pretty sure these are caching and locking tables used by Scott's
>> GFF3 bulk loader, but Scott will have to confirm.  I doubt you need to
>> keep them in the replication though.
>>
>> Rob
>>
>> ------------------------------------------------------------------------------
>> Forrester Wave Report - Recovery time is now measured in hours and minutes
>> not days. Key insights are discussed in the 2010 Forrester Wave Report as
>> part of an in-depth evaluation of disaster recovery service providers.
>> Forrester found the best-in-class provider in terms of services and vision.
>> Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
>> _______________________________________________
>> Gmod-schema mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Forrester Wave Report - Recovery time is now measured in hours and minutes
not days. Key insights are discussed in the 2010 Forrester Wave Report as
part of an in-depth evaluation of disaster recovery service providers.
Forrester found the best-in-class provider in terms of services and vision.
Read this report now!  http://p.sf.net/sfu/ibm-webcastpromo
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Robert Buels
On Wed, Apr 13, 2011 at 12:00 PM, Robert Buels <[hidden email]> wrote:
> I added a unique constraint, thanks for pointing that out!  Table DDL is
> now...

Wow; that's service! :)

Is it possible for me to do this "retroactively" for my tables, with
the current data in place?

>> gff_meta
>> tmp_gff_load_cache
>
> I'm pretty sure these are caching and locking tables used by Scott's GFF3
> bulk loader, but Scott will have to confirm.  I doubt you need to keep them
> in the replication though.

Ah; just what I was thinking - and I see Scott has replied, so I'll
continue over there.

Thanks!

- Marc

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Scott Cain
On Wed, Apr 13, 2011 at 12:56 PM, Scott Cain <[hidden email]> wrote:
> gff_meta is a locking table to prevent more than one instance of the
> GFF3 bulk loader from loading data at the same time

Ah; nice - although I built in safeguards to my GFF3-loading code, so
that I make sure to call the bulk loader (or validator) only one at a
time, and otherwise queue up to retry.

On the other hand, I see that Tripal plans to build in GFF3 loading
soon, so that may obviate my code. :)

> (hopefully, you won't be loading the slave
> and master separately at the same time!)

Right. :)

Part of my system design is that only one will be in active use by
participants at any one time, so I don't have to worry about all the
"interesting" cases at the DB/table level.

> tmp_gff_load_cache is a little more permanent than the name implies...

Ah; OK - good to know.

> I'll look at adding an index; this table
> gets created by the bulk loader and is not part of the "official"
> chado schema.

I appreciate that - although, since I believe it's necessary, for
replication, to specify either "all tables" or each specific table
(vs. "all tables except..."), this has some "urgency" for me - what
about me simply adding a unique index to both my tables, with my data
in place?

I'm glad to be the "Guinea pig" here and report the results - though
DB is not my native tongue, so I could use some help setting up good
test cases to confirm it works well in the general case vs. just mine.

Thanks!

- Marc

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
12