Postgres replication of Chado?

classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
On Wed, Apr 13, 2011 at 1:17 PM, Scott Cain <[hidden email]> wrote:
> I don't really have time right now to add a auto incrementing primary
> key and properly test the changes, so I created a bug report for it so
> I don't forget:
>
>  https://sourceforge.net/tracker/?group_id=27707&atid=391291

Great; thanks - though I believe it's necessary to do so for *all*
tables, so I've added the gff_meta table to that bug report too; hope
that's OK.

And like I said, I'm glad to test; let me know if that'd be of any help.

Thanks!

- Marc

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Marc Farnum Rendino
On Thu, Apr 14, 2011 at 6:00 AM, Marc Farnum Rendino <[hidden email]> wrote:
> Is it possible for me to [add a unique index] "retroactively" for my tables, with
> the current data in place?

To be clearer:

The SQL to do this seems fairly straightforward; my main concern is to
ensure that doing so won't cause problems elsewhere in Chado, GBrowse,
etc.

Thanks,

Marc

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Robert Buels
In reply to this post by Marc Farnum Rendino
On 04/14/2011 03:27 AM, Marc Farnum Rendino wrote:
> Great; thanks - though I believe it's necessary to do so for *all*
> tables, so I've added the gff_meta table to that bug report too; hope
> that's OK.

Bucardo doesn't have config to exclude certain table from replication?
That sounds hard to believe.

R

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Robert Buels
In reply to this post by Marc Farnum Rendino
On 04/14/2011 03:00 AM, Marc Farnum Rendino wrote:
> Is it possible for me to do this "retroactively" for my tables, with
> the current data in place?

Sure, just formulate the proper ALTER TABLE statement to add those two
constraints.

(I thought I just heard a little bird whistling outside, and it almost
sounded like it was singing 'chado needs a schema migration system'.  Huh.)

(And now somebody should reply to me with a parrot squawking 'patches
welcome'.)

Rob

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Robert Buels
In reply to this post by Marc Farnum Rendino
On 04/14/2011 06:38 AM, Marc Farnum Rendino wrote:
 > To be clearer:
 >
 > The SQL to do this seems fairly straightforward; my main concern is to
 > ensure that doing so won't cause problems elsewhere in Chado, GBrowse,
 > etc.

Oh.  Sorry I wrote that last email before I read this.  I doubt it would
cause problems.  Is there any application that's even using those tables
right now?  I don't know of any.

Rob

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Robert Buels
On Thu, Apr 14, 2011 at 12:02 PM, Robert Buels <[hidden email]> wrote:
> On 04/14/2011 03:00 AM, Marc Farnum Rendino wrote:
>> Is it possible for me to do this "retroactively" for my tables, with
>> the current data in place?
>
> Sure, just formulate the proper ALTER TABLE statement to add those two
> constraints.

And (just to make sure :) this won't break anything else?

> (I thought I just heard a little bird whistling outside, and it almost
> sounded like it was singing 'chado needs a schema migration system'.  Huh.)
>
> (And now somebody should reply to me with a parrot squawking 'patches
> welcome'.)

:)

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Robert Buels
On Thu, Apr 14, 2011 at 11:58 AM, Robert Buels <[hidden email]> wrote:
> Bucardo doesn't have config to exclude certain table from replication? That
> sounds hard to believe.

And indeed you are correct; thanks!

- Marc

(I'd missed the "-T" switch which does an exclude; interestingly, the
bucardo wiki, when you click on "-T", redirects to "-t" (lowercase)
which is an INclude. :)

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Scott Cain
In reply to this post by Marc Farnum Rendino
I don't have any concerns about GBrowse or other software stopping
working.  My first concern is that applying a unique constraint on the
tmp table is that I have no idea if you can count on that always being
true (which would cause a problem for the loader when it chokes on the
constraint).  If you add autoincrementing primary key columns, it
should be fine (and I don't think it will affect performance of the
tmp table much).

The thing slowing me down is that I have to modify the loader code
that interacts with these tables and then test that I did that right.
Not at all difficult, just moderately time consuming.

As for migration software: I agree, though it wouldn't really help in
these two instances since neither of these tables is really "part_of"
Chado, though it might make sense add them (or something like it in
the case of the meta table).

Scott


On Thu, Apr 14, 2011 at 12:08 PM, Marc Farnum Rendino <[hidden email]> wrote:

> On Thu, Apr 14, 2011 at 12:02 PM, Robert Buels <[hidden email]> wrote:
>> On 04/14/2011 03:00 AM, Marc Farnum Rendino wrote:
>>> Is it possible for me to do this "retroactively" for my tables, with
>>> the current data in place?
>>
>> Sure, just formulate the proper ALTER TABLE statement to add those two
>> constraints.
>
> And (just to make sure :) this won't break anything else?
>
>> (I thought I just heard a little bird whistling outside, and it almost
>> sounded like it was singing 'chado needs a schema migration system'.  Huh.)
>>
>> (And now somebody should reply to me with a parrot squawking 'patches
>> welcome'.)
>
> :)
>
> ------------------------------------------------------------------------------
> Benefiting from Server Virtualization: Beyond Initial Workload
> Consolidation -- Increasing the use of server virtualization is a top
> priority.Virtualization can reduce costs, simplify management, and improve
> application availability and disaster protection. Learn more about boosting
> the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Robert Buels
Oops, I need to keep it on-list.

On 04/14/2011 09:44 AM, Scott Cain wrote:
 > As for migration software: I agree, though it wouldn't really help in
 > these two instances since neither of these tables is really "part_of"
 > Chado, though it might make sense add them (or something like it in
 > the case of the meta table).

Oh, I was talking about the unique constraints.  No, it wouldn't help
with the tmp_ and meta table.

Rob

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Robert Buels
Great - I appreciate the opinion of someone more familiar with this than I. :)

On Thu, Apr 14, 2011 at 12:04 PM, Robert Buels <[hidden email]> wrote:
> Oh.  Sorry I wrote that last email before I read this.  I doubt it would
> cause problems.  Is there any application that's even using those tables
> right now?  I don't know of any.
>
> Rob

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Scott Cain
Scott -

Sounds good; and I appreciate the background.

For my purposes, adding an auto-incrementing column as a primary key
would be fine; it's just to satisfy the requirements of Bucardo (to
enable replicating rows rather than the entire table).

So if that makes your job any easier, that's fine by me. :)

Thanks,

Marc

On Thu, Apr 14, 2011 at 12:44 PM, Scott Cain <[hidden email]> wrote:

> I don't have any concerns about GBrowse or other software stopping
> working.  My first concern is that applying a unique constraint on the
> tmp table is that I have no idea if you can count on that always being
> true (which would cause a problem for the loader when it chokes on the
> constraint).  If you add autoincrementing primary key columns, it
> should be fine (and I don't think it will affect performance of the
> tmp table much).
>
> The thing slowing me down is that I have to modify the loader code
> that interacts with these tables and then test that I did that right.
> Not at all difficult, just moderately time consuming.
>
> As for migration software: I agree, though it wouldn't really help in
> these two instances since neither of these tables is really "part_of"
> Chado, though it might make sense add them (or something like it in
> the case of the meta table).
>
> Scott

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
In reply to this post by Scott Cain
On Wed, Apr 13, 2011 at 12:56 PM, Scott Cain <[hidden email]> wrote:
> gff_meta is a locking table to prevent more than one instance of the
> GFF3 bulk loader from loading data at the same time, and probably
> doesn't have to be moved

Is the bulk loader "atomic"; if it's in mid-load and the machine dies,
do I get partial results or simply start over?

Just trying to probe the "edge cases" of your "probably" above. :)

Thanks,

Marc

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Scott Cain
Hi Marc,

The loader is atomic, unless you specifically turn it off with a
command line option.  If it dies mid-load, the tmp table may be left
in an "unclean" state (because that is not updated atomically), and if
that happens, the next time the loader is run, you will need to add
the --recreate_cache option.  I say "may" because there is a die
handler in the loader that tries to clean up the table before dying,
but that won't always work (for instance, if someone pulls the power
plug during the load).  If you had a replicated database that didn't
have this table (because you explicitly told the replicator not to
copy it), the loader would automatically recognize this and do the
"recreate cache" step automatically.

Scott


On Mon, Apr 18, 2011 at 9:14 AM, Marc Farnum Rendino <[hidden email]> wrote:

> On Wed, Apr 13, 2011 at 12:56 PM, Scott Cain <[hidden email]> wrote:
>> gff_meta is a locking table to prevent more than one instance of the
>> GFF3 bulk loader from loading data at the same time, and probably
>> doesn't have to be moved
>
> Is the bulk loader "atomic"; if it's in mid-load and the machine dies,
> do I get partial results or simply start over?
>
> Just trying to probe the "edge cases" of your "probably" above. :)
>
> Thanks,
>
> Marc
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Marc Farnum Rendino
Perfect; thanks!

And thanks again for the background; helpful for my code to drive the
bulk_loader - is there any downside (other then a brief performance
hit) to always using the "--recreate_cache" option (to make sure the
bulk_loader doesn't fail for lack of it)?

On Mon, Apr 18, 2011 at 2:31 PM, Scott Cain <[hidden email]> wrote:

> The loader is atomic, unless you specifically turn it off with a
> command line option.  If it dies mid-load, the tmp table may be left
> in an "unclean" state (because that is not updated atomically), and if
> that happens, the next time the loader is run, you will need to add
> the --recreate_cache option.  I say "may" because there is a die
> handler in the loader that tries to clean up the table before dying,
> but that won't always work (for instance, if someone pulls the power
> plug during the load).  If you had a replicated database that didn't
> have this table (because you explicitly told the replicator not to
> copy it), the loader would automatically recognize this and do the
> "recreate cache" step automatically.
>
> Scott
>
>
> On Mon, Apr 18, 2011 at 9:14 AM, Marc Farnum Rendino <[hidden email]> wrote:
>> On Wed, Apr 13, 2011 at 12:56 PM, Scott Cain <[hidden email]> wrote:
>>> gff_meta is a locking table to prevent more than one instance of the
>>> GFF3 bulk loader from loading data at the same time, and probably
>>> doesn't have to be moved
>>
>> Is the bulk loader "atomic"; if it's in mid-load and the machine dies,
>> do I get partial results or simply start over?
>>
>> Just trying to probe the "edge cases" of your "probably" above. :)
>>
>> Thanks,
>>
>> Marc
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Postgres replication of Chado?

Scott Cain
Hi Marc,

Well, of course it depends: that "brief performance hit" can be fairly
long when there's lots of data.  Other than that, no, there isn't a
down side.

Scott

On Mon, Apr 18, 2011 at 4:29 PM, Marc Farnum Rendino <[hidden email]> wrote:

> Perfect; thanks!
>
> And thanks again for the background; helpful for my code to drive the
> bulk_loader - is there any downside (other then a brief performance
> hit) to always using the "--recreate_cache" option (to make sure the
> bulk_loader doesn't fail for lack of it)?
>
> On Mon, Apr 18, 2011 at 2:31 PM, Scott Cain <[hidden email]> wrote:
>> The loader is atomic, unless you specifically turn it off with a
>> command line option.  If it dies mid-load, the tmp table may be left
>> in an "unclean" state (because that is not updated atomically), and if
>> that happens, the next time the loader is run, you will need to add
>> the --recreate_cache option.  I say "may" because there is a die
>> handler in the loader that tries to clean up the table before dying,
>> but that won't always work (for instance, if someone pulls the power
>> plug during the load).  If you had a replicated database that didn't
>> have this table (because you explicitly told the replicator not to
>> copy it), the loader would automatically recognize this and do the
>> "recreate cache" step automatically.
>>
>> Scott
>>
>>
>> On Mon, Apr 18, 2011 at 9:14 AM, Marc Farnum Rendino <[hidden email]> wrote:
>>> On Wed, Apr 13, 2011 at 12:56 PM, Scott Cain <[hidden email]> wrote:
>>>> gff_meta is a locking table to prevent more than one instance of the
>>>> GFF3 bulk loader from loading data at the same time, and probably
>>>> doesn't have to be moved
>>>
>>> Is the bulk loader "atomic"; if it's in mid-load and the machine dies,
>>> do I get partial results or simply start over?
>>>
>>> Just trying to probe the "edge cases" of your "probably" above. :)
>>>
>>> Thanks,
>>>
>>> Marc
>>>
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>>
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve
application availability and disaster protection. Learn more about boosting
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
12