Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit - Mailing list pgsql-general

From Ed L.
Subject Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit
Date
Msg-id 200304121906.11492.pgsql@bluepolka.net
Whole thread Raw
In response to Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-general
On Saturday April 12 2003 9:55, Jim C. Nasby wrote:
> On Fri, Apr 11, 2003 at 02:18:41PM -0600, Ed L. wrote:
> > Suppose further you wish to limit the number of updates replicated in a
> > particular cycle.  For example, suppose you have a million changes
> > queued for replication on the master, but for obvious reasons you don't
> > want to select a million rows to replicate all at once.  Suppose you
> > also don't want to grab them one tuple or one transaction at a time,
> > preferring to avoid hammering the master.  Rather, you want to grab
> > them in batches of no more than N transactions, replicate them all to a
> > slave and commit on the slave, take a breather, repeating until the
> > slave is caught up.  And during each breather, you want to have
> > committed only complete transactions so that any slave clients see
> > consistent data.
>
> I know I'm a bit ignorant to how dbmirror works, but why do the select
> from the master and the actual commits on the slave(s) have to occupy a
> single transaction? I would expect that there was some kind of
> book-keeping tables on both the master and the slaves to keep track of
> how far along mirroring/replication was; the master would need to know
> what it could remove out of the replication table (table Q as it's
> called elsewhere in this thread), and the slave needs to know where it
> left off. If this is the case, isn't it acceptable to select a chunk of
> data out of the queue table on the master, and run through it on the
> slave, with the slave committing as it pleases? Of course every time the
> slave commits it will have to update the book-keeping tables, but that
> seems to be a given... or is this exactly what you're worried about
> hammering the master with?

You are correct, IMO.  Master transaction and slave transaction do not and
cannot occupy the same transaction (since they're on different dbs).  They
may be independent, AFAIC, as long as the ordering is "close enough".
Slave may commit every 5 transactions while replaying a batch of 1000
transactions.  If there is slave and master bookkeeping, that seems to work
ok.

My interest in batching is that if you have many, many transactions in the
queue, it's a bit of hammering and traffic to go back and update the
bookkeeping tables and get the next transaction if you do it on every
transaction.  So mostly, its just seeing if I can't minimize the load on
the master and traffic on the net.

Ed


pgsql-general by date:

Previous
From: "Ed L."
Date:
Subject: Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit
Next
From: "Ed L."
Date:
Subject: Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit