Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit - Mailing list pgsql-general

From Ed L.
Subject Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit
Date
Msg-id 200304111418.41056.pgsql@bluepolka.net
In response to Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit  ("Jim C. Nasby" <jim@nasby.net>)
Re: Batch replication ordering (was Re: [GENERAL] 32/64-bit  (Steven Singer <ssinger@navtechinc.com>)
List pgsql-general
On Friday April 11 2003 1:07, Jan Wieck wrote:
> "Ed L." wrote:
> > On Friday April 11 2003 10:08, Jan Wieck wrote:
> > > Clearly a bug, but we had memory leaks that clear up at transaction
> > > end.
> >
> > That seems like yet another reason for constraining the size of a batch
> > of transactions.
>
> What I cannot imagine is why one would want to try to make batches any
> other size than the original transaction.

Again, if a large number of transactions are queued on the master,
possibly comprising an even larger number of actual row changes, a batch
size of 1 transaction means, in this context, that the master will be hit
once for every single transaction to retrieve the queued rows for that
transaction.  A batch size of, say, 1000 transactions, on the other hand,
saves 999 queries.  Granted, you may not share my concern about the impact
of that on the master or the network.

> "the original transaction" - singular!!! Not a couple, few, maybe some,
> part, fraction or anything in between, above or below. Exactly ONE.

Repeated exclamation points, pedantry, and all-caps shouting duly noted.

Now, suppose there are in fact memory leaks that only clear up at
transaction end.  Suppose the memory leak increases with the size of the
transaction (which nobody asserted but I assume may be true).  Suppose one
wishes to avoid a per-transaction hit on the master during replication.
Suppose one needs to process a replication queue in batches due to an
overly large replication queue.  Then, I claim, one may quite reasonably
wish to retrieve replication tuples from the master in batches containing
rows constituting more than one transaction.  All complete transactions,
yes, but more than one.

> If you have a deferred referential integrity constraint defined (one of
> the reasons why half of a transaction cannot work at all), where does
> the backend remember the ctid's and other information for the triggers
> to call at commit time?

Again, I have no interest in partial transactions, only groups of multiple
transactions.

===

For those of you who do see the validity in batching multiple transactions,
my question is restated here:

My question:  Is there an ordering algorithm that would make consistent
replication with a limited batch size possible?  I propose one below.

Suppose you have a replication queue on the master, with dbmirror-like
trigger-based insertions, that looks something like this:

        create table replication_queue (
                xid     integer,
                seqid   serial primary key,
                data...
        );

Here, 'xid' is the transaction ID, 'seqid' is the queue insertion order, and
'data' has all the info needed to replicate the update.  Every row
update/delete/insert results in a row in this queue, and a transaction may
consist of one to many rows in the queue.

Suppose further you wish to limit the number of updates replicated in a
particular cycle.  For example, suppose you have a million changes queued
for replication on the master, but for obvious reasons you don't want to
select a million rows to replicate all at once.  Suppose you also don't
want to grab them one tuple or one transaction at a time, preferring to
avoid hammering the master.  Rather, you want to grab them in batches of no
more than N transactions, replicate each batch to a slave, commit on the
slave, take a breather, and repeat until the slave is caught up.  And during
each breather, you want to have committed only complete transactions so
that any slave clients see consistent data.
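The fetch/commit/breather loop described above might be sketched like this
(hypothetical names throughout; fetch_batch and apply_and_commit stand in
for the real master-side query and slave-side apply, which dbmirror does
not expose under these names):

```python
import time

BATCH_SIZE = 1000   # N: max transactions fetched per round trip
PAUSE_SECS = 0.1    # the "breather" between batches

def replicate(fetch_batch, apply_and_commit):
    """Drain the master's replication queue in batches of complete
    transactions, committing once per batch on the slave."""
    while True:
        rows = fetch_batch(BATCH_SIZE)   # <= N complete transactions
        if not rows:
            break                        # queue drained; slave caught up
        apply_and_commit(rows)           # one slave commit per batch, so
                                         # slave clients only ever see
                                         # whole transactions
        time.sleep(PAUSE_SECS)           # take a breather
```

The point of the single commit per batch is that the slave moves from one
consistent snapshot to the next, never exposing half a transaction.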

The algorithm I'm considering right now is the following:

        select xid, max(seqid) as "max_seqid"
        into temp replication_order
        from replication_queue
        group by xid
        order by max(seqid)
        limit N;

Then, to get the actual queue replication order,

        select q.xid, q.seqid, q.data
        from replication_queue q, replication_order o
        where q.xid = o.xid
        order by o.max_seqid, q.seqid;

[This is a batched variation of dbmirror's original algorithm.]

So, replication is done by transaction groupings, in ascending order
according to the maximum seqid in each transaction.  I'm hoping someone can
poke holes in this algorithm if they exist.
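One quick way to start poking is to simulate the two queries in memory.
This is a hypothetical sketch (the sample data and the batch_order helper
are mine, not dbmirror's), mirroring the group-by/max/limit query and the
final join-and-sort:

```python
from collections import defaultdict

# Each queue row is (xid, seqid); seqid is global insertion order.
# Three interleaved transactions: 100, 101, 102.
queue = [
    (100, 1), (101, 2), (100, 3), (102, 4), (101, 5), (102, 6), (100, 7),
]

def batch_order(queue, n):
    """Rows for the first n transactions, ordered by
    (max seqid of transaction, seqid) -- the proposed algorithm."""
    # select xid, max(seqid) ... group by xid
    max_seqid = defaultdict(int)
    for xid, seqid in queue:
        max_seqid[xid] = max(max_seqid[xid], seqid)
    # order by max(seqid) limit n
    chosen = set(sorted(max_seqid, key=max_seqid.get)[:n])
    # join back and order by o.max_seqid, q.seqid
    return sorted(
        (row for row in queue if row[0] in chosen),
        key=lambda row: (max_seqid[row[0]], row[1]),
    )

# Transactions "finish" in order 101 (max seqid 5), 102 (6), 100 (7),
# so a batch of 2 takes 101 and 102 whole, leaving 100 for next time.
print(batch_order(queue, 2))
# -> [(101, 2), (101, 5), (102, 4), (102, 6)]
```

Note the batch boundary always falls between transactions, never inside
one, which is the property the breather depends on.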

Ed

