Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronous Log Shipping Replication
Date
Msg-id 1221030934.3913.575.camel@ebony.2ndQuadrant
In response to Re: Synchronous Log Shipping Replication  (Hannu Krosing <hannu@krosing.net>)
Responses Re: Synchronous Log Shipping Replication  (Markus Wanner <markus@bluegap.ch>)
Re: Synchronous Log Shipping Replication  (Hannu Krosing <hannu@2ndQuadrant.com>)
Re: Synchronous Log Shipping Replication  ("Fujii Masao" <masao.fujii@gmail.com>)
List pgsql-hackers
On Wed, 2008-09-10 at 09:35 +0300, Hannu Krosing wrote:
> On Wed, 2008-09-10 at 15:15 +0900, Fujii Masao wrote:
> > On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
> > <heikki.linnakangas@enterprisedb.com> wrote:
> > > If a slave falls behind, how does it catch up? I guess you're saying that it
> > > can't fall behind, because the master will block before that happens. Also
> > > in asynchronous replication? And what about when the slave is first set up,
> > > and needs to catch up with the master?
> > 
> > The mechanism for the slave to catch up with the master should be
> > provided on the outside of postgres. 
> 
> So you mean that we still need to do initial setup (copy backup files
> and ship and replay WAL segments generated during copy) by external
> WAL-shipping tools, like walmgr.py, and then at some point switch over
> to internal WAL-shipping, when we are sure that we are within same WAL
> file on both master and slave ?
> 
> > I think that postgres should provide
> > only WAL streaming, i.e. the master always sends *current* WAL data
> > to the slave.
> >
> > Of course, the master has to send also the current WAL *file* in the
> > initial sending just after the slave starts and connects with it.
> 
> I think that it needs to send all WAL files which slave does not yet
> have, as else the slave will have gaps. On busy system you will generate
> several new WAL files in the time it takes to make master copy, transfer
> it to slave and apply WAL files generated during initial setup.
> 
> > Because, at the time, current WAL position might be in the middle of
> > WAL file. Even if the master sends only current WAL data, the slave
> > which don't have the corresponding WAL file can not handle it.
> 
> I agree, that making initial copy may be outside the scope of
> Synchronous Log Shipping Replication, but slave catching up by
> requesting all missing WAL files and applying these up to a point when
> it can switch to Sync mode should be in. Else we gain very little from
> this patch.

I agree with Hannu.

Any working solution needs to work for all required phases. If you did
it this way (streaming only the current WAL), you'd never catch up at all.

When you first make the copy, it will be made at time X. The point of
consistency will be sometime later and requires WAL data to make it
consistent. So you would need to do a PITR to get it to the point of
consistency. While you've been doing that, the primary server has moved
on and now there is a gap between primary and standby. You *must*
provide a facility to allow the standby to catch up with the primary.
Sending only *current* WAL is not a solution, and it is not acceptable.

So there must be mechanisms for sending past *and* current WAL data to
the standby, and an exact and careful mechanism for switching between
the two modes when the time is right. Replication is only synchronous
*after* the change in mode.
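
To pin down the two modes and the switch between them, here is a tiny
sketch in C; the names are hypothetical and not taken from the current
source tree:

/* Hypothetical state for a WAL-sending process on the primary;
 * illustration only. */
typedef enum WalSendMode
{
    WALSND_CATCHUP,     /* shipping past WAL the standby is still missing */
    WALSND_STREAMING    /* shipping current WAL; only now is the link
                         * synchronous */
} WalSendMode;

/* The transition WALSND_CATCHUP -> WALSND_STREAMING may only happen once
 * the standby's confirmed position is close enough to the primary's WAL
 * insertion point; commits may wait on the standby only after that. */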

So the protocol needs to be something like this (a rough sketch in C
follows the list):

1. Standby contacts primary and says it would like to catch up, but is
currently at point X (a point at, or after, the first consistent
stopping point in the WAL after the standby has performed its own crash
recovery, if any was required).
2. Primary initiates data transfer of old data to standby, starting at
point X.
3. Standby periodically tells primary where it has got to.
4. At some point primary decides primary and standby are close enough
that it can now begin streaming "current WAL" (which is always the WAL
up to wal_buffers behind the current WAL insertion point).
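
As a rough sketch of that handshake in C (every name below is
hypothetical, chosen only to illustrate steps 1-4):

#include <stdint.h>

typedef uint64_t WalPosition;           /* byte position in the WAL stream */

typedef enum CatchupMsgType
{
    MSG_STANDBY_HELLO,     /* step 1: standby reports its start point X    */
    MSG_OLD_WAL_CHUNK,     /* step 2: primary ships historical WAL from X  */
    MSG_STANDBY_PROGRESS,  /* step 3: standby periodically reports where
                            *         it has got to                        */
    MSG_BEGIN_STREAMING    /* step 4: primary switches to current-WAL mode */
} CatchupMsgType;

typedef struct CatchupMsg
{
    CatchupMsgType type;
    WalPosition    position;    /* meaning depends on the message type    */
    uint32_t       data_len;    /* length of any WAL payload that follows */
} CatchupMsg;

/* Step 4 on the primary, roughly: once the standby's reported position is
 * no further behind than the oldest WAL still held in wal_buffers, the
 * primary can send MSG_BEGIN_STREAMING and start shipping current WAL. */
static int
ready_to_stream(WalPosition standby_pos, WalPosition oldest_in_wal_buffers)
{
    return standby_pos >= oldest_in_wal_buffers;
}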

Bear in mind that unless wal_buffers > 16MB, the final catch-up will
*always* be less than one WAL file, so external file-based mechanisms
alone could never be enough. So you would need wal_buffers >= 2000 to
make an external catch-up facility work at all.
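
For the arithmetic behind that figure, assuming the default 8kB WAL
block size and 16MB WAL segments:

#define XLOG_PAGE_SIZE   8192                  /* bytes per wal_buffers page */
#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)    /* bytes per WAL segment file */

/* 16MB / 8kB = 2048 pages, so wal_buffers must be roughly 2000 or more
 * before the buffered tail of WAL can cover a whole segment.  Below that,
 * the final stretch of catch-up always lies inside a single, still-open
 * WAL file that file-based shipping cannot deliver. */
enum { PAGES_PER_SEGMENT = WAL_SEGMENT_SIZE / XLOG_PAGE_SIZE };   /* 2048 */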

This also probably means that receipt of WAL data on the standby cannot
be achieved by placing it in wal_buffers. So we probably need to write
it directly to the WAL files, then rely on the filesystem cache on the
standby to buffer the data for use by ReadRecord.
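
A minimal sketch of the standby side under that approach, with
simplified, hypothetical file handling (real code would also track
segment boundaries and file naming):

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a chunk of received WAL straight into the segment file at the
 * given offset, bypassing wal_buffers; recovery's ReadRecord then picks
 * it up through the filesystem cache when it reads the segment. */
static int
write_received_wal(const char *segpath, off_t offset,
                   const char *buf, size_t len)
{
    int fd = open(segpath, O_WRONLY | O_CREAT, 0600);

    if (fd < 0)
        return -1;

    if (pwrite(fd, buf, len, offset) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }

    /* fsync before acknowledging the primary, so a synchronous commit
     * really is durable on the standby. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    return close(fd);
}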

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


