Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Synchronous Log Shipping Replication
Date
Msg-id 3f0b79eb0809100157o5b7105ffo4371d8c0b3304051@mail.gmail.com
In response to Re: Synchronous Log Shipping Replication  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Synchronous Log Shipping Replication  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Sep 10, 2008 at 4:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Wed, 2008-09-10 at 09:35 +0300, Hannu Krosing wrote:
>> On Wed, 2008-09-10 at 15:15 +0900, Fujii Masao wrote:
>> > On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
>> > <heikki.linnakangas@enterprisedb.com> wrote:
>> > > If a slave falls behind, how does it catch up? I guess you're saying that it
>> > > can't fall behind, because the master will block before that happens. Also
>> > > in asynchronous replication? And what about when the slave is first set up,
>> > > and needs to catch up with the master?
>> >
>> > The mechanism for the slave to catch up with the master should be
>> > provided on the outside of postgres.
>>
>> So you mean that we still need to do initial setup (copy backup files
>> and ship and replay WAL segments generated during copy) by external
>> WAL-shipping tools, like walmgr.py, and then at some point switch over
>> to internal WAL-shipping, when we are sure that we are within same WAL
>> file on both master and slave ?
>>
>> > I think that postgres should provide
>> > only WAL streaming, i.e. the master always sends *current* WAL data
>> > to the slave.
>> >
>> > Of course, the master has to send also the current WAL *file* in the
>> > initial sending just after the slave starts and connects with it.
>>
>> I think that it needs to send all WAL files which slave does not yet
>> have, as else the slave will have gaps. On busy system you will generate
>> several new WAL files in the time it takes to make master copy, transfer
>> it to slave and apply WAL files generated during initial setup.
>>
>> > Because, at the time, current WAL position might be in the middle of
>> > WAL file. Even if the master sends only current WAL data, the slave
>> > which doesn't have the corresponding WAL file can not handle it.
>>
>> I agree, that making initial copy may be outside the scope of
>> Synchronous Log Shipping Replication, but slave catching up by
>> requesting all missing WAL files and applying these up to a point when
>> it can switch to Sync mode should be in. Else we gain very little from
>> this patch.
>
> I agree with Hannu.
>
> Any working solution needs to work for all required phases. If you did
> it this way, you'd never catch up at all.
>
> When you first make the copy, it will be made at time X. The point of
> consistency will be sometime later and requires WAL data to make it
> consistent. So you would need to do a PITR to get it to the point of
> consistency. While you've been doing that, the primary server has moved
> on and now there is a gap between primary and standby. You *must*
> provide a facility to allow the standby to catch up with the primary.
> Only sending *current* WAL is not a solution, and not acceptable.
>
> So there must be mechanisms for sending past *and* current WAL data to
> the standby, and an exact and careful mechanism for switching between
> the two modes when the time is right. Replication is only synchronous
> *after* the change in mode.
>
> So the protocol needs to be something like:
>
> 1. Standby contacts primary and says it would like to catch up, but is
> currently at point X (which is a point at, or after the first consistent
> stopping point in WAL after standby has performed its own crash
> recovery, if any was required).
> 2. primary initiates data transfer of old data to standby, starting at
> point X
> 3. standby tells primary where it has got to periodically
> 4. at some point primary decides primary and standby are close enough
> that it can now begin streaming "current WAL" (which is always the WAL
> up to wal_buffers behind the current WAL insertion point).
>
> Bear in mind that unless wal_buffers > 16MB the final catchup will
> *always* be less than one WAL file, so external file based mechanisms
> alone could never be enough. So you would need wal_buffers >= 2000 to
> make an external catch up facility even work at all.
>
> This also probably means that receipt of WAL data on the standby cannot
> be achieved by placing it in wal_buffers. So we probably need to write
> it directly to the WAL files, then rely on the filesystem cache on the
> standby to buffer the data for use by ReadRecord.
>
> --
>  Simon Riggs           www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
>
>

Umm.. I disagree with you ;)

Here is my initial setup sequence.

1) Start WAL receiver.
    The current WAL file and subsequent ones will be transmitted by
    WAL sender and WAL receiver. This transmission will not block
    the following operations for initial setup, and vice versa. That is,
    the slave can catch up with the master without blocking the master.
    I cannot accept that WAL sender is blocked for initial setup.

2) Copy the missing history files from the master to the slave.

3) Prepare recovery.conf on the slave.
    You have to configure pg_standby and set recovery_target_timeline to
    'latest' or the current TLI of the master.

4) Start postgres.
    The startup process and pg_standby start archive recovery. If there
    are missing WAL files, pg_standby waits for them and WAL replay is
    suspended.

5) Copy the missing WAL files from the master to the slave.
    Of course, we don't need to copy the WAL files which are transmitted
    by WAL sender and WAL receiver. Then, the recovery is resumed.
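
As a rough illustration of step 3, the recovery.conf for the slave could be generated as below. This is only a sketch: the helper name render_recovery_conf and the archive path are hypothetical, and the restore_command line assumes the usual pg_standby invocation (archive location followed by %f, %p and %r) with no extra options.

```python
def render_recovery_conf(archive_dir, timeline="latest"):
    """Return recovery.conf text for a slave that restores WAL via pg_standby.

    archive_dir is a hypothetical directory on the slave where copied
    WAL files are placed; timeline is 'latest' or the master's current TLI.
    """
    # %f = requested WAL file name, %p = destination path,
    # %r = oldest file that must be kept to allow a restartable recovery.
    restore = "pg_standby {} %f %p %r".format(archive_dir)
    lines = [
        "restore_command = '{}'".format(restore),
        "recovery_target_timeline = '{}'".format(timeline),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example path only; adjust to wherever the slave keeps copied WAL files.
    print(render_recovery_conf("/mnt/standby/archive"), end="")
```

Passing a numeric timeline instead of the default covers the "current TLI of the master" variant of step 3.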
 

My sequence covers several cases:

* There are no missing WAL files.
* There are many missing WAL files.
* There are missing history files. Failover always generates a gap in
  the history files because the TLI is incremented when archive recovery
  is completed.
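
The three cases can be sketched as one file-set comparison covering steps 2 and 5. This is a simplified model, not the actual implementation: the function files_to_copy and its arguments are hypothetical, and it assumes that all WAL segments involved are on the same timeline, so that their 24-hex-digit names sort in WAL order.

```python
import re

# Names as they appear in pg_xlog or the archive: 24 hex digits for a
# WAL segment, 8 hex digits plus ".history" for a timeline history file.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")
HISTORY = re.compile(r"^[0-9A-F]{8}\.history$")


def files_to_copy(master_files, slave_files, first_streamed):
    """Return (history_files, wal_files) the slave still has to fetch.

    first_streamed is the name of the first WAL file delivered by
    WAL sender / WAL receiver; that file and later ones need no copy.
    Assumes a single timeline, so string order equals WAL order.
    """
    missing = set(master_files) - set(slave_files)
    history = sorted(f for f in missing if HISTORY.match(f))
    wal = sorted(
        f for f in missing
        if WAL_SEGMENT.match(f) and f < first_streamed
    )
    return history, wal


if __name__ == "__main__":
    master = [
        "00000001000000000000000A",
        "00000001000000000000000B",
        "00000001000000000000000C",
        "00000002.history",
    ]
    slave = ["00000001000000000000000A"]
    print(files_to_copy(master, slave, "00000001000000000000000C"))
```

When both lists are identical the result is two empty lists, which is the "no missing WAL file" case; a long gap simply yields a longer list for step 5.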
 
...

In your design, doesn't initial setup block the master?
Does your design cover the above-mentioned cases?

regards

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

