Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Synchronous Log Shipping Replication |
Date | |
Msg-id | 1221030934.3913.575.camel@ebony.2ndQuadrant |
In response to | Re: Synchronous Log Shipping Replication (Hannu Krosing <hannu@krosing.net>) |
Responses | Re: Synchronous Log Shipping Replication |
List | pgsql-hackers |
On Wed, 2008-09-10 at 09:35 +0300, Hannu Krosing wrote:
> On Wed, 2008-09-10 at 15:15 +0900, Fujii Masao wrote:
> > On Wed, Sep 10, 2008 at 12:26 AM, Heikki Linnakangas
> > <heikki.linnakangas@enterprisedb.com> wrote:
> > > If a slave falls behind, how does it catch up? I guess you're saying that it
> > > can't fall behind, because the master will block before that happens. Also
> > > in asynchronous replication? And what about when the slave is first set up,
> > > and needs to catch up with the master?
> >
> > The mechanism for the slave to catch up with the master should be
> > provided on the outside of postgres.
>
> So you mean that we still need to do initial setup (copy backup files
> and ship and replay WAL segments generated during copy) by external
> WAL-shipping tools, like walmgr.py, and then at some point switch over
> to internal WAL-shipping, when we are sure that we are within same WAL
> file on both master and slave ?
>
> > I think that postgres should provide
> > only WAL streaming, i.e. the master always sends *current* WAL data
> > to the slave.
> >
> > Of course, the master has to send also the current WAL *file* in the
> > initial sending just after the slave starts and connects with it.
>
> I think that it needs to send all WAL files which slave does not yet
> have, as else the slave will have gaps. On busy system you will generate
> several new WAL files in the time it takes to make master copy, transfer
> it to slave and apply WAL files generated during initial setup.
>
> > Because, at the time, current WAL position might be in the middle of
> > WAL file. Even if the master sends only current WAL data, the slave
> > which don't have the corresponding WAL file can not handle it.
>
> I agree, that making initial copy may be outside the scope of
> Synchronous Log Shipping Replication, but slave catching up by
> requesting all missing WAL files and applying these up to a point when
> it can switch to Sync mode should be in. Else we gain very little from
> this patch.

I agree with Hannu. Any working solution needs to work for all required
phases. If you did it this way, you'd never catch up at all.

When you first make the copy, it will be made at time X. The point of
consistency will be sometime later and requires WAL data to make it
consistent. So you would need to do a PITR to get it to the point of
consistency. While you've been doing that, the primary server has moved
on and now there is a gap between primary and standby. You *must*
provide a facility to allow the standby to catch up with the primary.
Only sending *current* WAL is not a solution, and not acceptable.

So there must be mechanisms for sending past *and* current WAL data to
the standby, and an exact and careful mechanism for switching between
the two modes when the time is right. Replication is only synchronous
*after* the change in mode.

So the protocol needs to be something like:

1. Standby contacts primary and says it would like to catch up, but is
   currently at point X (which is a point at, or after the first
   consistent stopping point in WAL after standby has performed its own
   crash recovery, if any was required).

2. primary initiates data transfer of old data to standby, starting at
   point X

3. standby tells primary where it has got to periodically

4. at some point primary decides primary and standby are close enough
   that it can now begin streaming "current WAL" (which is always the
   WAL up to wal_buffers behind the current WAL insertion point).

Bear in mind that unless wal_buffers > 16MB the final catchup will
*always* be less than one WAL file, so external file based mechanisms
alone could never be enough. So you would need wal_buffers >= 2000 to
make an external catch up facility even work at all.

This also probably means that receipt of WAL data on the standby cannot
be achieved by placing it in wal_buffers.
So we probably need to write it directly to the WAL files, then rely on
the filesystem cache on the standby to buffer the data for use by
ReadRecord.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
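
For illustration only, the four protocol steps in the message above can
be sketched as a toy simulation. This is not PostgreSQL code and not the
patch under discussion. It assumes WAL positions are plain byte offsets,
the default 8 kB WAL block size and 16 MB WAL file size, and a primary
that keeps generating WAL while old data is shipped. The names Primary,
close_enough, catch_up, chunk and rate_divisor are all hypothetical.

```python
from dataclasses import dataclass

WAL_BLOCK_SIZE = 8192                 # assumed default WAL block size, bytes
WAL_SEGMENT_SIZE = 16 * 1024 * 1024   # assumed default WAL file size, bytes


def streaming_window(wal_buffers: int) -> int:
    """Bytes of recent WAL covered by wal_buffers (pages * block size)."""
    return wal_buffers * WAL_BLOCK_SIZE


@dataclass
class Primary:
    insert_pos: int    # current WAL insertion point, as a byte offset
    wal_buffers: int   # number of WAL buffer pages

    def close_enough(self, standby_pos: int) -> bool:
        # Step 4: the standby is within wal_buffers of the insertion point.
        gap = self.insert_pos - standby_pos
        return gap <= streaming_window(self.wal_buffers)


def catch_up(primary: Primary, standby_pos: int,
             chunk: int = 1024 * 1024, rate_divisor: int = 4):
    """Ship old WAL from standby_pos (step 1) until streaming can begin.

    Hypothetical model: for every rate_divisor bytes shipped, the primary
    generates one new byte of WAL, so the gap shrinks on every round.
    Returns (standby position at the mode switch, chunks shipped).
    """
    if chunk <= 0 or rate_divisor <= 1:
        raise ValueError("need chunk > 0 and rate_divisor > 1 to converge")
    chunks = 0
    while not primary.close_enough(standby_pos):
        # Step 2: send old WAL, never past the current insertion point.
        sent = min(chunk, primary.insert_pos - standby_pos)
        # Step 3: standby reports how far it has got.
        standby_pos += sent
        # Meanwhile the primary has moved on.
        primary.insert_pos += sent // rate_divisor
        chunks += 1
    return standby_pos, chunks


if __name__ == "__main__":
    primary = Primary(insert_pos=10 * WAL_SEGMENT_SIZE, wal_buffers=8)
    pos, n = catch_up(primary, standby_pos=0)
    print("switch to streaming at", pos, "after", n, "chunks;",
          "remaining gap", primary.insert_pos - pos, "bytes")
```

Under these assumptions streaming_window(2048) equals one 16 MB WAL
file, which is roughly where the "wal_buffers >= 2000" figure comes
from. With a small wal_buffers the remaining gap at the switch is far
less than one file, so whole-file shipping alone cannot close it.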