Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Synchronous Log Shipping Replication
Msg-id 1221037034.3913.621.camel@ebony.2ndQuadrant
In response to Re: Synchronous Log Shipping Replication  (Markus Wanner <markus@bluegap.ch>)
Responses Re: Synchronous Log Shipping Replication  (Markus Wanner <markus@bluegap.ch>)
List pgsql-hackers
On Wed, 2008-09-10 at 10:06 +0200, Markus Wanner wrote:
> Hi,
> 
> Simon Riggs wrote:
> > 1. Standby contacts primary and says it would like to catch up, but is
> > currently at point X (which is a point at, or after the first consistent
> > stopping point in WAL after standby has performed its own crash
> > recovery, if any was required).
> > 2. primary initiates data transfer of old data to standby, starting at
> > point X
> > 3. standby tells primary where it has got to periodically
> > 4. at some point primary decides primary and standby are close enough
> > that it can now begin streaming "current WAL" (which is always the WAL
> > up to wal_buffers behind the current WAL insertion point).
> 
> Hm.. wouldn't it be simpler, to start streaming right away and "cache" 

The standby server won't come up until you have:
* copied the base backup
* sent it to the standby server
* brought up the standby, had it realise it is a replication partner and
begin requesting WAL from the primary (in some way)

There will be a gap (probably) between the initial WAL files and the
current tail of wal_buffers by the time all of the above has happened.
We will then need to copy more WAL across until we get to a point where
the most recent WAL record available on standby is ahead of the tail of
wal_buffers on primary so that streaming can start.
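That "caught up enough to stream" condition could be sketched roughly as below. This is a minimal illustration only, assuming the two-part WAL position (log id plus byte offset) PostgreSQL used at the time; the names are stand-ins, not actual server code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative two-part WAL position, modelled loosely on the
 * XLogRecPtr of that era; not the real server definition. */
typedef struct XLogRecPtr
{
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within the log file */
} XLogRecPtr;

/* true if position a is at or ahead of position b */
static bool
wal_pos_at_or_after(XLogRecPtr a, XLogRecPtr b)
{
    return a.xlogid > b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff >= b.xrecoff);
}

/* Streaming may begin once the most recent WAL record already
 * available on the standby is at or ahead of the tail of the
 * primary's wal_buffers. */
static bool
can_start_streaming(XLogRecPtr standby_newest, XLogRecPtr buffers_tail)
{
    return wal_pos_at_or_after(standby_newest, buffers_tail);
}
```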

If we start caching WAL right away we would need to have two receivers:
one to receive the missing WAL data and one to receive the current WAL
data. We can't apply the WAL until we have the earlier missing WAL data,
so caching it seems difficult; on a large server this might be GBs of
data. It seems easier not to cache current WAL and to have just a single
WALReceiver process that performs a mode change once it has caught up.
(I should say "if it catches up", since in practical terms it may never
actually catch up, depending on the relative power of the servers
involved.) So there's no need to store more WAL on standby than is
required to restart recovery from the last restartpoint, i.e. we stream
WAL at all times, not just in normal running mode.
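In very rough outline, the single-receiver loop with a mode change might look like the simulation below. This is purely illustrative (the chunk sizes, struct, and function names are all invented, and the primary's positions are frozen rather than advancing), not actual PostgreSQL code:

```c
#include <stdint.h>

/* Illustrative sketch of a single WALReceiver with a catch-up
 * mode and a streaming mode; all names here are hypothetical. */
typedef enum { WALRCV_CATCHUP, WALRCV_STREAMING } WalRcvMode;
typedef uint64_t WalPos;   /* flattened WAL position, for brevity */

typedef struct Primary
{
    WalPos buffers_tail;   /* oldest WAL still held in wal_buffers */
    WalPos insert_ptr;     /* current WAL insertion point */
} Primary;

/* Run the receiver until it reaches the primary's insert pointer;
 * return the mode it ended in. Catch-up copies old WAL in large
 * chunks from files; streaming receives record by record. */
static WalRcvMode
run_walreceiver(const Primary *p, WalPos received)
{
    WalRcvMode mode = WALRCV_CATCHUP;

    while (received < p->insert_ptr)
    {
        if (mode == WALRCV_CATCHUP)
        {
            received += 8192;           /* copy one chunk of old WAL */
            /* mode change: caught up once the newest WAL we hold is
             * at or ahead of the tail of the primary's wal_buffers */
            if (received >= p->buffers_tail)
                mode = WALRCV_STREAMING;
        }
        else
            received += 64;             /* stream one current record */

        /* the real receiver would write the data to the WAL file,
         * fsync if requested, and advance ReceivedLogPtr here */
    }
    return mode;
}
```

In a live system the insert pointer keeps advancing, which is why the mode change may never happen if the standby cannot keep pace.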

Seems easiest to have:
* Startup process only reads next WAL record when the ReceivedLogPtr >
ReadRecPtr, so it knows nothing of how WAL is received. Startup process
reads directly from WAL files in *all* cases. ReceivedLogPtr is in
shared memory and accessed via spinlock. Startup process only ever reads
this pointer. (Notice that Startup process is modeless).
* WALReceiver reads data from primary and writes it to WAL files,
fsyncing (if ever requested to do so). WALReceiver updates
ReceivedLogPtr.

That is much simpler and more modular. Buffering of the WAL files is
handled by filesystem buffering.
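The ReceivedLogPtr handoff between the two processes could be sketched like this. A pthread mutex stands in for the server's shared-memory spinlock, and the function names are made up for illustration:

```c
#include <pthread.h>
#include <stdint.h>

/* Illustrative sketch of the ReceivedLogPtr handoff described
 * above; a pthread mutex stands in for the server's spinlock. */
typedef uint64_t WalPos;

static struct
{
    pthread_mutex_t lock;
    WalPos received_ptr;   /* ReceivedLogPtr: newest WAL safely in files */
} shared = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* WALReceiver: after writing (and, if asked, fsyncing) WAL to the
 * files, publish how far the standby's WAL now extends. */
void
SetReceivedLogPtr(WalPos upto)
{
    pthread_mutex_lock(&shared.lock);
    shared.received_ptr = upto;
    pthread_mutex_unlock(&shared.lock);
}

/* Startup process: the only reader. It replays the next record
 * only while ReadRecPtr < ReceivedLogPtr, always reading the WAL
 * data itself directly from the files. */
WalPos
GetReceivedLogPtr(void)
{
    WalPos p;

    pthread_mutex_lock(&shared.lock);
    p = shared.received_ptr;
    pthread_mutex_unlock(&shared.lock);
    return p;
}
```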

If standby crashes, all data is safely written to WAL files and we
restart from correct place.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support


