Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Synchronous Log Shipping Replication
Msg-id: 1221037034.3913.621.camel@ebony.2ndQuadrant
In response to: Re: Synchronous Log Shipping Replication (Markus Wanner <markus@bluegap.ch>)
List: pgsql-hackers
On Wed, 2008-09-10 at 10:06 +0200, Markus Wanner wrote:
> Hi,
>
> Simon Riggs wrote:
> > 1. Standby contacts primary and says it would like to catch up, but is
> > currently at point X (which is a point at, or after the first consistent
> > stopping point in WAL after standby has performed its own crash
> > recovery, if any was required).
> > 2. primary initiates data transfer of old data to standby, starting at
> > point X
> > 3. standby tells primary where it has got to periodically
> > 4. at some point primary decides primary and standby are close enough
> > that it can now begin streaming "current WAL" (which is always the WAL
> > up to wal_buffers behind the current WAL insertion point).
>
> Hm.. wouldn't it be simpler, to start streaming right away and "cache"

The standby server won't come up until you have:

* copied the base backup
* sent it to the standby server
* brought up the standby, had it realise it is a replication partner and
  begin requesting WAL from the primary (in some way)

There will (probably) be a gap between the initial WAL files and the
current tail of wal_buffers by the time all of the above has happened. We
will then need to copy more WAL across until the most recent WAL record
available on the standby is ahead of the tail of wal_buffers on the
primary, so that streaming can start.

If we start caching WAL right away we would need two receivers: one to
receive the missing WAL data and one to receive the current WAL data. We
can't apply the WAL until we have the earlier missing WAL data, so caching
it seems difficult. On a large server this might be GBs of data.

It seems easier not to cache current WAL, and to have just a single
WALReceiver process that performs a mode change once it has caught up.
(And I should say "if it catches up", since it is possible that it never
actually will catch up, in practical terms, since this depends upon the
relative power of the servers involved.)
So there's no need to store more WAL on the standby than is required to
restart recovery from the last restartpoint, i.e. we stream WAL at all
times, not just in normal running mode.

Seems easiest to have:

* Startup process only reads the next WAL record when
  ReceivedLogPtr > ReadRecPtr, so it knows nothing of how WAL is
  received. The Startup process reads directly from WAL files in *all*
  cases. ReceivedLogPtr is in shared memory and accessed via spinlock;
  the Startup process only ever reads this pointer. (Notice that the
  Startup process is modeless.)

* WALReceiver reads data from the primary and writes it to WAL files,
  fsyncing (if ever requested to do so). WALReceiver updates
  ReceivedLogPtr.

That is much simpler and more modular. Buffering of the WAL files is
handled by filesystem buffering. If the standby crashes, all data is
safely written to WAL files and we restart from the correct place.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support