Streaming Rep - 2-phase backups and reducing time to full replication - Mailing list pgsql-hackers

From Simon Riggs
Subject Streaming Rep - 2-phase backups and reducing time to full replication
Date
Msg-id 1261508080.7442.7129.camel@ebony
List pgsql-hackers
Some ideas to improve current behaviour of SR

http://wiki.postgresql.org/wiki/Streaming_Replication

The current startup procedure is copied below. Step (7) causes problems
if it takes a very long time, notably that the master may fill up with
data and then break off the connection before replication is fully
configured.

7. Make a base backup of the primary server, load this data onto the
standby.

8. Set up XLOG archiving, connections and authentication in the standby
server like the primary, so that the standby might work as a primary
after failover.

9. Create a recovery command file in the standby server; the following
parameters are required for streaming replication.

10. Start postgres in the standby server. It will start streaming
replication.


It occurs to me to ask: which files do we need, as a minimum, before we
can begin step (10)? If we could start step (10) before step (7) is
complete, we would avoid many of our space problems, and replication
would enter "safe" mode considerably sooner, in some cases dozens of
hours earlier.

We read recovery.conf at the start of StartupXLOG(). So by that point we
need only the following files:
* All *.conf files
* pg_control (and probably much of the rest of global/ directory)
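
As a rough illustration of the split implied by the list above, the
following sketch sorts data-directory-relative paths into a first-phase
set and a remainder. The function names are hypothetical, and treating
the whole of global/ as first-phase is an assumption based on the
"probably much of the rest" remark, not a verified minimum.

```python
from pathlib import PurePosixPath


def is_phase_one(rel_path: str) -> bool:
    """Return True if a data-directory-relative path belongs in phase one.

    Assumption: phase one is every *.conf file plus everything under
    global/ (which contains pg_control).
    """
    p = PurePosixPath(rel_path)
    if p.suffix == ".conf":
        return True
    return len(p.parts) > 1 and p.parts[0] == "global"


def split_backup_phases(rel_paths):
    """Partition paths into (phase_one, phase_two), preserving input order."""
    phase_one = [p for p in rel_paths if is_phase_one(p)]
    phase_two = [p for p in rel_paths if not is_phase_one(p)]
    return phase_one, phase_two
```

The first list would be copied before the standby is started, and the
second list afterwards.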

Some very quick surgery on a current-version data directory shows this
is correct, apart from the call to RelationCacheInitFileRemove() which
can be altered to accept a missing directory as proof that the file has
been removed.
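
The change to RelationCacheInitFileRemove() would be made in C inside
the server; the sketch below only illustrates the logic in Python, with
a hypothetical helper name. It treats a missing file, or a missing
parent directory, as "already removed" instead of as an error.

```python
import os


def remove_init_file(path: str) -> bool:
    """Remove path; return True if a file was deleted, False if it was absent.

    A missing parent directory is reported the same way as a missing
    file: either way, the file is known not to exist.
    """
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        # Raised both when the file itself is missing and when one of
        # its parent directories does not exist yet.
        return False
```

Other failures, such as permission errors, still propagate, so only the
"nothing there yet" case is relaxed.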

We can then think of the starting procedure as happening in two parts:
i) sufficient startup to get to the point where we bring up the
walreceiver, while the startup process waits for further confirmation
ii) following further confirmation, the startup process begins
recovering the database

So if we do the base backup in two stages, the sequence of actions could
become:

9. Create a recovery command file in the standby server with parameters
required for streaming replication.

7. (a) Make a base backup of minimal essential files from primary
server, load this data onto the standby.

10. Start postgres in the standby server. It will start streaming
replication.

7. (b) Continue with second phase of base backup, copying all remaining
files, ending with pg_stop_backup()

* The next step is to wake the startup process so it can continue
recovery.
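
The reordered sequence above can be sketched as a small driver. This is
only an illustration under stated assumptions: the function name, the
start_standby callback and the "backup_complete" trigger file name are
all hypothetical, the phase-one test is supplied by the caller, and the
pg_start_backup()/pg_stop_backup() calls on the primary are indicated
by comments only.

```python
import os
import shutil


def two_phase_backup(primary_dir, standby_dir, is_phase_one, start_standby,
                     trigger_name="backup_complete"):
    """Copy primary_dir to standby_dir in two phases.

    is_phase_one(rel_path) decides which files go first; start_standby()
    is invoked between the two phases (step 10).
    """
    files = []
    for root, _dirs, names in os.walk(primary_dir):
        for name in names:
            rel = os.path.relpath(os.path.join(root, name), primary_dir)
            files.append(rel.replace(os.sep, "/"))
    files.sort()

    def copy(rel):
        dst = os.path.join(standby_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(primary_dir, rel), dst)

    # pg_start_backup() would be called on the primary before this point.
    # Step 7(a): minimal essential files.
    for rel in files:
        if is_phase_one(rel):
            copy(rel)

    # Step 10: start the standby so the walreceiver can begin streaming.
    start_standby()

    # Step 7(b): all remaining files.
    for rel in files:
        if not is_phase_one(rel):
            copy(rel)

    # pg_stop_backup() would be called on the primary here.
    # Final step: signal the waiting startup process.
    open(os.path.join(standby_dir, trigger_name), "w").close()
```

Step 9, creating recovery.conf on the standby, is left to the caller
because it happens before any files are copied.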


We don't need to introduce another call; we just need a mechanism for
telling the startup process to sleep because we are doing a two-phase
backup, and another mechanism for waking it when the whole backup is
complete. That sounds like a recovery.conf parameter and an additional
kind of trigger file, perhaps the backup file?
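
One way the sleep/wake mechanism might behave is a simple poll for the
trigger file, sketched below. The function name, polling interval and
timeout are assumptions for illustration; the real startup process is C
code and no such parameter exists yet.

```python
import os
import time


def wait_for_backup_complete(trigger_path, poll_interval=1.0, timeout=None):
    """Block until trigger_path exists.

    Returns True once the file appears, or False if timeout (in seconds)
    elapses first. With timeout=None the wait is unbounded.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(trigger_path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

The startup process would enter this wait only when the recovery.conf
parameter requests a two-phase backup, and would proceed to recovery as
soon as the file appears.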

This seems like a simple and useful option for 8.5.

-- Simon Riggs           www.2ndQuadrant.com


