On Sat, Jul 28, 2012 at 2:00 AM, Ben Chobot <bench@silentmedia.com> wrote:
> We make heavy use of streaming replication on PG 9.1 and it's been great for
> us. We do have one issue with it, though, and that's when we switch master
> nodes - currently, the documentation says that you must run pg_basebackup on
> your old master to turn it into a slave. That makes sense when the old
> master had crashed, but it seems that in the case of a planned switch, we
> could do better. Here's what we tried that seemed to work... are we shooting
> ourselves in the foot?
>
> 1. Cleanly shut down the current master.
> 2. Pick a slave, turn it into the new master.
Before promoting the standby, you have to confirm that all WAL files the
old master generated have been shipped to the standby that you'll promote,
because the standby might terminate replication before receiving all of
them. Note that there is no clean way to confirm that. One way is to
execute CHECKPOINT on the standby, run pg_controldata on both the old
master and the standby, and check whether their latest checkpoint
locations are the same. You might think of comparing the latest checkpoint
location on the old master with pg_last_xlog_replay_location on the
standby. But the former indicates the *starting* location of the last WAL
record (i.e., the shutdown checkpoint WAL record), while the latter
indicates the *ending* location of it. So you should not compare them
without taking that mismatch into consideration.

If the standby failed to receive some WAL files, you need to manually copy
them from pg_xlog on the old master to the standby.
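
The pg_controldata comparison could be scripted roughly like this. It is
only a sketch: the function names are made up for illustration, and it
assumes you have already captured the pg_controldata output of both nodes
as text, with untranslated (LC_MESSAGES=C) labels. It compares the
locations as (log id, offset) pairs and does not try to do any byte
arithmetic on them.

```python
import re

# Matches e.g. "Latest checkpoint location:           0/3000020".
# It does not match the separate "Latest checkpoint's REDO location" line.
_CHECKPOINT_RE = re.compile(
    r"^Latest checkpoint location:[ \t]*([0-9A-Fa-f]+)/([0-9A-Fa-f]+)[ \t]*$",
    re.MULTILINE,
)


def latest_checkpoint_location(controldata_output):
    """Return the latest checkpoint location as a (log id, offset) pair."""
    match = _CHECKPOINT_RE.search(controldata_output)
    if match is None:
        raise ValueError("no 'Latest checkpoint location' line found")
    return int(match.group(1), 16), int(match.group(2), 16)


def standby_caught_up(master_output, standby_output):
    """True if both nodes report the same latest checkpoint location."""
    return (latest_checkpoint_location(master_output)
            == latest_checkpoint_location(standby_output))
```

You would call standby_caught_up() with the two captured outputs after
running CHECKPOINT on the standby, and promote only if it returns True.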
> 3. Copy the new pg_xlog history file over to the old master.
> 4. On any other slaves (many of our clusters are 3 nodes), we already have
> "recovery_target_timeline=latest" and wal archiving, so they should already
> be working as slaves of the new master.
> 5. Set up recovery.conf on the old master to be like the other slaves.
> 6. Start up the old master.
>
> Have we just avoided running pg_basebackup, or have we just given ourselves
> data corruption?
If you change your procedure as described above, I think you can avoid
pg_basebackup for a planned switch. I have not tested the procedure
myself, so please test it carefully before applying it to your system.
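
For step 5, the recovery.conf on the old master might be generated along
these lines. This is a sketch under assumptions: render_recovery_conf is a
made-up helper, and the connection string and archive path in the usage
note are placeholders, not values from your setup. Only
recovery_target_timeline = 'latest' comes from your description; the other
three parameters are what a 9.1 standby using both streaming and a WAL
archive would normally set.

```python
def render_recovery_conf(primary_conninfo, restore_command):
    """Build recovery.conf text for a standby that follows the new master."""
    settings = [
        ("standby_mode", "on"),
        ("primary_conninfo", primary_conninfo),
        ("restore_command", restore_command),
        # Follow the newest timeline found in the WAL archive.
        ("recovery_target_timeline", "latest"),
    ]
    lines = []
    for name, value in settings:
        # A single quote inside a value is written as two single quotes.
        escaped = value.replace("'", "''")
        lines.append("{} = '{}'".format(name, escaped))
    return "\n".join(lines) + "\n"
```

For example, render_recovery_conf("host=new-master.example port=5432
user=replicator", "cp /mnt/archive/%f %p") returns the text to write to
recovery.conf in the old master's data directory before starting it.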
> Because we're using wal archiving, can we simplify and
> leave out step 3?
Yes.
Regards,
--
Fujii Masao