Hi.
I'm trying to work out failover and disaster recovery procedures for a
cluster of three servers. Streaming replication is in use with a high
wal_keep_segments, and there is no log shipping. I need to avoid the
several hours it takes to rebuild a hot standby from scratch.
ServerA is the master.
ServerB is a streaming hot standby and preferred failover server.
ServerC is a streaming hot standby.
For a planned failover, maintenance on ServerA:
1. Shut down ServerB & ServerC
2. Shut down ServerA
3. Copy pg_xlog from ServerA to ServerB and ServerC
4. Reconfigure ServerB as master, start it up.
5. Reconfigure ServerC as streaming hot standby of ServerB. Start it.
6. After maintenance, reconfigure ServerA as streaming hot standby
of ServerB. Start it.
For an unplanned failover, ServerA has exploded:
1. Run 'SELECT pg_last_xlog_receive_location()' on ServerB and ServerC,
determining which is most up to date.
2. Shut down ServerB and ServerC
3. If ServerC is more up to date, copy pg_xlog from ServerC to ServerB.
4. Reconfigure ServerB as master, start it up.
5. Reconfigure ServerC as streaming hot standby of ServerB, start it up.
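For step 1 of the unplanned procedure, the comparison could be sketched
roughly as below. This is only an illustration: it assumes the locations
come back as text of the form 'hi/lo' in hexadecimal (e.g. '16/B374D848'),
and the helper names and sample values are hypothetical. The point is that
the two halves need to be compared as numbers, since a plain string
comparison would order '10/0' before '9/0'.

```python
def parse_xlog_location(text):
    """Parse a location such as '16/B374D848' into a comparable (hi, lo) tuple."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def most_up_to_date(locations):
    """Return the server name whose receive location is furthest ahead.

    `locations` maps a server name to the text returned by
    SELECT pg_last_xlog_receive_location() on that server.
    On a tie, the first server in the mapping is returned.
    """
    return max(locations, key=lambda name: parse_xlog_location(locations[name]))


# Hypothetical values, pasted by hand from psql on each standby.
locations = {"ServerB": "2/FF00A1C8", "ServerC": "3/0000B2F0"}
print(most_up_to_date(locations))
```

With these sample values ServerC is ahead, so step 3 would apply.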
Does this look correct to people?
Am I going to end up in trouble copying files into pg_xlog like this on a
busy system?
Is it overengineered? E.g. will a master ensure everything is streamed to
connected hot standbys before a graceful shutdown?
--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/