Streaming replication based failover

Let us suppose that there is a hot standby replication set up in a cluster. The db1 server is a master and dbX are replicas. Also suppose that we need to do a failover on one of our slaves.

First we need to define what replica will be a new master.

In case of the master failure you need to find the most caught up replica. To do this compare WAL replay locations on replicas and chose the one with the biggest value.

postgres=# SELECT pg_last_xlog_replay_location();
 pg_last_xlog_replay_location 
------------------------------
 BAA/37DA2888
(1 row)

Note that If you chose not the most caught up one than other replicas that have replayed later WAL entries must be reconfigured from scratch. Otherwise their data can be corrupted and you will not get any warnings about it.

In case of the planned switchover choose one that will be a new master on your own.

Then stop all the slaves except the new master. Let say the remaining slave is db2.

Use the command below to help the master and the remaining slave get ahead of other slaves if you are not sure that they already are. The command creates a minimal WAL entry.

postgres=# SELECT txid_current();

Now touch the failover file on the remaining slave to promote it as a new master.

postgres@db2: ~ $ touch /path/to/failover

And you will see similar things in logs.

2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-5]:LOG:  trigger file found: /db/data/failover
2011-04-05 11:46:48 MSD @ 69974 [4d9ac05d.11156-2]:FATAL:  terminating walreceiver process due to administrator command
2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-6]:LOG:  redo done at 0/4012A68
2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-7]:LOG:  last completed transaction was at log time 2011-04-05 11:41:29.199406+04
2011-04-05 11:46:48 MSD @ 69971 [4d9ac05d.11153-8]:LOG:  selected new timeline ID: 2
2011-04-05 11:46:49 MSD @ 69971 [4d9ac05d.11153-9]:LOG:  archive recovery complete
2011-04-05 11:46:49 MSD @ 69972 [4d9ac05d.11154-1]:LOG:  checkpoint starting: end-of-recovery immediate wait
2011-04-05 11:46:49 MSD @ 69972 [4d9ac05d.11154-2]:LOG:  checkpoint complete: wrote 22 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=0.010 s, sync=0.044 s, total=0.060 s
2011-04-05 11:46:49 MSD @ 69969 [4d9ac05c.11151-2]:LOG:  database system is ready to accept connections

The failover file will be removed automatically and recovery.conf will be renamed to recovery.done.

On the stopped slaves delete everything in the pg_xlog directory.

postgres@dbX: ~ $ rm -rf /db/data/pg_xlog/*

And copy the history file from the new master.

postgres@dbX: ~ $ scp db2:/db/data/pg_xlog/*.history /db/data/pg_xlog/

Then change the DSN to the new master and add the following instruction in recovery.conf. This will make replicas to follow the latest created timeline.

recovery_target_timeline = 'latest'

Now start the postgres service and wait until it is synced up to the new master.

dbX: ~ # /etc/init.d/postgresql start