Thread: Procedure after failover

From: Paul Jungwirth

Hi All,

I have Postgres 9.3 on Ubuntu 14.04 set up in a master/slave
configuration with streaming replication. On the master I ran `sudo
service postgresql stop` and then on the slave I ran `sudo touch
$trigger_file`. Now the slave seems to be running fine, but I'm trying
to figure out the process for getting things back to normal. I think
it is roughly like this, but I'd love for someone to confirm:

1. Change each server's {postgresql,recovery}.conf so the (old) slave
will replicate back to the (old) master. Restart the (old) slave, then
start the (old) master.
2. Once the (old) master has caught up, run `sudo service postgresql
stop` on the (old) slave, then `sudo touch $trigger_file` on the (old)
master. Now the (old) master is a master again.
3. Change each server's {postgresql,recovery}.conf files to their
original settings. Restart the master, then start the slave.

Will this work?
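
One way to judge "caught up" in step 2 is to compare WAL positions on the two servers. The following is only a sketch: the helper names `lsn_to_bytes` and `replay_lag_bytes` are made up for illustration, and the host names in the usage comment are placeholders. It assumes the 9.3 functions `pg_current_xlog_location()` (on the server currently acting as master) and `pg_last_xlog_replay_location()` (on the server currently replicating).

```shell
# Sketch only: these helpers are hypothetical, not part of PostgreSQL.

# Convert a WAL location such as "0/3000060" into a byte offset.
# The part before the slash is the high 32 bits, the part after it
# is the low 32 bits; both are hexadecimal.
lsn_to_bytes() (
  hi=${1%%/*}
  lo=${1##*/}
  echo $(( 0x$hi * 4294967296 + 0x$lo ))
)

# Print how many bytes of WAL the standby still has to replay.
# $1 = current WAL location on the primary, $2 = replay location on the standby
replay_lag_bytes() (
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
)

# Usage (host names are placeholders):
#   primary=$(psql -h CURRENT_MASTER -Atc "SELECT pg_current_xlog_location()")
#   standby=$(psql -h CURRENT_STANDBY -Atc "SELECT pg_last_xlog_replay_location()")
#   replay_lag_bytes "$primary" "$standby"
```

A result of 0 while no writes are arriving on the current master would suggest the standby has replayed everything. The server-side function `pg_xlog_location_diff()` can do the same subtraction if both values are available in one session.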

What if there were changes on the master that didn't get replicated
before I originally shut it down? (Or does using init.d delay shutdown
until all WAL updates have made it out?)

Is there a better way to do it? Do I need to wipe the (old) master and
use pg_dump/pg_restore before I bring it back up?

If it helps, here is my postgresql.conf on the master:

archive_mode = on
archive_command = 'rsync -aq -e "ssh -o StrictHostKeyChecking=no" %p 10.0.21.10:/secure/pgsql/archive/%f'
archive_timeout = 3600

Here is postgresql.conf on the slave:

hot_standby = on

and recovery.conf on the slave:

standby_mode = 'on'
primary_conninfo = 'XXXXXXX'
trigger_file = '/secure/pgsql/main/trigger'
restore_command = 'cp /secure/pgsql/archive/%f %p'
archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup /secure/pgsql/archive/ %r'
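
For step 1 above, the (old) master would need a recovery.conf of its own. The following is a rough sketch of what that file might contain, not a tested procedure. `write_standby_conf` is a hypothetical helper, the connection string is passed in rather than guessed, and the trigger path is assumed to be the same on both machines. The one addition compared to the slave's file is `recovery_target_timeline = 'latest'`, which lets a standby follow the new timeline created by the promotion.

```shell
# Sketch only: write_standby_conf is a hypothetical helper.
# $1 = path of the recovery.conf to create
# $2 = primary_conninfo string pointing at the promoted server
write_standby_conf() {
  cat > "$1" <<EOF
standby_mode = 'on'
primary_conninfo = '$2'
# Follow the timeline the promoted server switched to.
recovery_target_timeline = 'latest'
# Assumption: same trigger path as on the original slave.
trigger_file = '/secure/pgsql/main/trigger'
EOF
}

# Usage (both arguments are placeholders):
#   write_standby_conf /path/to/data/recovery.conf 'host=... user=...'
```

No restore_command is included here because the archive directory lives on the other machine; without one, the (old) master depends entirely on streaming from the promoted server.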

Thanks,
Paul

--
_________________________________
Pulchritudo splendor veritatis.


Re: Procedure after failover

From: Paul Jungwirth

A bit more info:

> What if there were changes on the master that didn't get replicated
> before I originally shut it down?

It looks like Ubuntu's init.d script does a "fast" shutdown, i.e. the
SIGINT mode described on this page:

http://www.postgresql.org/docs/9.3/static/server-shutdown.html

I can't tell from the docs what happens to WAL archiving, though. Is
that what the page means by "online backup mode"? My suspicion is that
because I shut down the master "fast", I'm going to have to wipe it
and then pg_restore it from the slave, because it might have data that
never made it out to the slave. Is that correct?

Thanks,
Paul

--
_________________________________
Pulchritudo splendor veritatis.