Restoring normal master-slave roles after replication failure - Mailing list pgsql-admin

From Alex Balashov
Subject Restoring normal master-slave roles after replication failure
Date
Msg-id 551C295D.6060602@evaristesys.com
Whole thread Raw
Responses Re: Restoring normal master-slave roles after replication failure
List pgsql-admin
Hello all,

I'm terribly sorry if this question has been asked before in a number of
different forms; I wasn't able to find quite the nuance I was looking
for in the archives.

We're running PostgreSQL 9.4 with streaming replication + hot_standby +
  the newfangled physical replication slots. On the master, the
wal_archive level is 'hot_standby', and on the slave, it's 'archive'[1].

As I have understood the conventional wisdom from scouring the
documentation and list archives, if the master fails and we
trigger-promote the slave to master and then, after fixing the master
server, we need to restore it to the master role and restore the slave
server back down to the hot_standby role, we need to reinitialise
replication. That is, we need to do a brand new base backup of the slave
back to the master.

My question is:

Isn't there a better way that lends itself to some degree of automation?
We have a ~500 GB database and while we can do a base backup, it's
neither pretty nor quick. Can we not configure $OLD_SLAVE as a master
(with WAL senders, replication role and all), put $OLD_MASTER in
recovery mode and have it play out the accumulated WALs in reverse from
$NEW_MASTER, then shut down $NEW_MASTER, reconfigure both replicas to
their appropriate roles, and bring them back up like before?

If this is possible, does it depend on setting wal_archive to
'hot_standby' level on the slave during the course of normal operation,
too? Shouldn't 'archive' be sufficient to restore from? What exactly
does the 'hot_standby' WAL level contain that 'archive' doesn't?

If I'm wrong, is there no other way to do this except to re-do a whole
base backup back to the master? How do people with very large databases
deal with this? Just grin and bear it?

Many thanks in advance, and I apologise again if this has previously
been answered in some canonical way.

-- Alex

[1] This is as it should be, right? We want to have backup WALs in case
we experience a total physical meltdown of the master and need to do
PITR, and/or to archive them to WAL-E from the slave if the slave were
promoted to master.

--
Alex Balashov | Principal | Evariste Systems LLC
303 Perimeter Center North, Suite 300
Atlanta, GA 30346
United States

Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/


pgsql-admin by date:

Previous
From: Denish Patel
Date:
Subject: Re: Permission select pg_stat_replication
Next
From: "Gilberto Castillo"
Date:
Subject: Re: Restoring normal master-slave roles after replication failure