Thread: Restoring normal master-slave roles after replication failure
Hello all, I'm terribly sorry if this question has been asked before in a number of different forms; I wasn't able to find quite the nuance I was looking for in the archives. We're running PostgreSQL 9.4 with streaming replication + hot_standby + the newfangled physical replication slots. On the master, the wal_archive level is 'hot_standby', and on the slave, it's 'archive'[1]. As I have understood the conventional wisdom from scouring the documentation and list archives, if the master fails and we trigger-promote the slave to master and then, after fixing the master server, we need to restore it to the master role and restore the slave server back down to the hot_standby role, we need to reinitialise replication. That is, we need to do a brand new base backup of the slave back to the master. My question is: Isn't there a better way that lends itself to some degree of automation? We have a ~500 GB database and while we can do a base backup, it's neither pretty nor quick. Can we not configure $OLD_SLAVE as a master (with WAL senders, replication role and all), put $OLD_MASTER in recovery mode and have it play out the accumulated WALs in reverse from $NEW_MASTER, then shut down $NEW_MASTER, reconfigure both replicas to their appropriate roles, and bring them back up like before? If this is possible, does it depend on setting wal_archive to 'hot_standby' level on the slave during the course of normal operation, too? Shouldn't 'archive' be sufficient to restore from? What exactly does the 'hot_standby' WAL level contain that 'archive' doesn't? If I'm wrong, is there no other way to do this except to re-do a whole base backup back to the master? How do people with very large databases deal with this? Just grin and bear it? Many thanks in advance, and I apologise again if this has previously been answered in some canonical way. -- Alex [1] This is as it should be, right? We want to have backup WALs in case we experience a total physical meltdown of the master and need to do PITR, and/or to archive them to WAL-E from the slave if the slave were promoted to master. -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
Hello Alex, Have you seen pgpool or rpmgr-3? > > Hello all, > > I'm terribly sorry if this question has been asked before in a number of > different forms; I wasn't able to find quite the nuance I was looking > for in the archives. > > We're running PostgreSQL 9.4 with streaming replication + hot_standby + > the newfangled physical replication slots. On the master, the > wal_archive level is 'hot_standby', and on the slave, it's 'archive'[1]. > > As I have understood the conventional wisdom from scouring the > documentation and list archives, if the master fails and we > trigger-promote the slave to master and then, after fixing the master > server, we need to restore it to the master role and restore the slave > server back down to the hot_standby role, we need to reinitialise > replication. That is, we need to do a brand new base backup of the slave > back to the master. > > My question is: > > Isn't there a better way that lends itself to some degree of automation? > We have a ~500 GB database and while we can do a base backup, it's > neither pretty nor quick. Can we not configure $OLD_SLAVE as a master > (with WAL senders, replication role and all), put $OLD_MASTER in > recovery mode and have it play out the accumulated WALs in reverse from > $NEW_MASTER, then shut down $NEW_MASTER, reconfigure both replicas to > their appropriate roles, and bring them back up like before? > > If this is possible, does it depend on setting wal_archive to > 'hot_standby' level on the slave during the course of normal operation, > too? Shouldn't 'archive' be sufficient to restore from? What exactly > does the 'hot_standby' WAL level contain that 'archive' doesn't? > > If I'm wrong, is there no other way to do this except to re-do a whole > base backup back to the master? How do people with very large databases > deal with this? Just grin and bear it? > > Many thanks in advance, and I apologise again if this has previously > been answered in some canonical way. > > -- Alex > > [1] This is as it should be, right? We want to have backup WALs in case > we experience a total physical meltdown of the master and need to do > PITR, and/or to archive them to WAL-E from the slave if the slave were > promoted to master. > > -- > Alex Balashov | Principal | Evariste Systems LLC > 303 Perimeter Center North, Suite 300 > Atlanta, GA 30346 > United States > > Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) > Web: http://www.evaristesys.com/, http://www.csrpswitch.com/ > > > -- > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin--- > This message was processed by Kaspersky Mail Gateway 5.6.28/RELEASE > running at host imx2.etecsa.cu > Visit our web-site: <http://www.kaspersky.com>, <http://www.viruslist.com> > Saludos, Gilberto Castillo ETECSA, La Habana, Cuba --- This message was processed by Kaspersky Mail Gateway 5.6.28/RELEASE running at host imx3.etecsa.cu Visit our web-site: <http://www.kaspersky.com>, <http://www.viruslist.com>
Hi Gilberto, On 04/01/2015 02:29 PM, Gilberto Castillo wrote: > Have you seen pgpool or rpmgr-3? I have, but prefer not to use these tools if possible as it would complicate the setup. I'm not so much looking for front-end simplicity as I am looking for a way to avoid having to do a base backup back to the master. I am fine with having to reconfigure master and slave roles manually. -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
> > Hi Gilberto, > > On 04/01/2015 02:29 PM, Gilberto Castillo wrote: > >> Have you seen pgpool or rpmgr-3? > > I have, but prefer not to use these tools if possible as it would > complicate the setup. > > I'm not so much looking for front-end simplicity as I am looking for a > way to avoid having to do a base backup back to the master. I am fine > with having to reconfigure master and slave roles manually. I commented, and I used pgpool ensures failover a transparent manner to end users. Now use rpmgr-3, which allows me the same; but without the complicated motaje of pgpool. But if you want to do manuel, you define archive_command = "hot_stanby" on both nodes Saludos, Gilberto Castillo ETECSA, La Habana, Cuba --- This message was processed by Kaspersky Mail Gateway 5.6.28/RELEASE running at host imx3.etecsa.cu Visit our web-site: <http://www.kaspersky.com>, <http://www.viruslist.com>
On 04/01/2015 02:40 PM, Gilberto Castillo wrote: > But if you want to do manuel, you define archive_command = "hot_stanby" on > both nodes Are you implying that my hypothesis is correct, i.e. that we can play the WALs on the slave out in reverse onto the slave? -- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
> > On 04/01/2015 02:40 PM, Gilberto Castillo wrote: > >> But if you want to do manuel, you define archive_command = "hot_stanby" >> on >> both nodes > > Are you implying that my hypothesis is correct, i.e. that we can play > the WALs on the slave out in reverse onto the slave? I say who pg_recovery defines whether the node is a slave or not Saludos, Gilberto Castillo ETECSA, La Habana, Cuba --- This message was processed by Kaspersky Mail Gateway 5.6.28/RELEASE running at host imx3.etecsa.cu Visit our web-site: <http://www.kaspersky.com>, <http://www.viruslist.com>