Thread: Streaming replication failover

Streaming replication failover

From: Ben Chobot
Date: 31 December 2011, 20:02:59

I'm in the process of setting up a 9.1-based SR cluster, and I've got a question on how failover is expected to work in the case of multiple slaves. http://www.postgresql.org/docs/9.1/static/warm-standby-failover.html says:

"Some people choose to use a third server to provide backup for the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes."

I think a third server sounds like a swell idea, but I'm unclear how a slave based on the old master can act as a slave for the new master. I thought that once the master switched, you'd get a new wal timeline, and any existing slaves won't follow along with the new timeline. Which is really unfortunate, because it means the only way to have a slave, immediately after a failover, is to build one ASAP and hope you don't have insurmountable problems while that's going on. For a big database, that can take a while.

Hopefully I'm missing something?

Re: Streaming replication failover

From: Tatsuo Ishii
Date: 31 December 2011, 22:38:49

> I'm in the process of setting up a 9.1-based SR cluster, and I've got a question on how failover is expected to work in the case of multiple slaves. http://www.postgresql.org/docs/9.1/static/warm-standby-failover.html says:
>
> "Some people choose to use a third server to provide backup for the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes."
>
> I think a third server sounds like a swell idea, but I'm unclear how a slave based on the old master can act as a slave for the new master. I thought that once the master switched, you'd get a new wal timeline, and any existing slaves won't follow along with the new timeline. Which is really unfortunate, because it means the only way to have a slave, immediately after a failover, is to build one ASAP and hope you don't have insurmountable problems while that's going on. For a big database, that can take a while.
>
> Hopefully I'm missing something?

I think you can set recovery_target_timeline to 'latest' in the
standby server's recovery.conf to solve part of your problem. The
standby will then follow the new primary without the whole database
having to be restored from the new primary, as long as all WAL had
been transferred before the old primary went down. That means this
technique can only be used when the primary is taken down on
schedule, though.
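
As a rough illustration of the setting described above, a recovery.conf on the remaining standby might look something like the sketch below. Only recovery_target_timeline = 'latest' comes from this thread; the host name, user name and archive path are hypothetical placeholders. The restore_command line is an assumption on my part: as far as I know, a 9.1 standby learns about a new timeline from the timeline history file in a WAL archive rather than over the streaming connection, so a shared archive is assumed here.

```
# recovery.conf on the surviving standby (PostgreSQL 9.1); all values are placeholders
standby_mode = 'on'

# Point at the promoted server; host and user are hypothetical
primary_conninfo = 'host=new-primary.example.com port=5432 user=replicator'

# Follow the newest timeline found instead of staying on the timeline
# that was current when this standby's base backup was taken
recovery_target_timeline = 'latest'

# Assumed shared WAL archive; the new timeline's history file
# (for example 00000002.history) would be fetched from here
restore_command = 'cp /mnt/wal_archive/%f %p'
```

The standby has to be restarted for changes to recovery.conf to take effect.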
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp