Thread: Streaming replication failover
I'm in the process of setting up a 9.1-based SR cluster, and I've got a question on how failover is expected to work in the case of multiple slaves. http://www.postgresql.org/docs/9.1/static/warm-standby-failover.html says:
"Some people choose to use a third server to provide backup for the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes."
I think a third server sounds like a swell idea, but I'm unclear how a slave based on the old master can act as a slave for the new master. I thought that once the master switched, you'd get a new wal timeline, and any existing slaves won't follow along with the new timeline. Which is really unfortunate, because it means the only way to have a slave, immediately after a failover, is to build one ASAP and hope you don't have insurmountable problems while that's going on. For a big database, that can take a while.
Hopefully I'm missing something?
> I'm in the process of setting up a 9.1-based SR cluster, and I've got a question on how failover is expected to work in the case of multiple slaves. http://www.postgresql.org/docs/9.1/static/warm-standby-failover.html says:
>
> "Some people choose to use a third server to provide backup for the new primary until the new standby server is recreated, though clearly this complicates the system configuration and operational processes."
>
> I think a third server sounds like a swell idea, but I'm unclear how a slave based on the old master can act as a slave for the new master. I thought that once the master switched, you'd get a new wal timeline, and any existing slaves won't follow along with the new timeline. Which is really unfortunate, because it means the only way to have a slave, immediately after a failover, is to build one ASAP and hope you don't have insurmountable problems while that's going on. For a big database, that can take a while.
>
> Hopefully I'm missing something?

I think you can set recovery_target_timeline to 'latest' in the standby server's recovery.conf to solve part of your problem. The standby will then follow the new primary without restoring the whole database, as long as all of the logs were transferred before the old primary went down. That means this technique can only be used when the primary goes down as part of a scheduled shutdown, though.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
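
For illustration, the recovery.conf on the remaining standby might look roughly like the sketch below. Only recovery_target_timeline = 'latest' comes from the suggestion above; the host name, user, and archive path are hypothetical placeholders. The restore_command line is an assumption on my part: as far as I know, a 9.1 standby learns about the new timeline from the timeline history file in a WAL archive that the new primary writes to, so it would need access to that archive.

```
# recovery.conf on the standby that should follow the new primary
# (sketch only; host, user and paths below are hypothetical)

standby_mode = 'on'

# Point the standby at the promoted server (the new primary).
primary_conninfo = 'host=new-primary.example.com port=5432 user=replicator'

# Follow the newest timeline found, rather than staying on the
# timeline the standby was originally started on.
recovery_target_timeline = 'latest'

# Assumption: a shared WAL archive the new primary also archives to,
# so the standby can fetch the new timeline's history file.
restore_command = 'cp /mnt/wal_archive/%f "%p"'
```

The standby has to be restarted after recovery.conf is changed, and, as noted above, this is only expected to work if the standby had already received all WAL from the old primary before it went down.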