Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc - Mailing list pgsql-general

From Heikki Linnakangas
Subject Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc
Msg-id 50CF0990.8020506@iki.fi
In response to Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc  (hubert depesz lubaczewski <depesz@depesz.com>)
List pgsql-general
On 15.12.2012 17:06, hubert depesz lubaczewski wrote:
> I might be missing something, but what exactly does that commit give us?
>
> I mean - we were able, previously, to make slave switch to new master
> - as Phil Sorber described in here:
> http://philsorber.blogspot.com/2012/03/what-to-do-when-your-timeline-isnt.html
>
> After some talk on IRC, I understood that this patch will make it
> possible to switch to new master in plain SR replication, with no WAL
> archive (because if you have wal archive, you can use the method Phil
> described, which basically "just works").

Right, that's exactly the point of the patch. A WAL archive is no longer
necessary for failover.

> So I did setup three machines: master and two slaves.
> Master had 2 IPs - its own, and a floating one.
> Both slaves were connecting to the floating one, and recovery.conf
> looked like:
> ---------
> standby_mode = 'on'
> primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
> trigger_file = '/tmp/finish.replication'
> recovery_target_timeline='latest'
> ---------
>
> After I verified that replication works to both slaves, I did failover one of
> the slaves, shut down master, and did ip takeover of floating ip to the slave
> that did takeover.

Hmm, is it possible that some WAL was generated in the old master, and
streamed to the standby, after the new master was already promoted? It's
important to kill the old master before promoting the new master.
Otherwise the timelines diverge, so that you have some WAL on the old
timeline that's not present in the new master, and some WAL in the new
master's timeline that's not present in the old master. In that
situation, if the standby has already replicated the WAL from the old
master, it can no longer start to follow the new master. I think that
would match the symptoms you're seeing.
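
To make the condition above concrete, here is a minimal sketch in Python. It
is a simplified model, not the server's actual logic: the function names and
the example WAL positions are made up for illustration, and it assumes both
positions are on the old timeline and written in the usual hex "hi/lo" form.
The standby can follow the new master only if it has not replayed old-timeline
WAL past the point where the new timeline branched off.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL position written as 'hi/lo' in hex to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_follow_new_master(standby_replayed: str, switch_point: str) -> bool:
    """Return True if the standby can still follow the new timeline.

    standby_replayed: last old-timeline position the standby has replayed
                      (hypothetical input for this sketch).
    switch_point:     position on the old timeline where the promoted
                      standby started the new timeline.
    """
    # WAL past the switch point exists only on the old master's timeline,
    # so a standby that has replayed it has diverged from the new master.
    return parse_lsn(standby_replayed) <= parse_lsn(switch_point)
```

For example, with a made-up switch point of 0/3000060, a standby that stopped
at 0/3000000 could follow the new master, while one that had already replayed
up to 0/3000100 from the old master could not.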

I wouldn't rule out a bug in the patch either, though. Amit found a
worrying number of bugs in his testing, and although we stamped out all
the known bugs, it wouldn't surprise me if there are more :-(

- Heikki

