Re: Cascading replication and recovery_target_timeline='latest' - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Cascading replication and recovery_target_timeline='latest'
Msg-id 50469E33.902@iki.fi
In response to Re: Cascading replication and recovery_target_timeline='latest'  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Cascading replication and recovery_target_timeline='latest'  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On 04.09.2012 16:50, Tom Lane wrote:
> Josh Berkus<josh@agliodbs.com>  writes:
>> Heikki,
>>> It is for 9.2. I'll do a little bit more testing, and barring any
>>> issues, commit the patch. What exactly is the schedule? Do we need to do
>>> a RC2 because of this?
>
>> We're currently scheduled to release next week.  If we need to do an
>> RC2, we're going to have to do some fast rescheduling; we've already
>> started the publicity machine.
>
> At this point I would argue that the only thing that should abort the
> launch is a bad regression.  Minor bugs in new features (and this must
> be minor if it wasn't noticed before) don't qualify.
>
> Having said that, it'd be good to get it fixed if we can.  The schedule
> says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
> tomorrow (Wednesday)?

The attached patch fixes it for me. It fixes the original problem by
adding the missing locking and terminating walsenders on a target
timeline change, and it also changes the behavior wrt. WAL segments
restored from the archive, as I just suggested in another email
(http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

The test case I've been using is a master and two standbys. The first
standby is set up to connect to the master with streaming replication,
and the second standby is set up to connect to the first standby, i.e.
it's a cascading slave. In addition, the master is set up to do WAL
archiving to a directory, and both standbys have a restore_command to
read from that archive, and recovery_target_timeline='latest'. After the
master and both standbys are running, I create a dummy recovery.conf
file in the master's data directory, with just
"restore_command='/bin/false'" in it, and restart the master. That
forces a timeline change in the master. With the patch, the first
standby will notice the new timeline in the archive, switch to that, and
reconnect to the master. The cascading connection to the second standby
is terminated because of the timeline change; the second standby will
then also scan the archive and pick up the new timeline, reconnect to
the first standby, and be in sync again.
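
For anyone who wants to reproduce the setup, here is a rough sketch of
the recovery.conf files involved, written as a small Python script that
just prints them. The parameter names (standby_mode, primary_conninfo,
restore_command, recovery_target_timeline) are the regular 9.2
recovery.conf settings; the host names, port and archive path are
placeholders I made up, and the cp-based restore_command is only one
possible choice, so adjust all of them to your environment.

```python
# Sketch of the recovery.conf files for the test topology described above.
# Host names, port and archive path are hypothetical placeholders.

ARCHIVE_DIR = "/path/to/archive"  # assumed WAL archive directory


def standby_recovery_conf(upstream_host, upstream_port=5432):
    """Return recovery.conf text for a standby streaming from the given upstream."""
    settings = [
        ("standby_mode", "on"),
        ("primary_conninfo", f"host={upstream_host} port={upstream_port}"),
        # Both standbys can also read WAL from the shared archive.
        ("restore_command", f"cp {ARCHIVE_DIR}/%f %p"),
        # Follow the newest timeline found in the archive.
        ("recovery_target_timeline", "latest"),
    ]
    return "".join(f"{key} = '{value}'\n" for key, value in settings)


def dummy_master_recovery_conf():
    """Return the dummy file used to force a timeline change on the master."""
    return "restore_command = '/bin/false'\n"


# First standby streams from the master, second standby from the first.
standby1 = standby_recovery_conf("master.example")
standby2 = standby_recovery_conf("standby1.example")

if __name__ == "__main__":
    for name, text in [
        ("standby 1", standby1),
        ("standby 2", standby2),
        ("master (dummy, then restart)", dummy_master_recovery_conf()),
    ]:
        print(f"# recovery.conf for {name}")
        print(text)
```

The only difference between the two standbys is the upstream host in
primary_conninfo; everything else, including the archive they restore
from, is shared.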

- Heikki
