Re: Cascading replication and recovery_target_timeline='latest' - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Cascading replication and recovery_target_timeline='latest'
Date	September 5, 2012 03:35:07
Msg-id	50469E33.902@iki.fi
In response to	Re: Cascading replication and recovery_target_timeline='latest' (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Cascading replication and recovery_target_timeline='latest' (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

On 04.09.2012 16:50, Tom Lane wrote:
> Josh Berkus<josh@agliodbs.com>  writes:
>> Heikki,
>>> It is for 9.2. I'll do a little bit more testing, and barring any
>>> issues, commit the patch. What exactly is the schedule? Do we need to do
>>> a RC2 because of this?
>
>> We're currently scheduled to release next week.  If we need to do an
>> RC2, we're going to have to do some fast rescheduling; we've already
>> started the publicity machine.
>
> At this point I would argue that the only thing that should abort the
> launch is a bad regression.  Minor bugs in new features (and this must
> be minor if it wasn't noticed before) don't qualify.
>
> Having said that, it'd be good to get it fixed if we can.  The schedule
> says to wrap 9.2.0 Thursday evening --- Heikki, can you get this fixed
> tomorrow (Wednesday)?

The attached patch fixes it for me. It fixes the original problem, by
adding the missing locking and terminating walsenders on a target
timeline change, and also changes the behavior wrt. WAL segments
restored from the archive, as I just suggested in another email
(http://archives.postgresql.org/pgsql-hackers/2012-09/msg00206.php).

The test case I've been using is a master and two standbys. The first
standby is set up to connect to the master with streaming replication,
and the other standby is set up to connect to the 1st standby, i.e. it's
a cascading slave. In addition, the master is set up to do WAL archiving
to a directory, and both standbys have a restore_command to read from
that archive, and recovery_target_timeline='latest'. After the master and
both standbys are running, I create a dummy recovery.conf file in
master's data directory, with just "restore_command='/bin/false'" in it,
and restart the master. That forces a timeline change in the master.
With the patch, the 1st standby will notice the new timeline in the
archive, switch to that, and reconnect to the master. The cascading
connection to the 2nd standby is terminated because of the timeline
change. The 2nd standby will then also scan the archive, pick up the new
timeline, reconnect to the 1st standby, and be in sync again.
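
For anyone who wants to reproduce the setup above, here is a rough sketch
of the recovery.conf contents it implies, written as a small Python
script that renders the files. The parameter names (standby_mode,
primary_conninfo, restore_command, recovery_target_timeline) are the
9.2 recovery.conf settings; the archive directory, host, and port
numbers are assumptions made up for illustration, not the values used in
the actual test.

```python
# Sketch only: ARCHIVE_DIR, the host and the ports are hypothetical values.
ARCHIVE_DIR = "/tmp/walarchive"  # assumed directory the master archives WAL to


def standby_recovery_conf(upstream_port: int) -> str:
    """Render recovery.conf for a standby streaming from upstream_port."""
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = 'host=localhost port={upstream_port}'",
        # Both standbys read WAL segments and timeline history files from
        # the same archive.
        f"restore_command = 'cp {ARCHIVE_DIR}/%f \"%p\"'",
        # Follow whatever the newest timeline found in the archive is.
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


def dummy_master_recovery_conf() -> str:
    """Dummy recovery.conf that forces a timeline change on master restart."""
    return "restore_command = '/bin/false'\n"


if __name__ == "__main__":
    print("# standby 1: streams from the master (assumed port 5432)")
    print(standby_recovery_conf(5432))
    print("# standby 2: cascades from standby 1 (assumed port 5433)")
    print(standby_recovery_conf(5433))
    print("# master: written just before the restart")
    print(dummy_master_recovery_conf())
```

The two standbys differ only in which upstream they point
primary_conninfo at. The master additionally needs WAL archiving into
the same directory configured in its postgresql.conf, which is not shown
here.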

- Heikki

Attachment

disconnect-walsenders-on-target-tli-change-2.patch
