Re: Cascading replication and recovery_target_timeline='latest' - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Cascading replication and recovery_target_timeline='latest'
Date	September 5, 2012 03:14:19
Msg-id	50469953.1070603@iki.fi
In response to	Re: Cascading replication and recovery_target_timeline='latest' (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

On 03.09.2012 17:40, Heikki Linnakangas wrote:
> On 03.09.2012 16:26, Heikki Linnakangas wrote:
>> On 03.09.2012 16:25, Fujii Masao wrote:
>>> On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>>> Hmm, I was thinking that when walsender gets the position it can send the
>>>> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
>>>> recovery timeline. If it has changed, refuse to send the new WAL and
>>>> terminate. That would be a fairly small change, it would just close the
>>>> window between requesting walsenders to terminate and them actually
>>>> terminating.
>>>
>>> Yeah, sounds good. Could you implement the patch? If you don't have time,
>>> I will....
>>
>> I'll give it a shot..
>
> So, this is what I came up with, please review.

While testing, I bumped into another related bug: when a WAL segment is
restored from the archive, we let a walsender send that whole WAL
segment to a cascading standby. However, there's no guarantee that the
restored WAL segment is complete. In particular, if the timeline changes
within that segment, e.g. 000000010000000000000004, that segment will be
only partially full, and the WAL continues in segment
000000020000000000000004, on the next timeline. This can also happen if
you copy a partial WAL segment to the archive, for example from a
crashed master server, or if you have set up record-based WAL shipping
without streaming replication, per
http://www.postgresql.org/docs/devel/static/log-shipping-alternative.html#WARM-STANDBY-RECORD.
That manual page says you can only deal with whole WAL files that way,
but I think with standby_mode='on', that's actually no longer true.
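
To make the two file names above concrete, here is a minimal sketch in C of how such a name splits into fields. It assumes the usual 24-hex-digit layout (8 digits of timeline ID, then 8 digits each for the log and segment numbers). The type WalSegName and both functions are hypothetical helpers written for this illustration, not PostgreSQL source.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: the three fields encoded in a WAL segment file name. */
typedef struct {
    uint32_t tli;   /* timeline ID, first 8 hex digits */
    uint32_t log;   /* log number, middle 8 hex digits */
    uint32_t seg;   /* segment number, last 8 hex digits */
} WalSegName;

/* Parse a 24-hex-digit segment name; return false for anything else. */
bool parse_wal_seg_name(const char *name, WalSegName *out)
{
    unsigned int tli, log, seg;

    assert(name != NULL && out != NULL);

    if (strlen(name) != 24)
        return false;
    for (size_t i = 0; i < 24; i++) {
        if (!isxdigit((unsigned char) name[i]))
            return false;
    }
    if (sscanf(name, "%8X%8X%8X", &tli, &log, &seg) != 3)
        return false;

    out->tli = tli;
    out->log = log;
    out->seg = seg;
    return true;
}

/*
 * True when two names cover the same stretch of WAL on different
 * timelines, which is the situation in which the older file may be
 * only partially filled.
 */
bool same_segment_different_timeline(const WalSegName *a, const WalSegName *b)
{
    return a->log == b->log && a->seg == b->seg && a->tli != b->tli;
}
```

Parsing 000000010000000000000004 and 000000020000000000000004 with this helper gives the same log and segment numbers but timelines 1 and 2, so the second check returns true for that pair.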

So all in all, it seems like a shaky assumption that once you've
restored a WAL file from the archive, you're free to stream it to a
cascading standby. I think it would be more robust to limit it to
streaming the file only up to the point that it's been replayed - and
thus verified - in the first standby. If everyone is OK with that change
in behavior, the fix is simple.

- Heikki
