Re: Cascading replication and recovery_target_timeline='latest' - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Cascading replication and recovery_target_timeline='latest'
Date
Msg-id CAHGQGwGrLAvWvV23VLJez4qjovPGaJNtJJ2o=9g_UTwjSQy8dg@mail.gmail.com
In response to Re: Cascading replication and recovery_target_timeline='latest'  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Cascading replication and recovery_target_timeline='latest'  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Tue, Sep 4, 2012 at 7:07 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 03.09.2012 10:43, Fujii Masao wrote:
>>
>> On Sat, Sep 1, 2012 at 2:32 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>
>>> On Fri, Aug 31, 2012 at 5:03 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>>>
>>>> Aside from the missing locking, I wonder what that does to a cascaded
>>>> standby. If there is an active walsender running while RecoveryTargetTLI
>>>> is changed, I think what will happen is that the walsender will continue
>>>> to stream WAL from the old timeline, but because the startup process is
>>>> now actually replaying from a different timeline, the walsender will send
>>>> bogus WAL to the standby.
>>>
>>> Good catch! That's really a problem. To address that, should we terminate
>>> all cascading walsenders when the timeline history file is read and the
>>> recovery target timeline changes?
>>
>> This is not the right fix. After we ask the cascading walsenders to
>> terminate, it might take them some time to actually exit, and during that
>> time they might send bogus WAL from the old timeline. Currently there is
>> no safeguard against that. To implement such a safeguard, cascading
>> walsenders need to know, as the startup process does, when the timeline
>> is updated and which WAL file is the last valid one of the old timeline.
>> IOW, we would need to change cascading walsenders so that they also read
>> and understand the timeline history files. That is not an easy fix at
>> this stage (9.2.0 is about to be released...).
>>
>> So, as one idea, I'm thinking of simply forbidding cascading replication
>> when recovery_target_timeline is set to 'latest'. Thoughts?
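The safeguard described in the quoted message (a cascading walsender consulting the timeline history to find the last valid WAL position of its old timeline) could look roughly like the sketch below. It assumes a simplified in-memory history where each entry records a timeline ID and the position at which the next timeline branched off; it uses a byte position instead of a WAL file name. `HistoryEntry`, `timeline_end_ptr()`, `safe_send_limit()` and `sender_reached_timeline_end()` are hypothetical names for illustration, not PostgreSQL's actual data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for PostgreSQL types. */
typedef uint32_t TimeLineID;
typedef uint64_t WalPos;            /* byte position in the WAL stream */

#define WALPOS_INVALID UINT64_MAX   /* timeline has no end yet (newest) */

/*
 * One entry of a simplified timeline history: timeline `tli` is valid
 * up to (but not including) `switchpoint`, where the next timeline
 * branched off.
 */
typedef struct HistoryEntry
{
    TimeLineID tli;
    WalPos     switchpoint;
} HistoryEntry;

/* Where does timeline `tli` end? WALPOS_INVALID if it is not listed. */
static WalPos
timeline_end_ptr(const HistoryEntry *history, size_t n, TimeLineID tli)
{
    assert(history != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (history[i].tli == tli)
            return history[i].switchpoint;
    }
    return WALPOS_INVALID;
}

/*
 * How far may a walsender streaming `sender_tli` safely send, given
 * the startup process's flush position? Anything past the old
 * timeline's switch point would be bogus WAL.
 */
static WalPos
safe_send_limit(const HistoryEntry *history, size_t n,
                TimeLineID sender_tli, WalPos flush_ptr)
{
    WalPos end = timeline_end_ptr(history, n, sender_tli);

    if (end != WALPOS_INVALID && end < flush_ptr)
        return end;                 /* stop at the switch point */
    return flush_ptr;
}

/* True once everything valid on `sender_tli` has been sent. */
static bool
sender_reached_timeline_end(const HistoryEntry *history, size_t n,
                            TimeLineID sender_tli, WalPos sent_upto)
{
    WalPos end = timeline_end_ptr(history, n, sender_tli);

    return end != WALPOS_INVALID && sent_upto >= end;
}
```

With a history such as `{ {1, 1000}, {2, 5000} }`, a walsender still on timeline 1 would be capped at position 1000 even if the startup process has flushed further on a later timeline, and would then know it has to stop.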
>
> Hmm, I was thinking that when walsender gets the position it can send the
> WAL up to, in GetStandbyFlushRecPtr(), it could atomically check the current
> recovery timeline. If it has changed, refuse to send the new WAL and
> terminate. That would be a fairly small change, it would just close the
> window between requesting walsenders to terminate and them actually
> terminating.

Yeah, sounds good. Could you implement the patch? If you don't have time,
I will....

Regards,

-- 
Fujii Masao


