Re: Switching timeline over streaming replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Switching timeline over streaming replication
Date
Msg-id CAHGQGwHR790c5j7SSJwqjPfjRnWvMfrGPgpXT=Gxv4ghxGL9_Q@mail.gmail.com
In response to Re: Switching timeline over streaming replication  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Switching timeline over streaming replication  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Sat, Dec 15, 2012 at 9:36 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sat, Dec 8, 2012 at 12:51 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 06.12.2012 15:39, Amit Kapila wrote:
>>>
>>> On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
>>>>
>>>> On 05.12.2012 14:32, Amit Kapila wrote:
>>>>>
>>>>> On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
>>>>>>
>>>>>> After some diversions to fix bugs and refactor existing code, I've
>>>>>> committed a couple of small parts of this patch, which just add some
>>>>>> sanity checks to notice incorrect PITR scenarios. Here's a new
>>>>>> version of the main patch based on current HEAD.
>>>>>
>>>>>
>>>>> After testing with the new patch, the following problems are observed.
>>>>>
>>>>> Defect - 1:
>>>>>
>>>>>       1. start primary A
>>>>>       2. start standby B following A
>>>>>       3. start cascade standby C following B.
>>>>>       4. start another standby D following C.
>>>>>       5. Promote standby B.
>>>>>       6. After successful timeline switch in cascade standbys C & D, stop D.
>>>>>
>>>>>       7. Restart D, Startup is successful and connecting to standby C.
>>>>>       8. Stop C.
>>>>>       9. Restart C, startup is failing.
>>>>
>>>>
>>>> Ok, the error I get in that scenario is:
>>>>
>>>> C 2012-12-05 19:55:43.840 EET 9283 FATAL:  requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1
>>>> C 2012-12-05 19:55:43.841 EET 9282 LOG:  startup process (PID 9283) exited with exit code 1
>>>> C 2012-12-05 19:55:43.841 EET 9282 LOG:  aborting startup due to startup process failure
>>>>
>>>
>>>>
>>>> That mismatch causes the error. I'd like to fix this by always treating
>>>> the checkpoint record to be part of the new timeline. That feels more
>>>> correct. The most straightforward way to implement that would be to peek
>>>> at the xlog record before updating replayEndRecPtr and replayEndTLI. If
>>>> it's a checkpoint record that changes TLI, set replayEndTLI to the new
>>>> timeline before calling the redo-function. But it's a bit of a
>>>> modularity violation to peek into the record like that.
>>>>
>>>> Or we could just revert the sanity check at beginning of recovery that
>>>> throws the "requested timeline 2 does not contain minimum recovery point
>>>> 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
>>>> record that says "unexpected timeline ID %u in checkpoint record, before
>>>> reaching minimum recovery point %X/%X on timeline %u" checks basically
>>>> the same thing, but at a later stage. However, the way
>>>> minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
>>>> fix that.
>>>>
>>>> I'm thinking of something like the attached (with some more comments
>>>> before committing). Thoughts?
>>>
>>>
>>> This has fixed the problem reported.
>>> However, I cannot work out whether there would be any problem if we
>>> remove the check "requested timeline 2 does not contain minimum recovery
>>> point 0/3023F08 on timeline 1" at the beginning of recovery and just
>>> update replayEndTLI with ThisTimeLineID.
>>
>>
>> Well, it seems wrong for the control file to contain a situation like this:
>>
>> pg_control version number:            932
>> Catalog version number:               201211281
>> Database system identifier:           5819228770976387006
>> Database cluster state:               shut down in recovery
>> pg_control last modified:             pe  7. joulukuuta 2012 17.39.57
>> Latest checkpoint location:           0/3023EA8
>> Prior checkpoint location:            0/2000060
>> Latest checkpoint's REDO location:    0/3023EA8
>> Latest checkpoint's REDO WAL file:    000000020000000000000003
>> Latest checkpoint's TimeLineID:       2
>> ...
>> Time of latest checkpoint:            pe  7. joulukuuta 2012 17.39.49
>> Min recovery ending location:         0/3023F08
>> Min recovery ending loc's timeline:   1
>>
>> Note the latest checkpoint location and its TimelineID, and compare them
>> with the min recovery ending location. The min recovery ending location is
>> ahead of latest checkpoint's location; the min recovery ending location
>> actually points to the end of the checkpoint record. But how come the min
>> recovery ending location's timeline is 1, while the checkpoint record's
>> timeline is 2?
>>
>> Now maybe that would happen to work if we remove the sanity check, but it still
>> seems horribly confusing. I'm afraid that discrepancy will come back to
>> haunt us later if we leave it like that. So I'd like to fix that.
>>
>> Mulling over this for some more, I propose the attached patch. With the
>> patch, we peek into the checkpoint record, and actually perform the timeline
>> switch (by changing ThisTimeLineID) before replaying it. That way the
>> checkpoint record is really considered to be on the new timeline for all
>> purposes. At the moment, the only difference that makes in practice is that
>> we set replayEndTLI, and thus minRecoveryPointTLI, to the new TLI, but it
>> feels logically more correct to do it that way.
>
> This patch has already been included in HEAD. Right?
>
> I found another "requested timeline does not contain minimum recovery point"
> error scenario in HEAD:
>
> 1. Set up the master 'M', one standby 'S1', and one cascade standby 'S2'.
> 2. Shut down the master 'M', promote the standby 'S1', and wait for 'S2'
>     to reconnect to 'S1'.
> 3. Set up new cascade standby 'S3' connecting to 'S2'.
>     Then 'S3' fails to start the recovery because of the following error:
>
>     FATAL:  requested timeline 2 does not contain minimum recovery point 0/3000000 on timeline 1
>     LOG:  startup process (PID 33104) exited with exit code 1
>     LOG:  aborting startup due to startup process failure
>
> The result of pg_controldata of 'S3' is:
>
> Latest checkpoint location:           0/3000088
> Prior checkpoint location:            0/2000060
> Latest checkpoint's REDO location:    0/3000088
> Latest checkpoint's REDO WAL file:    000000020000000000000003
> Latest checkpoint's TimeLineID:       2
> <snip>
> Min recovery ending location:         0/3000000
> Min recovery ending loc's timeline:   1
> Backup start location:                0/0
> Backup end location:                  0/0
>
> The content of the timeline history file '00000002.history' is:
>
> 1       0/3000088       no recovery target specified

I can still reproduce this problem. Attached is a shell script
that reproduces it.
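
As a reading aid for the pg_controldata output and the history file
quoted above, here is a minimal Python sketch that decodes those values.
It is not the attached script and not PostgreSQL code: the helper names
are hypothetical, and it assumes the default 16 MB WAL segment size. It
does not reimplement the server's sanity check; it only shows how the
quoted numbers relate to each other.

```python
# Hypothetical helpers for illustration only; none of these names exist
# in PostgreSQL. Input values are copied from the output quoted above.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(text):
    """Convert an LSN such as '0/3000088' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_wal_file_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, start LSN)."""
    tli = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return tli, (log_id << 32) + seg * WAL_SEGMENT_SIZE


def parse_history_line(line):
    """Parse one history entry: parent timeline, switch point, reason."""
    tli, lsn, reason = line.split(None, 2)
    return int(tli), parse_lsn(lsn), reason


if __name__ == "__main__":
    min_recovery = parse_lsn("0/3000000")    # Min recovery ending location
    checkpoint = parse_lsn("0/3000088")      # Latest checkpoint location
    file_tli, file_start = parse_wal_file_name("000000020000000000000003")
    parent_tli, switch_lsn, _ = parse_history_line(
        "1       0/3000088       no recovery target specified"
    )

    # The REDO WAL file is on timeline 2 and starts at 0/3000000.
    print(file_tli, hex(file_start))
    # The min recovery point sits exactly at the start of that segment.
    print(min_recovery == file_start)
    # The checkpoint location equals the switch point in the history file.
    print(checkpoint == switch_lsn)
    # The min recovery point lies 0x88 bytes before the switch point.
    print(switch_lsn - min_recovery)
```

Under these assumptions, the min recovery point recorded on 'S3'
(0/3000000, timeline 1) lies 136 bytes before the switch point listed in
00000002.history (0/3000088), while the latest checkpoint sits exactly at
that switch point and is recorded on timeline 2.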

Regards,

--
Fujii Masao

