Re: Use pg_rewind when target timeline was switched - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Use pg_rewind when target timeline was switched
Date
Msg-id CAPpHfdtmrUp0Mora1ffZKY6nhMkGJOCF4j8iPc+pAc=7V8Kw3Q@mail.gmail.com
Whole thread Raw
In response to Re: Use pg_rewind when target timeline was switched  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Use pg_rewind when target timeline was switched  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Thu, Aug 20, 2015 at 9:57 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Jul 22, 2015 at 4:28 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Jul 22, 2015 at 8:48 AM, Michael Paquier <michael.paquier@gmail.com> wrote
On Mon, Jul 20, 2015 at 9:18 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> attached patch allows pg_rewind to work when target timeline was switched.
> Actually, this patch fixes TODO from pg_rewind comments.
>
>   /*
>    * Trace the history backwards, until we hit the target timeline.
>    *
>    * TODO: This assumes that there are no timeline switches on the target
>    * cluster after the fork.
>    */
>
> This patch allows pg_rewind to handle data directory synchronization is much
> more general way.
For instance, user can return promoted standby to old master.

+               /*
+                * Since incomplete segments are copied into next timelines, find the
+                * lastest timeline holding required segment.
+                */
+               while (private->tliIndex < targetNentries - 1 &&
+                          targetHistory[private->tliIndex].end < targetSegEnd)
+               {
+                       private->tliIndex++;
+                       tli_index++;
+               }
It seems to me that the patch is able to handle timeline switches onwards, but not backwards and this is what would be required to return a promoted standby, that got switched to let's say timeline 2 when promoted, to an old master, that is still on timeline 1. This code actually fails when scanning for the last checkpoint before WAL forked as it will be on the timeline 1 of the old master. Imagine for example that the WAL has forked at  0/30XXXXX which is saved in segment 000000020000000000000003 (say 2/0/3) on the promoted standby, and that the last checkpoint record is on 0/20XXXXX, which is part of 000000010000000000000002 (1/0/2). I think that we should scan 2/0/3 (not the partial segment 1/0/3), and then 1/0/2 when looking for the last checkpoint record. Hence the startup index TLI should be set to the highest timeline and should be decremented depending on what is in the history file.
The code above looks correct to me when scanning the WAL history onwards though, which is what is done when extracting the page map, but not backwards when we try to find the last common checkpoint record. This code actually fails trying to open 2/0/2 that does not exist in the promoted standby's pg_xlog in my test case.

Attached is a small script I have used to reproduce the failure.
 
Right, thanks! It should be fixed in the attached version of patch.

I think that the documentation needs a brush up as well to outline the fact that pg_rewind would be able to put back as well a standby in a cluster, after for example an operator mistake when promoting a node that should not be.

Yes. But from current pg_rewind documentation I can't figure out that it can't do so. However, we can add an explicit statement which claims that it can.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachment

pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: LLVM Address Sanitizer (ASAN) and valgrind support
Next
From: Petr Jelinek
Date:
Subject: Re: WIP: Access method extendability