On Tue, Sep 8, 2015 at 1:14 AM, Alexander Korotkov wrote: > On Thu, Aug 20, 2015 at 9:57 AM, Michael Paquier wrote: >> The code above looks correct to me when scanning the WAL history onwards >> though, which is what is done when extracting the page map, but not >> backwards when we try to find the last common checkpoint record. This code >> actually fails trying to open 2/0/2 that does not exist in the promoted >> standby's pg_xlog in my test case. >> >> Attached is a small script I have used to reproduce the failure. > > > Right, thanks! It should be fixed in the attached version of patch.
So, instead of a code review, I have been torturing your patch and did advanced tests on it with the attached script, that creates a cluster as follows: master (5432) / \ 1 (5433) 2 (5434) | 3 (5435) Once cluster is created, nodes are promoted in a certain order giving them different timeline jump properties: - master, stays on tli 1 - standby 1, tli 1->2 - standby 2, tli 1->3 - standby 3, tli 1->2->4 And data is inserted on each of them to make WAL fork properly. Finally the script tries to rewind one node using another node as source, and then tries to link this target node back to the source node via streaming replication.
I have tested all the one-one permutations possible in the structure above (see commented portions at the bottom of my script), and all of them worked. I have to say that from the testing prospective this patch looks in great shape, and will greatly improve the use cases of pg_rewind!
Great! Many thanks for your testing!
I am planning to do as well a detailed code review rather soon.
Good, thanks.
------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company