On 09/26/2015 02:39 PM, Michael Paquier wrote:
> On Sat, Sep 26, 2015 at 12:22 PM, Amir Rohan wrote:
>> DETAIL: The failed archive command was: test ! -f
>> /var/lib/pgsql/archive/00000002.history && cp pg_xlog/00000002.history
>> /var/lib/pgsql/archive/00000002.history
>
> That's a symptom that the second node already selected timeline 2, my
> first guess is that the restore_command for the second node recovered
> failed to get 00000002.history because it may have been set
> incorrectly.
>
I guess you mean the 3rd node here, which is the 2nd recovery.
You seem to agree that the 3rd node (2nd recovery) picked timeline 2
instead of starting a 3rd. So, at least I have that right.
What does "set incorrectly" mean in this case? The 2nd node (1st
recovery) does create 00000002.history; I checked.
>> I expected the last server to create timeline 3, but it looks as if it
>> thinks it's timeline2. While in terms of data it's at the same WAL point,
>> in terms of events, it should be at timeline 3. That's what I make
>> of it anyway.
>
> I am not able to see any problems with neither 9.5 nor HEAD, when
> recovering the second node in your scenario, it fetches correctly
> 00000002.history from the archive in my case and the node is able to
> select the timeline 3 when recovery is pointing to select the latest
> timeline.
>
I can reproduce this 100% of the time with HEAD from the last few days;
I haven't tried any other version.
See the attached log and shell script I used to reproduce it.
The log shows the contents of the archive directory at various points
in the process I described in the initial post.
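To make my expectation concrete, here is a minimal sketch of the check I
have in mind. It is not taken from the attached script: next_timeline is a
hypothetical helper, and it assumes the newest NNNNNNNN.history file in the
archive determines the next timeline, which simplifies what the server
actually does during recovery.

```shell
# Hypothetical helper, for illustration only: print the timeline ID a newly
# recovered node would be expected to start, given an archive directory.
# Assumption: history files are named <8 hex digits>.history, as with
# 00000002.history above, and timeline 1 has no history file.
next_timeline() {
    archive_dir=$1
    newest=1
    for f in "$archive_dir"/*.history; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$f" ] || continue
        name=$(basename "$f" .history)
        # Ignore names that are empty or not purely hexadecimal.
        case "$name" in
            ''|*[!0-9A-Fa-f]*) continue ;;
        esac
        tli=$((0x$name))
        if [ "$tli" -gt "$newest" ]; then
            newest=$tli
        fi
    done
    printf '%08X\n' "$((newest + 1))"
}
```

Run as `next_timeline /var/lib/pgsql/archive` after the 1st recovery has
archived 00000002.history, this prints 00000003, which is the timeline I
expected the last server to pick.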
Amir