Re: Race condition in recovery? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Race condition in recovery?
Date
Msg-id CA+Tgmoaa-zTgVtDGsA-hpwE7_ERg8JitHVs8DOtP5vdn6jzk2A@mail.gmail.com
Whole thread Raw
In response to Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Race condition in recovery?
List pgsql-hackers
On Thu, Jan 21, 2021 at 4:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> 8. Node3, get it because walsender of Node2 read it from TL13 and send
> it and Node2 write in the new WAL file but with TL12.
>
> WalSndSegmentOpen()
> {
> /*-------
> * When reading from a historic timeline, and there is a timeline switch
> * within this segment, read from the WAL segment belonging to the new
> * timeline.
> }
>
> 9. Node3, now set the expectedTLEs to 12 because that is what
> walreceiver has streamed the WAL using.

This seems to be incorrect, because the comment for expectedTLEs says:

 * expectedTLEs: a list of TimeLineHistoryEntries for
recoveryTargetTLI and the timelines of
 * its known parents, newest first (so recoveryTargetTLI is always the
 * first list member).  Only these TLIs are expected to be seen in the WAL
 * segments we read, and indeed only these TLIs will be considered as
 * candidate WAL files to open at all.

But in your scenario apparently we end up with a situation that
contradicts that, because you go on to say:

> 10. Node3, now recoveryTargetTLI is 13 and expectedTLEs is 12.  So

As I understand, expectedTLEs should end up being a list where the
first element is the timeline we want to end up on, and the last
element is the timeline where we are now, and every timeline in the
list branches off of the next timeline in the list. So here if 13
branches of 12 then the list should be 13,12 not just 12; and if 13
does not branch off of 12 OR if 13 branches off of 12 at an earlier
point in the WAL stream than where we are now, then that should be an
error that shuts down the standby, because then there is no way for
replay to get from where it is now to the correct timeline.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: Support for NSS as a libpq TLS backend
Next
From: Andres Freund
Date:
Subject: Avoiding smgrimmedsync() during nbtree index builds