Re: Race condition in recovery? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Race condition in recovery?
Date
Msg-id CA+Tgmob9B_fCbXZvx8a-Az76EnDEs3u7oQGdkbzVon9Z0Nc2EQ@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Race condition in recovery?
Re: Race condition in recovery?
List pgsql-hackers
On Tue, May 18, 2021 at 1:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Yeah, it will be a fake 1-element list.  But just to be clear that
> 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and
> nothing else, do you agree to this?  Because we initialize
> recoveryTargetTLI to this value and we might change it in
> readRecoveryCommandFile() but for that, we must get the history file,
> so if we are talking about the case where we don't have the history
> file then "recoveryTargetTLI" will still be
> "ControlFile->checkPointCopy.ThisTimeLineID".

I don't think your conclusion is correct. As I understand it, you're
talking about the state before
ee994272ca50f70b53074f0febaec97e28f83c4e, because as of now
readRecoveryCommandFile() no longer exists in master. Before that
commit, StartupXLOG did this:

        recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
        readRecoveryCommandFile();
        expectedTLEs = readTimeLineHistory(recoveryTargetTLI);

Now, readRecoveryCommandFile() can change recoveryTargetTLI. Before
doing so, it will call existsTimeLineHistory if
recovery_target_timeline was set to an integer, and findNewestTimeLine
if it was set to latest. In the first case, existsTimeLineHistory()
calls RestoreArchivedFile(), but that restores it using a temporary
name, and KeepFileRestoredFromArchive is not called, so we might have
the timeline history in RECOVERYHISTORY but that's not the filename
we're actually going to try to read from inside readTimeLineHistory().
In the second case, findNewestTimeLine() will call
existsTimeLineHistory() which results in the same situation. So I
think when readRecoveryCommandFile() returns expectedTLI can be
different but the history file can be absent since it was only ever
restored under a temporary name.

> Conclusion:
> - I think now we agree on the point that initializing expectedTLEs
> with the recovery target timeline is the right fix.
> - We still have some differences of opinion about what was the
> original problem in the base code which was fixed by the commit
> (ee994272ca50f70b53074f0febaec97e28f83c4e).

I am also still concerned about whether we understand in exactly what
cases the current logic doesn't work. We seem to more or less agree on
the fix, but I don't think we really understand precisely what case we
are fixing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Alexander Pyhalov
Date:
Subject: Function scan FDW pushdown
Next
From: Tom Lane
Date:
Subject: CALL versus procedures with output-only arguments