Re: Race condition in recovery? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in recovery?
Date
Msg-id 20210521.112105.27166595366072396.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Race condition in recovery?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Race condition in recovery?
Re: Race condition in recovery?
Re: Race condition in recovery?
List pgsql-hackers
At Thu, 20 May 2021 13:49:10 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Tue, May 18, 2021 at 1:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > Yeah, it will be a fake 1-element list.  But just to be clear that
> > 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and
> > nothing else, do you agree to this?  Because we initialize
> > recoveryTargetTLI to this value and we might change it in
> > readRecoveryCommandFile() but for that, we must get the history file,
> > so if we are talking about the case where we don't have the history
> > file then "recoveryTargetTLI" will still be
> > "ControlFile->checkPointCopy.ThisTimeLineID".
> 
> I don't think your conclusion is correct. As I understand it, you're
> talking about the state before
> ee994272ca50f70b53074f0febaec97e28f83c4e, because as of now
> readRecoveryCommandFile() no longer exists in master. Before that
> commit, StartupXLOG did this:
> 
>         recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
>         readRecoveryCommandFile();
>         expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
> 
> Now, readRecoveryCommandFile() can change recoveryTargetTLI. Before
> doing so, it will call existsTimeLineHistory if
> recovery_target_timeline was set to an integer, and findNewestTimeLine
> if it was set to latest. In the first case, existsTimeLineHistory()
> calls RestoreArchivedFile(), but that restores it using a temporary
> name, and KeepFileRestoredFromArchive is not called, so we might have
> the timeline history in RECOVERYHISTORY but that's not the filename
> we're actually going to try to read from inside readTimeLineHistory().
> In the second case, findNewestTimeLine() will call
> existsTimeLineHistory() which results in the same situation. So I
> think when readRecoveryCommandFile() returns expectedTLI can be
> different but the history file can be absent since it was only ever
> restored under a temporary name.

Anyway it seems that the commit tried to fix an issue happens without
using WAL archive.

https://www.postgresql.org/message-id/50E43C57.5050101%40vmware.com

> That leaves one case not covered: If you take a backup with plain 
> "pg_basebackup" from a standby, without -X, and the first WAL segment 
> contains a timeline switch (ie. you take the backup right after a 
> failover), and you try to recover from it without a WAL archive, it 
> doesn't work. This is the original issue that started this thread, 
> except that I used "-x" in my original test case. The problem here is 
> that even though streaming replication will fetch the timeline history 
> file when it connects, at the very beginning of recovery, we expect that 
> we already have the timeline history file corresponding the initial 
> timeline available, either in pg_xlog or the WAL archive. By the time 
> streaming replication has connected and fetched the history file, we've 
> already initialized expectedTLEs to contain just the latest TLI. To fix 
> that, we should delay calling readTimeLineHistoryFile() until streaming 
> replication has connected and fetched the file.
> If the first segment read by recovery contains a timeline switch, the first
> pages have older timeline than segment timeline. However, if
> exepectedTLEs contained only the segment timeline, we cannot know if
> we can use the record.  In that case the following error is issued.

If expectedTLEs is initialized with the pseudo list,
tliOfPointInHistory always return the recoveryTargetTLI regardless of
the given LSN so the checking about timelines later doesn't work. And
later ReadRecord is surprised to see a page of an unknown timeline.

"unexpected timeline ID 1 in log segment"

So the objective is to initialize expectedTLEs with the right content
of the history file for the recoveryTargetTLI until ReadRecord fetches
the first record.  After the fix things are working as the following.

- recoveryTargetTimeLine is initialized with
  ControlFile->checkPointCopy.ThisTimeLineID

- readRecoveryCommandFile():

  Move recoveryTargetTLI forward to the specified target timline if
  the history file for the timeline is found, or in the case of
  latest, move it forward up to the maximum timeline among the history
  files found in either pg_wal or archive.

  !!! Anyway recoveryTargetTLI won't goes back behind the checkpoint
  TLI.

- ReadRecord...XLogFileReadAnyTLI

  Tries to load the history file for recoveryTargetTLI either from
  pg_wal or archive onto local TLE list, if the history file is not
  found, use a generateed list with one entry for the
  recoveryTargetTLI.

  (a) If the specified segment file for any timeline in the TLE list
    is found, expectedTLEs is initialized with the local list. No need
    to worry about expectedTLEs any longer.

  (b) If such a segment is *not* found, expectedTLEs is left
    NIL. Usually recoveryTargetTLI is equal to the last checkpoint
    TLI.

  (c) However, in the case where timeline switches happened in the
    segment and the recoveryTargetTLI has been increased, that is, the
    history file for the recoveryTargetTLI is found in pg_wal or
    archive, that is, the issue raised here, recoveryTargetTLI becomes
    the future timline of the checkpoint TLI.

  (d) The history file for the recoveryTargetTLI is *not* found but
    the segment file is found, expectedTLEs is initialized with the
    generated list, which doesn't contain past timelines. In this
    case, recoveryTargetTLI has not moved from the initial value of
    the checkpoint TLI. If the REDO point is before a timeline switch,
    the page causes FATAL in ReadRecord later.  However, I think there
    cannot be a case where segment file is found before corresponding
    history file.  (Except for TLI=1, which is no problem.)

- WaitForWALToBecomeAvailable

  if we have had no segments for the last checkpoint, initiate
  streaming from the REDO point of the last checkpoint. We should have
  all history files until receiving segment data.

  after sufficient WAL data has been received, the only cases where
  expectedTLEs is still NIL are the (b) and (c) above.

  In the case of (b) recoveryTargetTLI == checkpoint TLI.

  In the case of (c) recoveryTargetTLI > checkpoint TLI.  In this case
  we expecte that checkpint TLI is in the history of
  recoveryTargetTLI. Otherwise recovery failse.  This case is similar
  to the case (a) but the relationship between recoveryTargetTLI and
  the checkpoint TLI is not confirmed yet. ReadRecord barks later if
  they are not compatible so there's not a serious problem but might
  be better checking the relation ship there.  My first proposal
  performed mutual check between the two but we need to check only
  unidirectionally.

  if (readFile < 0)
  {
     if (!expectedTLEs)
     {
        expectedTLEs = readTimeLineHistory(receiveTLI);
+       if (!tliOfPointInHistory(receiveTLI, expectedTLEs))
+          ereport(ERROR, "the received timeline %d is not found in the history file for timeline %d");


> > Conclusion:
> > - I think now we agree on the point that initializing expectedTLEs
> > with the recovery target timeline is the right fix.
> > - We still have some differences of opinion about what was the
> > original problem in the base code which was fixed by the commit
> > (ee994272ca50f70b53074f0febaec97e28f83c4e).
> 
> I am also still concerned about whether we understand in exactly what
> cases the current logic doesn't work. We seem to more or less agree on
> the fix, but I don't think we really understand precisely what case we
> are fixing.

Does the discussion above make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: multi-install PostgresNode fails with older postgres versions
Next
From: "houzj.fnst@fujitsu.com"
Date:
Subject: Fdw batch insert error out when set batch_size > 65535