Re: Race condition in recovery? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Race condition in recovery?
Msg-id 20210524.113402.1922481024406047229.horikyota.ntt@gmail.com
In response to Re: Race condition in recovery?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
At Fri, 21 May 2021 12:52:54 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> I had trouble following it completely, but I didn't really spot
> anything that seemed definitely wrong. However, I don't understand
> what it has to do with where we are now. What I want to understand is:
> under exactly what circumstances does it matter that
> WaitForWALToBecomeAvailable(), when currentSource == XLOG_FROM_STREAM,
> will stream from receiveTLI rather than recoveryTargetTLI?

Extracting the related descriptions from my previous mail:

- recoveryTargetTimeLine is initialized with
  ControlFile->checkPointCopy.ThisTimeLineID

- readRecoveryCommandFile():
  ...or in the case of
  latest, move it forward up to the maximum timeline among the history
  files found in either pg_wal or archive.

- ReadRecord...XLogFileReadAnyTLI

  Tries to load the history file for recoveryTargetTLI, from either
  pg_wal or archive, into a local TLE list. If the history file is
  not found, it uses a generated list with one entry for
  recoveryTargetTLI.

  (b) If such a segment is *not* found, expectedTLEs is left
    NIL. Usually recoveryTargetTLI is equal to the last checkpoint
    TLI.

  (c) However, in the case where a timeline switch happened in the
    segment and recoveryTargetTLI has been increased, that is, the
    history file for recoveryTargetTLI was found in pg_wal or archive
    (the issue raised here), recoveryTargetTLI becomes a future
    timeline of the checkpoint TLI.

- WaitForWALToBecomeAvailable

  In the case of (c), recoveryTargetTLI > checkpoint TLI.  In this
  case we expect that the checkpoint TLI is in the history of
  recoveryTargetTLI; otherwise recovery fails.  This case is similar
  to the case (a), but the relationship between recoveryTargetTLI and
  the checkpoint TLI is not confirmed yet. ReadRecord barks later if
  they are not compatible, so there's no serious problem, but it might
  be better to check the relationship there.  My first proposal
  performed a mutual check between the two, but we need to check in
  only one direction.
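
To make the one-directional check concrete, here is a minimal
standalone sketch. It is not the recovery code: ToyTLI and both toy_*
functions are names invented for this illustration, and the target
timeline's history is modeled as a plain array that includes the
target itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timeline ID; not PostgreSQL's own typedef. */
typedef uint32_t ToyTLI;

/*
 * Return true if 'tli' appears in 'history', the list of timelines that
 * lead to the recovery target (the target itself included).
 */
bool
toy_tli_in_history(ToyTLI tli, const ToyTLI *history, size_t n)
{
    assert(history != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (history[i] == tli)
            return true;
    }
    return false;
}

/*
 * One-directional check: the checkpoint TLI must be the recovery target
 * TLI or one of its ancestors.  The reverse question (is the target in
 * the checkpoint's history?) is deliberately not asked.
 */
bool
toy_checkpoint_tli_is_compatible(ToyTLI checkpoint_tli,
                                 const ToyTLI *target_history, size_t n)
{
    return toy_tli_in_history(checkpoint_tli, target_history, n);
}
```

With a target history of {2, 1}, a checkpoint on TLI 1 or 2 passes and
a checkpoint on TLI 3 does not.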

===
So the condition for Dilip's case is, as you wrote in another mail:

- ControlFile->checkPointCopy.ThisTimeLineID is in the older timeline.
- Archive or pg_wal offers the history file for the newer timeline.
- The segment for the checkpoint is found neither in pg_wal nor in archive.

That is,

- A grandchild node (c) is stopped.
- Then the child node (b) is promoted.

- Clear the pg_wal directory of (c), then connect it to (b) *before*
  (b) archives the newer-timeline copy of the timeline-switching
  segment.  (If we switched at segment 3 on TLI=1, the segment file of
  the older timeline is renamed to .partial and the same segment is
  then created for TLI=2.  The former is archived while promotion is
  performed, but the latter won't be archived until the segment
  ends.)
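
The three conditions above can be written down as a single predicate.
This is only a sketch under my own assumptions: ToyRecoveryInputs and
toy_matches_reported_case are hypothetical names, and the struct just
records facts that the real code would discover from the control
file, pg_wal and the archive.

```c
#include <stdbool.h>

/* Hypothetical summary of what a standby sees at startup. */
typedef struct ToyRecoveryInputs
{
    /* stands in for ControlFile->checkPointCopy.ThisTimeLineID */
    unsigned checkpoint_tli;
    /* highest TLI with a history file in pg_wal or archive, 0 if none */
    unsigned newest_history_tli;
    /* is the segment holding the checkpoint in pg_wal or archive? */
    bool     checkpoint_segment_available;
} ToyRecoveryInputs;

/*
 * True when all three conditions hold: the checkpoint is on an older
 * timeline, a history file for a newer timeline is available, and the
 * checkpoint's segment cannot be found locally or in the archive.
 */
bool
toy_matches_reported_case(const ToyRecoveryInputs *in)
{
    return in->newest_history_tli > in->checkpoint_tli &&
           !in->checkpoint_segment_available;
}
```

For the scenario above, (c) restarts with checkpoint_tli = 1,
newest_history_tli = 2 and no checkpoint segment, so the predicate is
true.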


The original case, after commit ee994272ca:

- recoveryTargetTimeLine is initialized with
  ControlFile->checkPointCopy.ThisTimeLineID

(X) (Before the commit, we created a one-entry expectedTLEs consisting
   only of ControlFile->checkPointCopy.ThisTimeLineID.)

- readRecoveryCommandFile():

  Move recoveryTargetTLI forward to the specified target timeline if
  the history file for the timeline is found, or in the case of
  latest, move it forward up to the maximum timeline among the history
  files found in either pg_wal or archive.

- ReadRecord...XLogFileReadAnyTLI

  Tries to load the history file for recoveryTargetTLI, from either
  pg_wal or archive, into a local TLE list. If the history file is
  not found, it uses a generated list with one entry for
  recoveryTargetTLI.

  (b) If such a segment is *not* found, expectedTLEs is left
    NIL. Usually recoveryTargetTLI is equal to the last checkpoint
    TLI.

- WaitForWALToBecomeAvailable

  If we have had no segments for the last checkpoint, initiate
  streaming from the REDO point of the last checkpoint. We should have
  all the history files by the time segment data is received.

  After sufficient WAL data has been received, the only cases where
  expectedTLEs is still NIL are (b) and (c) above.

  In the case of (b) recoveryTargetTLI == checkpoint TLI.
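
The history-file fallback described in the steps above, and why a
one-entry list is not enough when the segment contains pages of an
older timeline, can be sketched as follows. Everything here
(ToyTLEList, toy_build_tle_list, toy_list_contains, the fixed-size
array) is a toy model of my own, not the actual expectedTLEs
machinery; a NULL 'ancestors' pointer stands for "history file not
found".

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TLES 8

/* Hypothetical fixed-size list of timeline IDs, target first. */
typedef struct ToyTLEList
{
    unsigned tli[TOY_MAX_TLES];
    size_t   n;
} ToyTLEList;

/*
 * Build the list for 'target_tli'.  'ancestors' models the contents of
 * the target's history file; NULL means the file was not found, in which
 * case we fall back to a generated list holding only the target itself.
 */
ToyTLEList
toy_build_tle_list(unsigned target_tli,
                   const unsigned *ancestors, size_t n_ancestors)
{
    ToyTLEList list = { {0}, 0 };

    list.tli[list.n++] = target_tli;
    if (ancestors == NULL)
        return list;            /* one-entry fallback */

    for (size_t i = 0; i < n_ancestors && list.n < TOY_MAX_TLES; i++)
        list.tli[list.n++] = ancestors[i];
    return list;
}

/* Would a page stamped with 'page_tli' be accepted against this list? */
bool
toy_list_contains(const ToyTLEList *list, unsigned page_tli)
{
    for (size_t i = 0; i < list->n; i++)
    {
        if (list->tli[i] == page_tli)
            return true;
    }
    return false;
}
```

With the history file for TLI 2 available, pages of TLI 1 are
accepted; with the one-entry fallback for TLI 2 they are not.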

So I thought that the commit fixed this scenario. Even in this case,
if we did (X), ReadRecord would fail because the checkpoint segment
contains pages for the older timeline, which is not in expectedTLEs.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


