Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | 20210524.113402.1922481024406047229.horikyota.ntt@gmail.com |
In response to | Re: Race condition in recovery? (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
At Fri, 21 May 2021 12:52:54 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
> I had trouble following it completely, but I didn't really spot
> anything that seemed definitely wrong. However, I don't understand
> what it has to do with where we are now. What I want to understand is:
> under exactly what circumstances does it matter that
> WaitForWALToBecomeAvailable(), when currentSource == XLOG_FROM_STREAM,
> will stream from receiveTLI rather than recoveryTargetTLI?

Extracting the related descriptions from my previous mail:

- recoveryTargetTimeLine is initialized with
  ControlFile->checkPointCopy.ThisTimeLineID.

- readRecoveryCommandFile():
  ...or in the case of latest, move it forward up to the maximum
  timeline among the history files found in either pg_wal or archive.

- ReadRecord...XLogFileReadAnyTLI
  Tries to load the history file for recoveryTargetTLI from either
  pg_wal or archive onto a local TLE list. If the history file is not
  found, it uses a generated list with one entry for
  recoveryTargetTLI.

  (b) If such a segment is *not* found, expectedTLEs is left NIL.
      Usually recoveryTargetTLI is equal to the last checkpoint TLI.

  (c) However, in the case where timeline switches happened in the
      segment and recoveryTargetTLI has been increased, that is, the
      history file for recoveryTargetTLI is found in pg_wal or archive
      (the issue raised here), recoveryTargetTLI becomes a future
      timeline of the checkpoint TLI.

- WaitForWALToBecomeAvailable
  In the case of (c), recoveryTargetTLI > checkpoint TLI. In this
  case we expect that the checkpoint TLI is in the history of
  recoveryTargetTLI. Otherwise recovery fails. This case is similar
  to case (a), but the relationship between recoveryTargetTLI and
  the checkpoint TLI is not confirmed yet. ReadRecord barks later if
  they are not compatible, so there's not a serious problem, but it
  might be better to check the relationship there.
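As a rough illustration of the relationship check mentioned above (is the checkpoint TLI in the history of recoveryTargetTLI?), here is a minimal standalone C sketch. The names TimelineEntry, tli_in_history and checkpoint_tli_is_compatible are made up for this example. It models the history as a plain array and is not the actual xlog.c code, which works on a list of timeline history entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical, simplified stand-in for one timeline history entry. */
typedef struct
{
    TimeLineID tli;
} TimelineEntry;

/*
 * Return true if `tli` appears in `history`, the list of timelines loaded
 * for the recovery target. Order does not matter for a membership test.
 */
bool
tli_in_history(TimeLineID tli, const TimelineEntry *history, size_t n)
{
    assert(history != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (history[i].tli == tli)
            return true;
    }
    return false;
}

/*
 * Assumed form of the check: the checkpoint's timeline must be the target
 * timeline itself or one of its ancestors. The reverse direction is not
 * tested.
 */
bool
checkpoint_tli_is_compatible(TimeLineID checkpoint_tli,
                             const TimelineEntry *target_history, size_t n)
{
    return tli_in_history(checkpoint_tli, target_history, n);
}
```

With a target history of {3, 2, 1}, a checkpoint on TLI 1 passes this check, while a checkpoint on TLI 4 does not.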
My first proposal performed a mutual check between the two, but we
need to check in only one direction.

===
So the condition for Dilip's case is, as you wrote in another mail:

- ControlFile->checkPointCopy.ThisTimeLineID is in the older timeline.
- Archive or pg_wal offers the history file for the newer timeline.
- The segment for the checkpoint is found neither in pg_wal nor in
  archive.

That is,

- A grandchild (c) node is stopped.
- Then the child node (b) is promoted.
- Clear the pg_wal directory of (c), then connect it to (b) *before*
  (b) archives the segment for the newer timeline of the
  timeline-switching segments. (If we have switched at segment 3,
  TLI=1, the segment file of the older timeline is renamed to
  .partial, and then the same segment is created for TLI=2. The
  former is archived while promotion is performed, but the latter
  won't be archived until the segment ends.)

The original case, after commit ee994272ca:

- recoveryTargetTimeLine is initialized with
  ControlFile->checkPointCopy.ThisTimeLineID.

  (X) (Before the commit, we created a one-entry expectedTLEs
      consisting only of ControlFile->checkPointCopy.ThisTimeLineID.)

- readRecoveryCommandFile():
  Move recoveryTargetTLI forward to the specified target timeline if
  the history file for the timeline is found, or in the case of
  latest, move it forward up to the maximum timeline among the
  history files found in either pg_wal or archive.

- ReadRecord...XLogFileReadAnyTLI
  Tries to load the history file for recoveryTargetTLI from either
  pg_wal or archive onto a local TLE list. If the history file is not
  found, it uses a generated list with one entry for
  recoveryTargetTLI.

  (b) If such a segment is *not* found, expectedTLEs is left NIL.
      Usually recoveryTargetTLI is equal to the last checkpoint TLI.

- WaitForWALToBecomeAvailable
  If we have had no segments for the last checkpoint, initiate
  streaming from the REDO point of the last checkpoint. We should
  have all history files by the time we receive segment data.
  After sufficient WAL data has been received, the only cases where
  expectedTLEs is still NIL are (b) and (c) above.

  In the case of (b), recoveryTargetTLI == checkpoint TLI.

So I thought that the commit fixed this scenario. Even in this case,
ReadRecord fails because the checkpoint segment contains pages for the
older timeline, which is not in expectedTLEs if we did (X).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center