Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | 20210521.112105.27166595366072396.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Race condition in recovery? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Race condition in recovery?
Re: Race condition in recovery? Re: Race condition in recovery? |
List | pgsql-hackers |
At Thu, 20 May 2021 13:49:10 -0400, Robert Haas <robertmhaas@gmail.com> wrote in > On Tue, May 18, 2021 at 1:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Yeah, it will be a fake 1-element list. But just to be clear that > > 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and > > nothing else, do you agree to this? Because we initialize > > recoveryTargetTLI to this value and we might change it in > > readRecoveryCommandFile() but for that, we must get the history file, > > so if we are talking about the case where we don't have the history > > file then "recoveryTargetTLI" will still be > > "ControlFile->checkPointCopy.ThisTimeLineID". > > I don't think your conclusion is correct. As I understand it, you're > talking about the state before > ee994272ca50f70b53074f0febaec97e28f83c4e, because as of now > readRecoveryCommandFile() no longer exists in master. Before that > commit, StartupXLOG did this: > > recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID; > readRecoveryCommandFile(); > expectedTLEs = readTimeLineHistory(recoveryTargetTLI); > > Now, readRecoveryCommandFile() can change recoveryTargetTLI. Before > doing so, it will call existsTimeLineHistory if > recovery_target_timeline was set to an integer, and findNewestTimeLine > if it was set to latest. In the first case, existsTimeLineHistory() > calls RestoreArchivedFile(), but that restores it using a temporary > name, and KeepFileRestoredFromArchive is not called, so we might have > the timeline history in RECOVERYHISTORY but that's not the filename > we're actually going to try to read from inside readTimeLineHistory(). > In the second case, findNewestTimeLine() will call > existsTimeLineHistory() which results in the same situation. So I > think when readRecoveryCommandFile() returns expectedTLI can be > different but the history file can be absent since it was only ever > restored under a temporary name. Anyway it seems that the commit tried to fix an issue happens without using WAL archive. https://www.postgresql.org/message-id/50E43C57.5050101%40vmware.com > That leaves one case not covered: If you take a backup with plain > "pg_basebackup" from a standby, without -X, and the first WAL segment > contains a timeline switch (ie. you take the backup right after a > failover), and you try to recover from it without a WAL archive, it > doesn't work. This is the original issue that started this thread, > except that I used "-x" in my original test case. The problem here is > that even though streaming replication will fetch the timeline history > file when it connects, at the very beginning of recovery, we expect that > we already have the timeline history file corresponding the initial > timeline available, either in pg_xlog or the WAL archive. By the time > streaming replication has connected and fetched the history file, we've > already initialized expectedTLEs to contain just the latest TLI. To fix > that, we should delay calling readTimeLineHistoryFile() until streaming > replication has connected and fetched the file. > If the first segment read by recovery contains a timeline switch, the first > pages have older timeline than segment timeline. However, if > exepectedTLEs contained only the segment timeline, we cannot know if > we can use the record. In that case the following error is issued. If expectedTLEs is initialized with the pseudo list, tliOfPointInHistory always return the recoveryTargetTLI regardless of the given LSN so the checking about timelines later doesn't work. And later ReadRecord is surprised to see a page of an unknown timeline. "unexpected timeline ID 1 in log segment" So the objective is to initialize expectedTLEs with the right content of the history file for the recoveryTargetTLI until ReadRecord fetches the first record. After the fix things are working as the following. - recoveryTargetTimeLine is initialized with ControlFile->checkPointCopy.ThisTimeLineID - readRecoveryCommandFile(): Move recoveryTargetTLI forward to the specified target timline if the history file for the timeline is found, or in the case of latest, move it forward up to the maximum timeline among the history files found in either pg_wal or archive. !!! Anyway recoveryTargetTLI won't goes back behind the checkpoint TLI. - ReadRecord...XLogFileReadAnyTLI Tries to load the history file for recoveryTargetTLI either from pg_wal or archive onto local TLE list, if the history file is not found, use a generateed list with one entry for the recoveryTargetTLI. (a) If the specified segment file for any timeline in the TLE list is found, expectedTLEs is initialized with the local list. No need to worry about expectedTLEs any longer. (b) If such a segment is *not* found, expectedTLEs is left NIL. Usually recoveryTargetTLI is equal to the last checkpoint TLI. (c) However, in the case where timeline switches happened in the segment and the recoveryTargetTLI has been increased, that is, the history file for the recoveryTargetTLI is found in pg_wal or archive, that is, the issue raised here, recoveryTargetTLI becomes the future timline of the checkpoint TLI. (d) The history file for the recoveryTargetTLI is *not* found but the segment file is found, expectedTLEs is initialized with the generated list, which doesn't contain past timelines. In this case, recoveryTargetTLI has not moved from the initial value of the checkpoint TLI. If the REDO point is before a timeline switch, the page causes FATAL in ReadRecord later. However, I think there cannot be a case where segment file is found before corresponding history file. (Except for TLI=1, which is no problem.) - WaitForWALToBecomeAvailable if we have had no segments for the last checkpoint, initiate streaming from the REDO point of the last checkpoint. We should have all history files until receiving segment data. after sufficient WAL data has been received, the only cases where expectedTLEs is still NIL are the (b) and (c) above. In the case of (b) recoveryTargetTLI == checkpoint TLI. In the case of (c) recoveryTargetTLI > checkpoint TLI. In this case we expecte that checkpint TLI is in the history of recoveryTargetTLI. Otherwise recovery failse. This case is similar to the case (a) but the relationship between recoveryTargetTLI and the checkpoint TLI is not confirmed yet. ReadRecord barks later if they are not compatible so there's not a serious problem but might be better checking the relation ship there. My first proposal performed mutual check between the two but we need to check only unidirectionally. if (readFile < 0) { if (!expectedTLEs) { expectedTLEs = readTimeLineHistory(receiveTLI); + if (!tliOfPointInHistory(receiveTLI, expectedTLEs)) + ereport(ERROR, "the received timeline %d is not found in the history file for timeline %d"); > > Conclusion: > > - I think now we agree on the point that initializing expectedTLEs > > with the recovery target timeline is the right fix. > > - We still have some differences of opinion about what was the > > original problem in the base code which was fixed by the commit > > (ee994272ca50f70b53074f0febaec97e28f83c4e). > > I am also still concerned about whether we understand in exactly what > cases the current logic doesn't work. We seem to more or less agree on > the fix, but I don't think we really understand precisely what case we > are fixing. Does the discussion above make sense? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: