Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | CAFiTN-szz5O-8bCcdtef5OYg4kP-RKL78b+S8AVWoPvpBr2N2Q@mail.gmail.com Whole thread Raw |
In response to | Re: Race condition in recovery? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Race condition in recovery?
|
List | pgsql-hackers |
On Fri, May 14, 2021 at 2:37 AM Robert Haas <robertmhaas@gmail.com> wrote: > > So why does this use recoveryTargetTLI instead of receiveTLI only > conditionally? Why not do it all the time? > > The hard thing about this code is that the assumptions are not very > clear. If we don't know why something is a certain way, then we might > break things if we change it. Worse yet, if nobody else knows why it's > like that either, then who knows what assumptions they might be > making? It's hard to be sure that any change is safe. > > But that being said, we have a clear definition from the comments for > what expectedTLEs is supposed to contain, and it's only going to end > up with those contents if we initialize it from recoveryTargetTLI. So > I am inclined to think that we ought to do that always, and if it > breaks something, then that's a sign that some other part of the code > also needs fixing, because apparently that hypothetical other part of > the code doesn't work if expctedTLEs contains what the comments say > that it should. > > Now maybe that's the wrong idea. But if so, then we're saying that the > definition of expectedTLEs needs to be changed, and we should update > the comments with the new definition, whatever it is. A lot of the > confusion here results from the fact that the code and comments are > inconsistent and we can't tell whether that's intentional or > inadvertent. Let's not leave the next person who looks at this code > wondering the same thing about whatever changes we make. I am not sure that have you noticed the commit id which changed the definition of expectedTLEs, Heikki has committed that change so adding him in the list to know his opinion. ===== ee994272ca50f70b53074f0febaec97e28f83c4e Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58 Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58 Delay reading timeline history file until it's fetched from master. ..... Without the timeline history file, recovering that file will fail as the older timeline ID is not recognized to be an ancestor of the target timeline. If you try to recover from such a backup, using only streaming replication to fetch the WAL, this patch is required for that to work. ===== Part of this commit message says that it will not identify the recoveryTargetTLI as the ancestor of the checkpoint timeline (without history file). I did not understand what it is trying to say. Does it is trying to say that even though the recoveryTargetTLI is the ancestor of the checkpoint timeline but we can not track that because we don't have a history file? So to handle this problem change the definition of expectedTLEs to directly point to the checkpoint timeline? Because before this commit, we were directly initializing expectedTLEs with the history file of recoveryTargetTLI, we were not even waiting for reading the checkpoint, but under this commit, it is changed. I am referring to the below code which was deleted by this commit: ======== @@ -5279,49 +5299,6 @@ StartupXLOG(void) */ readRecoveryCommandFile(); - /* Now we can determine the list of expected TLIs */ - expectedTLEs = readTimeLineHistory(recoveryTargetTLI); - - /* - * If the location of the checkpoint record is not on the expected - * timeline in the history of the requested timeline, we cannot proceed: - * the backup is not part of the history of the requested timeline. - */ - if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) != - ControlFile->checkPointCopy.ThisTimeLineID) - { ========= -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: