Re: Race condition in recovery? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Race condition in recovery?
Date
Msg-id CAFiTN-tO+OxiiNiM8oE=+10xhiMZkGrUZ-L1bn1SRChjzVnn7Q@mail.gmail.com
In response to Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Race condition in recovery?
List pgsql-hackers
On Tue, Mar 2, 2021 at 3:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

> =====
> ee994272ca50f70b53074f0febaec97e28f83c4e
> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>  2013-01-03 14:11:58
> Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi>  2013-01-03 14:11:58
>
>     Delay reading timeline history file until it's fetched from master.
>
>     Streaming replication can fetch any missing timeline history files from the
>     master, but recovery would read the timeline history file for the target
>     timeline before reading the checkpoint record, and before walreceiver has
>     had a chance to fetch it from the master. Delay reading it, and the sanity
>     checks involving timeline history, until after reading the checkpoint
>     record.
>
>     There is at least one scenario where this makes a difference: if you take
>     a base backup from a standby server right after a timeline switch, the
>     WAL segment containing the initial checkpoint record will begin with an
>     older timeline ID. Without the timeline history file, recovering that file
>     will fail as the older timeline ID is not recognized to be an ancestor of
>     the target timeline. If you try to recover from such a backup, using only
>     streaming replication to fetch the WAL, this patch is required for that to
>     work.
> =====

The above commit avoids initializing expectedTLEs from
recoveryTargetTLI, as shown in the hunk below from this commit.

@@ -5279,49 +5299,6 @@ StartupXLOG(void)
      */
     readRecoveryCommandFile();

-    /* Now we can determine the list of expected TLIs */
-    expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-

I think the fix is that, after reading and validating the checkpoint
record, we free the current value of expectedTLEs and reinitialize it
from recoveryTargetTLI, as shown in the attached patch.
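
To make the idea concrete, here is a minimal standalone sketch of the
"free and reinitialize" step. It is not the attached patch: TLEList,
read_timeline_history_stub(), free_tle_list(), tli_in_history() and
reinit_expected_tles() are hypothetical stand-ins so the example compiles
on its own, and the stub assumes a simple linear history (timeline N
descends from N-1, ..., 1). In the server, I would expect the equivalent
step to use list_free_deep() and readTimeLineHistory() on the real List.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TimeLineID;

/* Simplified stand-in for the List of TimeLineHistoryEntry (assumption). */
typedef struct TLEList
{
    size_t      ntlis;
    TimeLineID *tlis;           /* newest timeline first */
} TLEList;

/*
 * Hypothetical stub for readTimeLineHistory(): assumes a linear history in
 * which timeline N descends from N-1, ..., 1. The real function parses the
 * timeline history file instead.
 */
TLEList *
read_timeline_history_stub(TimeLineID target)
{
    TLEList    *list;

    assert(target >= 1);
    list = malloc(sizeof(TLEList));
    if (list == NULL)
        return NULL;
    list->ntlis = target;
    list->tlis = malloc(sizeof(TimeLineID) * list->ntlis);
    if (list->tlis == NULL)
    {
        free(list);
        return NULL;
    }
    for (size_t i = 0; i < list->ntlis; i++)
        list->tlis[i] = target - (TimeLineID) i;
    return list;
}

/* Release a list built by the stub; accepts NULL. */
void
free_tle_list(TLEList *list)
{
    if (list != NULL)
    {
        free(list->tlis);
        free(list);
    }
}

/* Return 1 if tli appears in the expected-timeline list, else 0. */
int
tli_in_history(TimeLineID tli, const TLEList *list)
{
    if (list == NULL)
        return 0;
    for (size_t i = 0; i < list->ntlis; i++)
    {
        if (list->tlis[i] == tli)
            return 1;
    }
    return 0;
}

/*
 * The proposed step, meant to run after the checkpoint record has been read
 * and validated: drop whatever list was built earlier and rebuild it from
 * the recovery target timeline.
 */
void
reinit_expected_tles(TLEList **expected, TimeLineID recovery_target_tli)
{
    assert(expected != NULL);
    free_tle_list(*expected);
    *expected = read_timeline_history_stub(recovery_target_tli);
}
```

In this model, a list built from the older timeline does not contain the
target timeline, and calling reinit_expected_tles() with the target
timeline makes both the target and its ancestor visible.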

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment
