Re: Race condition in recovery? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Race condition in recovery?
Date
Msg-id CAFiTN-tT7YhuqFN-79OKrYOsu+DggU-gD7CHv6VGJnaiCaW2bg@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Race condition in recovery?  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Before the commit expectedTLEs is always initialized with just one
> entry for the TLI of the last checkpoint record.

Right

> (1) If XLogFileReadAnyTLI() found the segment but no history file
> found, that is, using the dummy TLE-list, expectedTLEs is initialized
> with the dummy one-entry list. So there's no behavioral change in this
> aspect.

Yeah, you are right.

> (2) If we didn't find the segment for the checkpoint record, it starts
> replication and fetches history files and WAL records then revisits
> XLogFileReadAnyTLI.  Now we have both the history file and segments,
> it successfully reads the recood.  The difference of expectedTLEs made
> by the patch is having just one entry or the all entries for the past.

Correct.

> Assuming that we keep expectedTLEs synced with recoveryTargetTLI,
> rescanLatestTimeLine updates the list properly so no need to worry
> about the future.  So the issue would be in the past timelines.  After
> reading the checkpoint record, if we need to rewind to the previous
> timeline for the REDO point, the dummy list is inconvenient.
>
> So there is a possibility that the patch fixed the case (2), where the
> standby doesn't have both the segment for the checkpoint record and
> the history file for the checkpoint, and the REDO point is on the last
> TLI.  If it is correct, the patch still fails for the case (1), that
> is, the issue raised here. Anyway it would be useless (and rahter
> harmful) to initialize expectedTLEs based on receiveTLI there.
>
> So my resul there is:
>
> The commit fixed the case (2)

Yes, by maintaining the entire history instead of one entry if history
was missing.

> The fix caused the issue for the case (1).

Basically, before this commit expectedTLEs and recoveryTargetTLI were
in always in sync which this patch broke.

> The proposed fix fixes the case (1), caused by the commit.

Right, I agree with the fix. So fix should be just to change that one
line and initialize expectedTLEs from recoveryTargetTLI


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: Implementing Incremental View Maintenance
Next
From: Tatsuo Ishii
Date:
Subject: Re: Typo in README.barrier