Re: Race between KeepFileRestoredFromArchive() and restartpoint - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Race between KeepFileRestoredFromArchive() and restartpoint
Date
Msg-id 20220803072847.GB3817792@rfd.leadboat.com
In response to Re: Race between KeepFileRestoredFromArchive() and restartpoint  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Race between KeepFileRestoredFromArchive() and restartpoint
List pgsql-hackers
On Wed, Aug 03, 2022 at 11:24:17AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 2 Aug 2022 16:03:42 -0500, Don Seiler <don@seiler.us> wrote in 
> > could not link file "pg_wal/xlogtemp.18799" to
> > > "pg_wal/000000010000D45300000010": File exists

> Hmm.  It seems like a race condition between StartupXLOG() and
> RemoveXlogFile(). We need wider extent of ControlFileLock. Concretely,
> taking ControlFileLock before deciding the target xlog file name in
> RemoveXlogFile() seems to prevent this happening. (If this is correct,
> this is a live issue on the master branch.)

RemoveXlogFile() calls InstallXLogFileSegment() with find_free=true.  The
intent of find_free=true is to make it okay to pass a target xlog file that
ceases to be a good target.  (InstallXLogFileSegment() searches for a good
target while holding ControlFileLock.)  Can you say more about how that proved
to be insufficient?
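
To make the find_free=true behavior concrete, here is a minimal toy model of
it. This is a sketch under stated assumptions, not PostgreSQL source: the
names toy_install_segment, toy_lock, toy_unlock, slot_in_use and NUM_SLOTS are
hypothetical, the "directory" is an in-memory array, and a plain flag stands
in for ControlFileLock. It only illustrates the idea that the search for a
free target happens while the lock is held, so a stale starting target is
skipped rather than treated as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 8             /* hypothetical size of the toy "pg_wal" */

/* Toy model: slot_in_use[n] is true when segment n already exists. */
static bool slot_in_use[NUM_SLOTS];
static bool lock_held = false;  /* stand-in for ControlFileLock */

static void toy_lock(void)   { assert(!lock_held); lock_held = true; }
static void toy_unlock(void) { assert(lock_held);  lock_held = false; }

/*
 * Install a recycled segment at *segno.
 *
 * With find_free, advance *segno past occupied slots (up to max_segno)
 * before installing, all while the lock is held.  Without find_free, the
 * slot is simply taken over.  Returns false if no free slot was found.
 */
static bool
toy_install_segment(unsigned *segno, bool find_free, unsigned max_segno)
{
    bool ok = true;

    assert(segno != NULL);
    assert(max_segno < NUM_SLOTS && *segno <= max_segno);

    toy_lock();
    if (find_free)
    {
        while (slot_in_use[*segno])
        {
            if (*segno >= max_segno)
            {
                ok = false;     /* no free slot within the allowed range */
                break;
            }
            (*segno)++;
        }
    }
    if (ok)
        slot_in_use[*segno] = true;
    toy_unlock();

    return ok;
}
```

In this model, a caller that picked segment 3 as its target before some other
actor created segments 3 and 4 still succeeds: the call moves the target to
segment 5. The open question for the report above is which step falls outside
that protected search.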


