Re: race condition when writing pg_control - Mailing list pgsql-hackers

From Bossart, Nathan
Subject Re: race condition when writing pg_control
Date
Msg-id D162D812-D724-4B79-8D6C-CB1EFC9A796E@amazon.com
Whole thread Raw
In response to Re: race condition when writing pg_control  (Michael Paquier <michael@paquier.xyz>)
Responses Re: race condition when writing pg_control
List pgsql-hackers
On 5/22/20, 10:40 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Sat, May 23, 2020 at 01:00:17AM +0900, Fujii Masao wrote:
>> Per my quick check, XLogReportParameters() seems to have the similar issue,
>> i.e., it updates the control file without taking ControlFileLock.
>> Maybe we should fix this at the same time?
>
> Yeah.  It also checks the control file values, implying that we should
> have LW_SHARED taken at least at the beginning, but this lock cannot
> be upgraded we need LW_EXCLUSIVE the whole time.  I am wondering if we
> should check with an assert if ControlFileLock is taken when going
> through UpdateControlFile().  We have one code path at the beginning
> of redo where we don't need a lock close to the backup_label file
> checks, but we could just pass down a boolean flag to the routine to
> handle that case.  Another good thing in having an assert is that any
> new caller of UpdateControlFile() would need to think about the need
> of a lock.

While an assertion in UpdateControlFile() would not have helped us
catch the problem I initially reported, it does seem worthwhile to add
it.  I have attached a patch that adds this assertion and also
attempts to fix XLogReportParameters().  Since there is only one place
where we feel it is safe to call UpdateControlFile() without a lock, I
just changed it to take the lock.  I don't think this adds any sort of
significant contention risk, and IMO it is a bit cleaner than the
boolean flag.

For the XLogReportParameters() fix, I simply added an exclusive lock
acquisition for the portion that updates the values in shared memory
and calls UpdateControlFile().  IIUC the first part of this function
that accesses several ControlFile values should be safe, as none of
them can be updated after server start.

Nathan


Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: hash join error improvement (old)
Next
From: Robert Haas
Date:
Subject: Re: some grammar refactoring