Re: race condition when writing pg_control - Mailing list pgsql-hackers

From vignesh C
Subject Re: race condition when writing pg_control
Date
Msg-id CALDaNm1NFJD2ea7t1F8u6ZF4xzuejNfmPknHfZgrgqcXjFuRMw@mail.gmail.com
Whole thread Raw
In response to Re: race condition when writing pg_control  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Wed, 31 Jul 2024 at 05:25, Noah Misch <noah@leadboat.com> wrote:
>
> On Mon, Jul 15, 2024 at 03:44:48PM +1200, Thomas Munro wrote:
> > On Fri, Jul 12, 2024 at 11:43 PM Noah Misch <noah@leadboat.com> wrote:
> > > On Sat, May 18, 2024 at 05:29:12PM +1200, Thomas Munro wrote:
> > > > On Fri, May 17, 2024 at 4:46 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > > > The specific problem here is that LocalProcessControlFile() runs in
> > > > > every launched child for EXEC_BACKEND builds.  Windows uses
> > > > > EXEC_BACKEND, and Windows' NTFS file system is one of the two file
> > > > > systems known to this list to have the concurrent read/write data
> > > > > mashing problem (the other being ext4).
> > >
> > > > First idea idea I've come up with to avoid all of that: pass a copy of
> > > > the "proto-controlfile", to coin a term for the one read early in
> > > > postmaster startup by LocalProcessControlFile().  As far as I know,
> > > > the only reason we need it is to suck some settings out of it that
> > > > don't change while a cluster is running (mostly can't change after
> > > > initdb, and checksums can only be {en,dis}abled while down).  Right?
> > > > Children can just "import" that sucker instead of calling
> > > > LocalProcessControlFile() to figure out the size of WAL segments yada
> > > > yada, I think?  Later they will attach to the real one in shared
> > > > memory for all future purposes, once normal interlocking is allowed.
> > >
> > > I like that strategy, particularly because it recreates what !EXEC_BACKEND
> > > backends inherit from the postmaster.  It might prevent future bugs that would
> > > have been specific to EXEC_BACKEND.
> >
> > Thanks for looking!  Yeah, that is a good way to put it.
>
> Oops, the way I put it turned out to be false.  Postmaster has ControlFile
> pointing to shared memory before forking backends, so !EXEC_BACKEND children
> are born that way.  In the postmaster, ControlFile->checkPointCopy->redo does
> change after each checkpoint.
>
> > The only other idea I can think of is that the Postmaster could take
> > all of the things that LocalProcessControlFile() wants to extract from
> > the file, and transfer them via that struct used for EXEC_BACKEND as
> > individual variables, instead of this new proto-controlfile copy.  I
> > think it would be a bigger change with no obvious-to-me additional
> > benefit, so I didn't try it.
>
> Yeah, that would be more future-proof but a bigger change.  One could argue
> for a yet-larger refactor so even the !EXEC_BACKEND case doesn't read those
> fields from ControlFile memory.  Then we could get rid of ControlFile ever
> being set to something other than NULL or a shmem pointer.  ControlFileData's
> mix of initdb-time fields, postmaster-start-time fields, and changes-anytime
> fields is inconvenient here.
>
> The unknown is the value of that future proofing.  Much EXEC_BACKEND early
> startup code is shared with postmaster startup, which can assume it's the only
> process.  I can't rule out a future bug where that shared code does a read
> that's harmless in postmaster startup but harmful when an EXEC_BACKEND child
> runs the same read.  For a changes-anytime field, the code would already be
> subtly buggy in EXEC_BACKEND today, since it would be reading without an
> LWLock.  For a postmaster-start-time field, things should be okay so long as
> we capture the proto ControlFileData after the last change to such fields.
> That invariant is not trivial to achieve, but it's not gravely hard either.
>
> A possible middle option would be to use the proto control file, but
> explicitly set its changes-anytime fields to bogus values.
>
> What's your preference?

@Thomas Munro , others -  If anyone has suggestions on which approach
might work best, please share them to help move this discussion
forward.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Possibly hard-to-read message
Next
From: Pavel Stehule
Date:
Subject: Re: Allow default \watch interval in psql to be configured