Re: race condition when writing pg_control - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: race condition when writing pg_control |
Date | |
Msg-id | CALDaNm1NFJD2ea7t1F8u6ZF4xzuejNfmPknHfZgrgqcXjFuRMw@mail.gmail.com Whole thread Raw |
In response to | Re: race condition when writing pg_control (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
On Wed, 31 Jul 2024 at 05:25, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Jul 15, 2024 at 03:44:48PM +1200, Thomas Munro wrote: > > On Fri, Jul 12, 2024 at 11:43 PM Noah Misch <noah@leadboat.com> wrote: > > > On Sat, May 18, 2024 at 05:29:12PM +1200, Thomas Munro wrote: > > > > On Fri, May 17, 2024 at 4:46 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > > The specific problem here is that LocalProcessControlFile() runs in > > > > > every launched child for EXEC_BACKEND builds. Windows uses > > > > > EXEC_BACKEND, and Windows' NTFS file system is one of the two file > > > > > systems known to this list to have the concurrent read/write data > > > > > mashing problem (the other being ext4). > > > > > > > First idea idea I've come up with to avoid all of that: pass a copy of > > > > the "proto-controlfile", to coin a term for the one read early in > > > > postmaster startup by LocalProcessControlFile(). As far as I know, > > > > the only reason we need it is to suck some settings out of it that > > > > don't change while a cluster is running (mostly can't change after > > > > initdb, and checksums can only be {en,dis}abled while down). Right? > > > > Children can just "import" that sucker instead of calling > > > > LocalProcessControlFile() to figure out the size of WAL segments yada > > > > yada, I think? Later they will attach to the real one in shared > > > > memory for all future purposes, once normal interlocking is allowed. > > > > > > I like that strategy, particularly because it recreates what !EXEC_BACKEND > > > backends inherit from the postmaster. It might prevent future bugs that would > > > have been specific to EXEC_BACKEND. > > > > Thanks for looking! Yeah, that is a good way to put it. > > Oops, the way I put it turned out to be false. Postmaster has ControlFile > pointing to shared memory before forking backends, so !EXEC_BACKEND children > are born that way. In the postmaster, ControlFile->checkPointCopy->redo does > change after each checkpoint. > > > The only other idea I can think of is that the Postmaster could take > > all of the things that LocalProcessControlFile() wants to extract from > > the file, and transfer them via that struct used for EXEC_BACKEND as > > individual variables, instead of this new proto-controlfile copy. I > > think it would be a bigger change with no obvious-to-me additional > > benefit, so I didn't try it. > > Yeah, that would be more future-proof but a bigger change. One could argue > for a yet-larger refactor so even the !EXEC_BACKEND case doesn't read those > fields from ControlFile memory. Then we could get rid of ControlFile ever > being set to something other than NULL or a shmem pointer. ControlFileData's > mix of initdb-time fields, postmaster-start-time fields, and changes-anytime > fields is inconvenient here. > > The unknown is the value of that future proofing. Much EXEC_BACKEND early > startup code is shared with postmaster startup, which can assume it's the only > process. I can't rule out a future bug where that shared code does a read > that's harmless in postmaster startup but harmful when an EXEC_BACKEND child > runs the same read. For a changes-anytime field, the code would already be > subtly buggy in EXEC_BACKEND today, since it would be reading without an > LWLock. For a postmaster-start-time field, things should be okay so long as > we capture the proto ControlFileData after the last change to such fields. > That invariant is not trivial to achieve, but it's not gravely hard either. > > A possible middle option would be to use the proto control file, but > explicitly set its changes-anytime fields to bogus values. > > What's your preference? @Thomas Munro , others - If anyone has suggestions on which approach might work best, please share them to help move this discussion forward. Regards, Vignesh
pgsql-hackers by date: