Re: race condition when writing pg_control - Mailing list pgsql-hackers

From: Alexander Korotkov
Subject: Re: race condition when writing pg_control
Msg-id: CAPpHfdvvPQGT_AST9zrFLZEQX5tDCbAzK6n6jpqprbrpNzk3Mg@mail.gmail.com
In response to: Re: race condition when writing pg_control (Álvaro Herrera <alvherre@kurilemu.de>)
List: pgsql-hackers

Hi!

On Mon, Feb 2, 2026 at 4:35 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2024-May-18, Thomas Munro wrote:
>
> > First idea I've come up with to avoid all of that: pass a copy of
> > the "proto-controlfile", to coin a term for the one read early in
> > postmaster startup by LocalProcessControlFile().  As far as I know,
> > the only reason we need it is to suck some settings out of it that
> > don't change while a cluster is running (mostly can't change after
> > initdb, and checksums can only be {en,dis}abled while down).  Right?
> > Children can just "import" that sucker instead of calling
> > LocalProcessControlFile() to figure out the size of WAL segments yada
> > yada, I think?  Later they will attach to the real one in shared
> > memory for all future purposes, once normal interlocking is allowed.
> >
> > I dunno.  Draft patch attached.  Better plans welcome.  This passes CI
> > on Linux systems afflicted by EXEC_BACKEND, and Windows.  Thoughts?
>
> Has this problem been addressed?  Looking at the known-buildfarm-
> failures page,
>
> https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#culicidae_failed_to_restart_server_due_to_incorrect_checksum_in_control_file
> there are still some failures of that ilk, last in 2026-01-21.
>
> So, was this "proto-controlfile" idea discarded?  I see Noah downthread
> proposed something somewhat more sophisticated than this, setting some
> values to garbage to prevent reading invalid values.  I imagine that
> would be on top of Thomas' patch, so I have rebased it and moved to the
> next commitfest.

I came here from another thread [1].  There I arrived at the idea of
passing an initial pg_control copy via BackendParameters [2], the same
as discussed here.  After studying this thread, I understand that this
approach has been criticized because it creates a risk that stale
values from the initial pg_control copy could be used somewhere.

I'd like to propose another approach.  It appears that the only thing
we actually need from ControlFileData before shmem is attached is
data_checksum_version.  The attached patch passes just
data_checksum_version via BackendParameters.  So, there is no local
pg_control copy, and no risk of accidentally using it.
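
To illustrate the shape of the idea, here is a simplified standalone
sketch.  It is not the attached patch: the struct, the functions and
the variable (all prefixed with "sketch") are hypothetical stand-ins
for the real EXEC_BACKEND parameter passing, and it assumes that a
nonzero data_checksum_version means checksums are enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the parameter block handed from the
 * postmaster to an EXEC_BACKEND child.  Only the single value the
 * child needs before attaching to shared memory is carried over.
 */
typedef struct SketchBackendParameters
{
    uint32_t    data_checksum_version;
} SketchBackendParameters;

/* Child-local value, usable before shared memory is attached. */
static uint32_t sketch_local_data_checksum_version = 0;

/* Parent side: copy the one field into the parameter block. */
static void
sketch_save_backend_variables(SketchBackendParameters *param,
                              uint32_t checksum_version)
{
    assert(param != NULL);
    memset(param, 0, sizeof(*param));
    param->data_checksum_version = checksum_version;
}

/* Child side: restore the field; no pg_control copy is kept around. */
static void
sketch_restore_backend_variables(const SketchBackendParameters *param)
{
    assert(param != NULL);
    sketch_local_data_checksum_version = param->data_checksum_version;
}

/* Assumption: nonzero version means data checksums are enabled. */
static int
sketch_data_checksums_enabled(void)
{
    return sketch_local_data_checksum_version > 0;
}
```

The point of the sketch is that the child never holds a whole
ControlFileData, so there are no other fields that could go stale and
be read by mistake.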

Links.
1. https://www.postgresql.org/message-id/f59335a4-83ff-438a-a30e-7cf2200276b6%40postgrespro.ru
2. https://www.postgresql.org/message-id/CAPpHfdsXkEWUeLUG4zh9q%3DhjpsOCMgsbN_XZh-6JL0z1NaNMqQ%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase
