Re: Incorrect checksum in control file with pg_rewind test - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Incorrect checksum in control file with pg_rewind test
Msg-id CAPpHfdsXkEWUeLUG4zh9q=hjpsOCMgsbN_XZh-6JL0z1NaNMqQ@mail.gmail.com
In response to Re: Incorrect checksum in control file with pg_rewind test  ("Maksim.Melnikov" <m.melnikov@postgrespro.ru>)
Responses Re: Incorrect checksum in control file with pg_rewind test
List pgsql-hackers

Hi, Maksim!

On Fri, Nov 7, 2025 at 5:19 PM Maksim.Melnikov
<m.melnikov@postgrespro.ru> wrote:
> just to clarify, it isn't pg_rewind related issue and can fire
> spontaneously.
> I don't have any strong scenario how to reproduce it, tests sometimes
> fired on our local CI, but as you can see on thread [1],
> where the same issue for frontends was discussed, it is very hard to
> reproduce and there wasn't scenario how to do it too.
>
> Some dirty hacks to reproduce it was described here [2], and I've tried
> it on master branch:
> First of all I applied patch
> 0001-XXX-Dirty-hack-to-clobber-control-file-for-testing.patch from [2],
> then compile app with
> -DEXEC_BACKEND and exec command in psql
> do $$ begin loop perform pg_update_control_file(); end loop; end; $$;
> Also I've run pgbench command
> for run in {1..5000}; do pgbench -c50 -t100 -j6 -S postgres ; done
> And eventually got error
>
> 2025-11-07 17:58:33.139 MSK [2472504] FATAL:  incorrect checksum in
> control file
> 2025-11-07 17:58:33.141 MSK [2472501] LOG:  could not receive data from
> client: Connection reset by peer
> 2025-11-07 17:58:33.143 MSK [2472505] LOG:  could not send data to
> client: Broken pipe
> 2025-11-07 17:58:33.143 MSK [2472505] FATAL:  connection to client lost

Thank you for spotting this issue and proposing a patch.  The fork
builds don't have this problem, because fork() replicates the contents
of LocalControlFile to the new process.  And the postmaster has a
consistent snapshot of the control file, as there is no concurrent
process which could write it at that moment.  But with EXEC_BACKEND,
even with your patch, different processes may end up with different
contents of LocalControlFile.  I don't see how it could cause a
material bug right now, but I see this as an undesirable divergence
between fork and EXEC_BACKEND behaviors.  I propose an alternative
approach: copy the contents of the control file to the new process via
BackendParameters.  This approach solves two problems at once: no torn
reads, and no divergence between fork and EXEC_BACKEND.
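
To make the idea concrete, here is a minimal standalone sketch.  It is
not the attached patch and not the actual PostgreSQL code: every Demo*
type and demo_* function is a hypothetical, simplified stand-in, and
the checksum is plain FNV-1a rather than the CRC used for the real
control file.  The point is only that the child takes its copy from
the parameter block filled by the parent, instead of re-reading a file
that may be mid-write.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the control file contents. */
typedef struct DemoControlFile
{
    uint64_t    system_identifier;
    uint32_t    state;
    uint32_t    crc;            /* checksum of all preceding bytes */
} DemoControlFile;

/* Hypothetical stand-in for the parameter block handed to a new process. */
typedef struct DemoBackendParameters
{
    DemoControlFile control_file;   /* parent's consistent snapshot */
} DemoBackendParameters;

/* FNV-1a, used here only as an illustrative checksum (assumption). */
static uint32_t
demo_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Recompute the checksum over everything before the crc field. */
static void
demo_update_crc(DemoControlFile *cf)
{
    cf->crc = demo_checksum(cf, offsetof(DemoControlFile, crc));
}

/* Return 1 if the stored checksum matches the contents, else 0. */
static int
demo_crc_ok(const DemoControlFile *cf)
{
    return cf->crc == demo_checksum(cf, offsetof(DemoControlFile, crc));
}

/* Parent side: copy the in-memory control file into the parameter block. */
static void
demo_save_backend_parameters(DemoBackendParameters *param,
                             const DemoControlFile *local)
{
    param->control_file = *local;
}

/*
 * Child side: take the control file from the parameter block instead of
 * reading it from disk.  Returns 1 if the checksum verifies, else 0.
 */
static int
demo_restore_backend_parameters(const DemoBackendParameters *param,
                                DemoControlFile *local)
{
    *local = param->control_file;
    return demo_crc_ok(local);
}
```

With this shape, the child ends up with exactly the bytes the parent
held when it filled the block, which is the same guarantee the fork
builds get for free.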

------
Regards,
Alexander Korotkov
Supabase
