Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Changing the state of data checksums in a running cluster
Date
Msg-id 17C1D2E0-C12D-40E2-B4B8-B9CCECA45A88@yesql.se
In response to Re: Changing the state of data checksums in a running cluster  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Changing the state of data checksums in a running cluster
List pgsql-hackers
> On 20 Aug 2025, at 16:37, Tomas Vondra <tomas@vondra.me> wrote:

> This happens quite regularly, it's not hard to hit. But I've only seen
> it to happen on a FSM, and only right after immediate shutdown. I don't
> think that's quite expected.
> 
> I believe the built-in TAP tests (with injection points) can't catch
> this, because there's no concurrent activity while flipping checksums
> on/off. It'd be good to do something like that, by running pgbench in
> the background, or something like that.

While searching for this bug I opted to implement a version of the stress
tests as a TAP test, see 006_concurrent_pgbench.pl in the attached patch
version.  It's gated behind PG_TEST_EXTRA since it's clearly not something
which can be enabled by default (if this goes in, this needs to be redone to
provide two levels IMO, but during testing this is more convenient).  I'm
curious to see which improvements you can think of to make it stress the code
to the breaking point.

> I think there's a minor issue in how pg_checksums validates state before
> checking the data.
> 
> The current patch simply does:
> 
>  if (ControlFile->data_checksum_version == 0 &&
>      mode == PG_MODE_CHECK)
>      pg_fatal("data checksums are not enabled in cluster");
> 
> and that worked when the version was either 0 or 1. But now it can be
> also 2 or 3, for inprogress-on / inprogress-off, and if the cluster gets
> shut down at the right moment, that can end in the control file.

Good point, I've changed the test to check for checksums being enabled rather
than checking if they are disabled.
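
To illustrate the difference between the two styles of test, here is a minimal
standalone sketch.  It is not the code from the patch: the enum and function
names are hypothetical, and only the numeric values (0 = off, 1 = on,
2 = inprogress-on, 3 = inprogress-off) are taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical names for illustration only; the values mirror the four
 * states of data_checksum_version mentioned in this thread.
 */
enum checksum_state
{
    CHECKSUMS_OFF = 0,
    CHECKSUMS_ON = 1,
    CHECKSUMS_INPROGRESS_ON = 2,
    CHECKSUMS_INPROGRESS_OFF = 3
};

/*
 * Negative test: rejects only "off", so an in-progress state left in the
 * control file by a shutdown at the wrong moment is let through.
 */
static bool
check_allowed_if_not_disabled(uint32_t version)
{
    return version != CHECKSUMS_OFF;
}

/*
 * Positive test: accepts only the fully enabled state, so both
 * in-progress states are rejected along with "off".
 */
static bool
check_allowed_if_enabled(uint32_t version)
{
    return version == CHECKSUMS_ON;
}
```

With the negative test, versions 2 and 3 would be accepted for verification;
with the positive test only version 1 is.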

--
Daniel Gustafsson


