Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Changing the state of data checksums in a running cluster
Date
Msg-id E156C55A-5FAB-40A9-9E9B-E25A49C35EDC@yesql.se
In response to Re: Changing the state of data checksums in a running cluster  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Changing the state of data checksums in a running cluster
List pgsql-hackers
> On 13 Mar 2025, at 12:03, Tomas Vondra <tomas@vondra.me> wrote:
> On 3/13/25 10:54, Daniel Gustafsson wrote:
>>> On 12 Mar 2025, at 14:16, Tomas Vondra <tomas@vondra.me> wrote:

>>> I believe the approach is correct, but the number of possible states
>>> (e.g. after a crash/restart) seems a bit complex. I wonder if there's a
>>> better way to handle this, but I can't think of any. Ideas?
>>
>> Not sure if this moves the needle too far in terms of complexity wrt the
>> previous version of the patch; there were already multiple copies.
>
> It does add one more place (XLogCtl->data_checksum_version) to store the
> current state, so it's not that much more complex, ofc. But I was not
> really comparing this to the previous patch version, I meant the state
> space in general - all possible combinations of all the flags (control
> file, local + xlogct).

Fair point.

> I wonder if it might be possible to have a more thorough validation of
> the transitions. We already have that for the LocalDataChecksumVersion,
> thanks to the asserts - and it was damn useful, otherwise we would not
> have noticed this issue for a long time, I think.
>
> I wonder if we can have similar checks for the other flags. I'm pretty
> sure we can have the same checks for XLogCtl, right?

I don't see why not, they should abide by the same rules.
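
Something along these lines is roughly what I have in mind; just a sketch, and
the PG_DATA_CHECKSUM_INPROGRESS_* names and the exact transition graph are my
assumptions rather than lifted verbatim from the patch:

static void
AssertValidChecksumTransition(uint32 old_version, uint32 new_version)
{
	switch (old_version)
	{
		case 0:					/* checksums off */
			Assert(new_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
			break;
		case PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION:
			Assert(new_version == PG_DATA_CHECKSUM_VERSION ||
				   new_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
			break;
		case PG_DATA_CHECKSUM_VERSION:	/* checksums on */
			Assert(new_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
			break;
		case PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION:
			Assert(new_version == 0);
			break;
	}
}

Calling that wherever XLogCtl->data_checksum_version is assigned (under the
same lock) should give us the same kind of early warning that the
LocalDataChecksumVersion asserts did.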

> I'm not quite sure
> about ControlFile - can't that "skip" some of the changes, e.g. if we do
> (enable->disable->enable) within a single checkpoint? Need to check.

For enable->disable->enable within a single checkpoint, ControlFile should
never see the disable state.
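
Roughly speaking (the exact placement is an assumption on my part, but this is
the gist of it), only whatever value is current when the control file is
written at checkpoint time ever reaches disk:

	/*
	 * Hypothetical sketch: state flips within a single checkpoint are
	 * coalesced, since only the value current at checkpoint time is copied
	 * into the control file and flushed.
	 */
	ControlFile->data_checksum_version = XLogCtl->data_checksum_version;
	UpdateControlFile();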

> This also reminds me I had a question about the barrier - can't it
> happen a process gets to process multiple barriers at the same time? I
> mean, let's say it gets stuck for a while, and the cluster happens to go
> through disable+enable. Won't it then see both barriers? That'd be a
> problem, because the core processes the barriers in the order determined
> by the enum value, not in the order the barriers happened. Which means
> it might break the expected state transitions again (and end with the
> wrong local value). I haven't tried, though.

Interesting, that seems like a general deficiency in the barriers; surely
processing them in order would be more intuitive?  That would probably require
some form of Lamport clock though.
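
Purely as illustration of what in-order processing might look like (nothing
like this exists today, it's entirely hypothetical): each emitted barrier
could carry a generation number, and a backend absorbing several pending
barriers would sort on that rather than on the enum value:

typedef struct PendingBarrier
{
	ProcSignalBarrierType type;	/* which barrier was emitted */
	uint64		generation;		/* counter taken at emission time */
} PendingBarrier;

static int
pending_barrier_cmp(const void *a, const void *b)
{
	const PendingBarrier *ba = (const PendingBarrier *) a;
	const PendingBarrier *bb = (const PendingBarrier *) b;

	if (ba->generation < bb->generation)
		return -1;
	if (ba->generation > bb->generation)
		return 1;
	return 0;
}

/*
 * A backend waking up with multiple pending barriers would qsort() them on
 * generation and absorb them oldest-first, so a disable followed by an
 * enable is seen in the order the barriers were actually emitted.
 */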

>>> One issue I ran into is the postmaster does not seem to be processing
>>> the barriers, and thus not getting info about the data_checksum_version
>>> changes.
>>
>> Makes sense, that seems like a pretty reasonable constraint for the barrier.
>
> Not sure I follow. What's a reasonable constraint?

That the postmaster doesn't process them.

>>> That's fine until it needs to launch a child process (e.g. a
>>> walreceiver), which will then see the LocalDataChecksumVersion as of the
>>> start of the instance, not the "current" one. I fixed this by explicitly
>>> refreshing the value in postmaster_child_launch(), but maybe I'm missing
>>> something. (Also, EXEC_BACKEND may need to handle this too.)
>>
>> The pg_checksums test is failing for me on this version due to the GUC not
>> being initialized, don't we need something like the below as well?  (With a
>> comment explaining why ReadControlFile wasn't enough.)
>>
>> @@ -5319,6 +5319,7 @@ LocalProcessControlFile(bool reset)
>>        Assert(reset || ControlFile == NULL);
>>        ControlFile = palloc(sizeof(ControlFileData));
>>        ReadControlFile();
>> +       SetLocalDataChecksumVersion(ControlFile->data_checksum_version);
>>
>
> Yeah, I think this (or something like it) is missing.

Thanks for confirming.

>> +uint32
>> +GetLocalDataChecksumVersion(void)
>> +{
>> +	return LocalDataChecksumVersion;
>> +}
>> +
>> +/*
>> + * Get the *current* data_checksum_version (might not be written to control
>> + * file yet).
>> + */
>> +uint32
>> +GetCurrentDataChecksumVersion(void)
>> +{
>> +	return XLogCtl->data_checksum_version;
>> +}
>>
>> I wonder if CachedDataChecksumVersion would be more appropriate, to
>> distinguish it from the Current value and to make it appear less like actual
>> copies of controlfile values such as LocalMinRecoveryPoint.  Another thought
>> is whether we should have the GetLocalDataChecksumVersion() API at all;
>> GetCurrentDataChecksumVersion() should be a better API, no?
>>
>
> FWIW those functions are for debug logging only, I needed to print the
> values in a couple places outside xlog.c. I don't intend to make that
> part of the patch.

Ah, gotcha, I never applied the debug patch from the patchset so I figured this
was a planned API.  The main question still stands though: would
CachedDataChecksumVersion be a better name than LocalDataChecksumVersion, to
distinguish it from actual copies of controlfile values?

--
Daniel Gustafsson



