Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Changing the state of data checksums in a running cluster
Date
Msg-id 4020B0AE-6286-462F-BD50-3AAD170F08C9@yesql.se
Whole thread Raw
In response to Re: Changing the state of data checksums in a running cluster  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
> On 28 Mar 2026, at 00:30, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> +1 for committing 0001 right away.

Thanks for confirming, I'll do that early next week.

> Some leftover stuff remains, related to the change that it no longer rebuilds the list of databases:

Thanks for the re-review.  I've been staring at this for so long that it's
blinding..

>> + * The DataChecksumsWorker will compile a list of all databases at the start,
>> + * and once all are processed will regenerate the list and start over
>> + * processing any new entries. Once there are no new entries on the list,
>> + * processing will end.  The regenerated list is required since databases can
>> + * be created concurrently with data checksum processing, using a template
>> + * database which has yet to be processed. All databases MUST BE successfully
>> + * processed in order for data checksums to be enabled, the only exception are
>> + * databases which are dropped before having been processed.
>
> and here:

Fixed.

>> +/*
>> + * Test to remove an entry from the Databaselist to force re-processing since
>> + * not all databases could be processed in the first iteration of the loop.
>> + */
>> +void
>> +dc_dblist(const char *name, const void *private_data, void *arg)
>
> The above talks about re-processing, but that doesn't happen anymore. I suspect the test that uses this is now
obsoleteor broken. 

This was quite obsolete and removed.

>> + * Check if this database has been processed already, and if so
>> + * whether it should be retried or skipped.
>> + */
>> + entry = (DataChecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
>> +   HASH_FIND, NULL);
>> +
>
> I guess this works, but wouldn't it be simpler to remove entries from DatabaseList when they're successfully
processed?

This did indeed suffer badly from leftover-itis from the previous
re-processing, and as I tried to refactor it I came to the conclusion that we
might as well do away with the retry logic entirely.  The RETRYDB state would
only be triggered by the background worker failing to start, and without any
prompting to the user to reconfigure we'd just run through the retries and fail
eventually anyways.  The attached retains just the core part of the loop with a
greatly simplified logic as a result.  Improving on that can be revisited in
future cycles in case this goes in, better to get a solid foundation first.

>> + errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("create database strategy \"%s\" not allowed when data checksums are being enabled",
>> +   strategy));
>> dbstrategy = CREATEDB_FILE_COPY;
>> + }
>> else
>> ereport(ERROR,
>> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>
> Is this enough? Is it possible that you start CREATE DATABASE in file_copy mode, and while it's already running but
hasn'tcopied everything yet, you turn checksums on? 

Then the state will change to inprogress-on, and the launcher will wait for all
ongoing transactions to finish before generating the list of databases.  Any
such db should thus get entered to the list for processing unless I'm missing
something.

--
Daniel Gustafsson


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Next
From: Thomas Munro
Date:
Subject: Re: pg_waldump: support decoding of WAL inside tarfile