Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Changing the state of data checksums in a running cluster
Date
Msg-id c92b5d8b-bc03-47bc-b209-2e4a719eee32@iki.fi
Whole thread Raw
In response to Re: Changing the state of data checksums in a running cluster  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: Changing the state of data checksums in a running cluster
List pgsql-hackers
On 28/03/2026 00:03, Daniel Gustafsson wrote:
> The attached rebase contains lots more polish, mostly renaming variable names
> for clarity, tidying up comments and documentation and some smaller bits of
> cleanup like moving more code out of xlog.c.
> 
> This version runs all the tests in a normal test-run, with a few of them pared
> down with larger runs gated by PG_TEST_EXTRA.  I thinkt the tests are still too
> expensive in the event of getting committed, but it's helpful to have them
> during dev and test.  Executing pgbench sometimes fails in CI but I've been
> unable to reproduce that so not entirely sure what is going on there.
> 
> Heikki, Andres and Tomas; as you have been reviewing this patchset, what do you
> feel is left for considering this for commit?  (Apart from figuring out the CI
> test thing mentioned above which I think is a buildsystem issue.) I think 0001
> could be considered independently of 0002 and is cleanup in it's own right.

+1 for committing 0001 right away.


Some leftover stuff remains, related to the change that it no longer 
rebuilds the list of databases:

> + * The DataChecksumsWorker will compile a list of all databases at the start,
> + * and once all are processed will regenerate the list and start over
> + * processing any new entries. Once there are no new entries on the list,
> + * processing will end.  The regenerated list is required since databases can
> + * be created concurrently with data checksum processing, using a template
> + * database which has yet to be processed. All databases MUST BE successfully
> + * processed in order for data checksums to be enabled, the only exception are
> + * databases which are dropped before having been processed.

and here:

> +/*
> + * Test to remove an entry from the Databaselist to force re-processing since
> + * not all databases could be processed in the first iteration of the loop.
> + */
> +void
> +dc_dblist(const char *name, const void *private_data, void *arg)

The above talks about re-processing, but that doesn't happen anymore. I 
suspect the test that uses this is now obsolete or broken.

> +    while (true)
> +    {
> +        int            processed_databases = 0;
> +
> +        foreach_ptr(DataChecksumsWorkerDatabase, db, DatabaseList)
> +        {
> +            DataChecksumsWorkerResult result;
> +            DataChecksumsWorkerResultEntry *entry;
> +            bool        found;
> +
> +            /*
> +             * Check if this database has been processed already, and if so
> +             * whether it should be retried or skipped.
> +             */
> +            entry = (DataChecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
> +                                                                   HASH_FIND, NULL);
> +

I guess this works, but wouldn't it be simpler to remove entries from 
DatabaseList when they're successfully processed?

> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -1044,7 +1044,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>          if (pg_strcasecmp(strategy, "wal_log") == 0)
>              dbstrategy = CREATEDB_WAL_LOG;
>          else if (pg_strcasecmp(strategy, "file_copy") == 0)
> +        {
> +            if (DataChecksumsInProgress())
> +                ereport(ERROR,
> +                        errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                        errmsg("create database strategy \"%s\" not allowed when data checksums are being enabled",
> +                               strategy));
>              dbstrategy = CREATEDB_FILE_COPY;
> +        }
>          else
>              ereport(ERROR,
>                      (errcode(ERRCODE_INVALID_PARAMETER_VALUE),

Is this enough? Is it possible that you start CREATE DATABASE in 
file_copy mode, and while it's already running but hasn't copied 
everything yet, you turn checksums on?

- Heikki




pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: unclear OAuth error message
Next
From: Andres Freund
Date:
Subject: Re: Don't synchronously wait for already-in-progress IO in read stream