Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Changing the state of data checksums in a running cluster
Date
Msg-id CAPpHfdudYaR8O=YW+cnchD-bH90V2zVY=+J3D35GbS-oU4vVxw@mail.gmail.com
In response to Re: Changing the state of data checksums in a running cluster  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Hi!

On Sat, Mar 15, 2025 at 7:33 PM Tomas Vondra <tomas@vondra.me> wrote:
On 3/15/25 17:26, Andres Freund wrote:
> Jo.
>
> On 2025-03-15 16:50:02 +0100, Tomas Vondra wrote:
>> Thanks, here's an updated patch version
>
> FWIW, this fails in CI;
>
> https://cirrus-ci.com/build/4678473324691456
> On all OSs:
> [16:08:36.331] #   Failed test 'options --locale-provider=icu --locale=und --lc-*=C: no stderr'
> [16:08:36.331] #   at /tmp/cirrus-ci-build/src/bin/initdb/t/001_initdb.pl line 132.
> [16:08:36.331] #          got: '2025-03-15 16:08:26.216 UTC [63153] LOG:  XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0
> [16:08:36.331] # 2025-03-15 16:08:26.216 UTC [63153] LOG:  XLogCtl->data_checksum_version 0 ControlFile->data_checksum_version 0 (UPDATED)
>
> Windows & Compiler warnings:
> [16:05:08.723] ../src/backend/storage/page/bufpage.c(25): fatal error C1083: Cannot open include file: 'execinfo.h': No such file or directory
>
> [16:18:52.385] bufpage.c:25:10: fatal error: execinfo.h: No such file or directory
> [16:18:52.385]    25 | #include <execinfo.h>
> [16:18:52.385]       |          ^~~~~~~~~~~~
>
> Greetings,

Yeah, that's just the "debug stuff" - I don't expect any of that to be
included in the commit, I only posted it for convenience. It adds a lot
of debug logging, which I hope might help others to understand what the
problem with checksums on standby is.

I took a look at this patch and have the following notes.
1) I think the reporting of these errors could be better and more detailed.  The second one in particular could be made similar to some of the other errors in checksum processing.
            ereport(ERROR,
                    (errmsg("failed to start background worker to process data checksums")));
            ereport(ERROR,
                    (errmsg("unable to enable data checksums in cluster")));

2) ProcessAllDatabases() contains a loop that repeatedly scans for new databases to checksum.  It continues as long as each iteration finds new databases.  Could we just limit the number of iterations to 2?  Given that we call WaitForAllTransactionsToFinish() at each step, everything created after the first WaitForAllTransactionsToFinish() call should have checksums enabled from the start.
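
To illustrate point 2, here is a rough standalone sketch of the bounded loop.  It is only a toy model, not the patch's code: the DbSlot array, create_database(), process_unprocessed_databases() and process_all_databases_bounded() are hypothetical stand-ins, and the wait function is a no-op placeholder for WaitForAllTransactionsToFinish().  It assumes, as argued above, that a database created after the first wait already has checksums.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DBS 8

/* Hypothetical stand-in for the list of databases in the cluster. */
typedef struct
{
    bool exists;     /* slot holds a database */
    bool processed;  /* all pages of this database carry checksums */
} DbSlot;

static DbSlot dbs[MAX_DBS];

/* No-op placeholder for WaitForAllTransactionsToFinish(). */
static void
wait_for_all_transactions_to_finish(void)
{
}

/*
 * Create a database in slot idx.  born_with_checksums models a database
 * created after the first wait, which (by assumption) is written with
 * checksums from the start and needs no processing.
 */
static void
create_database(int idx, bool born_with_checksums)
{
    assert(idx >= 0 && idx < MAX_DBS);
    dbs[idx].exists = true;
    dbs[idx].processed = born_with_checksums;
}

/* Process every existing, not yet processed database; return how many. */
static int
process_unprocessed_databases(void)
{
    int n = 0;

    for (int i = 0; i < MAX_DBS; i++)
    {
        if (dbs[i].exists && !dbs[i].processed)
        {
            dbs[i].processed = true;
            n++;
        }
    }
    return n;
}

/* Two passes at most, instead of looping until no new database shows up. */
static int
process_all_databases_bounded(void)
{
    int total = 0;

    for (int pass = 0; pass < 2; pass++)
    {
        wait_for_all_transactions_to_finish();
        total += process_unprocessed_databases();
    }
    return total;
}
```

The second pass is there only to pick up databases created concurrently with the first one; if the assumption about databases created after the first wait holds, a third pass would never find anything to do.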

------
Regards,
Alexander Korotkov
Supabase 
