Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers
From | Daniel Gustafsson
---|---
Subject | Re: Changing the state of data checksums in a running cluster
Date |
Msg-id | E156C55A-5FAB-40A9-9E9B-E25A49C35EDC@yesql.se
In response to | Re: Changing the state of data checksums in a running cluster (Tomas Vondra <tomas@vondra.me>)
Responses | Re: Changing the state of data checksums in a running cluster
List | pgsql-hackers
> On 13 Mar 2025, at 12:03, Tomas Vondra <tomas@vondra.me> wrote:
>
> On 3/13/25 10:54, Daniel Gustafsson wrote:
>>> On 12 Mar 2025, at 14:16, Tomas Vondra <tomas@vondra.me> wrote:
>>> I believe the approach is correct, but the number of possible states
>>> (e.g. after a crash/restart) seems a bit complex. I wonder if there's a
>>> better way to handle this, but I can't think of any. Ideas?
>>
>> Not sure if this moves the needle too far in terms of complexity wrt the
>> previous version of the patch, there were already multiple copies.
>
> It does add one more place (XLogCtl->data_checksum_version) to store the
> current state, so it's not that much more complex, ofc. But I was not
> really comparing this to the previous patch version, I meant the state
> space in general - all possible combinations of all the flags (control
> file, local + xlogctl).

Fair point.

> I wonder if it might be possible to have a more thorough validation of
> the transitions. We already have that for the LocalDataChecksumVersion,
> thanks to the asserts - and it was damn useful, otherwise we would not
> have noticed this issue for a long time, I think.
>
> I wonder if we can have similar checks for the other flags. I'm pretty
> sure we can have the same checks for XLogCtl, right?

I don't see why not, they should abide by the same rules. (A sketch of what
that could look like is attached below, after the message.)

> I'm not quite sure about ControlFile - can't that "skip" some of the
> changes, e.g. if we do (enable->disable->enable) within a single
> checkpoint? Need to check.

For enable->disable->enable within a single checkpoint, ControlFile should
never see the disable state.

> This also reminds me I had a question about the barrier - can't it
> happen that a process gets to process multiple barriers at the same time?
> I mean, let's say it gets stuck for a while, and the cluster happens to go
> through disable+enable. Won't it then see both barriers? That'd be a
> problem, because the core processes the barriers in the order determined
> by the enum value, not in the order the barriers happened. Which means
> it might break the expected state transitions again (and end with the
> wrong local value). I haven't tried, though.

Interesting, that seems like a general deficiency in the barriers; surely
processing them in order would be more intuitive? That would probably
require some form of Lamport clock though. (A toy sketch of that idea is
also below, after the message.)

>>> One issue I ran into is the postmaster does not seem to be processing
>>> the barriers, and thus not getting info about the data_checksum_version
>>> changes.
>>
>> Makes sense, that seems like a pretty reasonable constraint for the
>> barrier.
>
> Not sure I follow. What's a reasonable constraint?

That the postmaster doesn't process them.

>>> That's fine until it needs to launch a child process (e.g. a
>>> walreceiver), which will then see the LocalDataChecksumVersion as of the
>>> start of the instance, not the "current" one. I fixed this by explicitly
>>> refreshing the value in postmaster_child_launch(), but maybe I'm missing
>>> something. (Also, EXEC_BACKEND may need to handle this too.)
>>
>> The pg_checksums test is failing for me on this version due to the GUC
>> not being initialized, don't we need something like the below as well?
>> (With a comment explaining why ReadControlFile wasn't enough.)
>>
>> @@ -5319,6 +5319,7 @@ LocalProcessControlFile(bool reset)
>>     Assert(reset || ControlFile == NULL);
>>     ControlFile = palloc(sizeof(ControlFileData));
>>     ReadControlFile();
>> +   SetLocalDataChecksumVersion(ControlFile->data_checksum_version);
>>
>
> Yeah, I think this (or something like it) is missing.

Thanks for confirming.

>> +uint32
>> +GetLocalDataChecksumVersion(void)
>> +{
>> +   return LocalDataChecksumVersion;
>> +}
>> +
>> +/*
>> + * Get the *current* data_checksum_version (might not be written to
>> + * control file yet).
>> + */
>> +uint32
>> +GetCurrentDataChecksumVersion(void)
>> +{
>> +   return XLogCtl->data_checksum_version;
>> +}
>>
>> I wonder if CachedDataChecksumVersion would be more appropriate to
>> distinguish it from the Current value, and also to make it appear less
>> like actual copies of controlfile values like LocalMinRecoveryPoint.
>> Another thought is whether we should have the
>> GetLocalDataChecksumVersion() API at all? GetCurrentDataChecksumVersion()
>> would be a better API, no?
>
> FWIW those functions are for debug logging only, I needed to print the
> values in a couple places outside xlog.c. I don't intend to make that
> part of the patch.

Ah, gotcha, I never applied the debug patch from the patchset so I figured
this was a planned API. The main question still stands though: if
LocalDataCheckXX can be confusing, would CachedDataCheckXX be better in
order to distinguish it from actual controlfile copies?

--
Daniel Gustafsson
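As a minimal sketch of the transition validation discussed above, mirroring
the asserts that already guard LocalDataChecksumVersion: the helper name and
the exact set of allowed transitions here are assumptions for illustration,
not taken from the patch, with PG_DATA_CHECKSUM_VERSION as the existing
"checksums on" value, 0 as "off", and assumed inprogress-on/inprogress-off
constants for the intermediate states.

/*
 * Sketch only, not part of the patch: assert-based validation of
 * data_checksum_version transitions, to be called wherever the shared
 * XLogCtl->data_checksum_version is updated.  The helper name and the
 * PG_DATA_CHECKSUM_INPROGRESS_* constants are assumptions here.
 */
static void
AssertDataChecksumVersionTransition(uint32 oldval, uint32 newval)
{
    switch (oldval)
    {
        case 0:
            /* off: may stay off or begin enabling */
            Assert(newval == 0 ||
                   newval == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
            break;
        case PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION:
            /* enabling: may finish, or be cancelled into disabling */
            Assert(newval == PG_DATA_CHECKSUM_VERSION ||
                   newval == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
            break;
        case PG_DATA_CHECKSUM_VERSION:
            /* on: may stay on or begin disabling */
            Assert(newval == PG_DATA_CHECKSUM_VERSION ||
                   newval == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
            break;
        case PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION:
            /* disabling: may finish at off, or enabling may restart */
            Assert(newval == 0 ||
                   newval == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
            break;
        default:
            Assert(false);
    }
}

Applying the same checks to both the shared and the local copy would make a
divergence between the two show up as an assertion failure rather than as a
silently wrong local value.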
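And a toy sketch of the Lamport-clock idea for the barriers, entirely
hypothetical and not the actual procsignal code: the point is only that
ordering pending barriers by an emission-time generation counter, rather
than by enum value, lets a stuck process replay the transitions in the
order they actually happened.

/*
 * Toy sketch, all names hypothetical: absorb pending barriers in the
 * order they were emitted rather than in enum order.  "generation"
 * would come from a shared counter incremented on each emit.
 */
typedef struct PendingBarrier
{
    int         kind;       /* which barrier type was emitted */
    uint64      generation; /* shared clock value at emission time */
} PendingBarrier;

static int
pending_barrier_cmp(const void *a, const void *b)
{
    const PendingBarrier *pa = (const PendingBarrier *) a;
    const PendingBarrier *pb = (const PendingBarrier *) b;

    if (pa->generation < pb->generation)
        return -1;
    if (pa->generation > pb->generation)
        return 1;
    return 0;
}

static void
ProcessPendingBarriersInOrder(PendingBarrier *pending, int npending)
{
    /* oldest emission first, so history is replayed in order */
    qsort(pending, npending, sizeof(PendingBarrier), pending_barrier_cmp);

    for (int i = 0; i < npending; i++)
        ProcessBarrierOfKind(pending[i].kind);  /* hypothetical handler */
}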