Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Offline enabling/disabling of data checksums
Date
Msg-id alpine.DEB.2.21.1903130805330.4059@lancre
Whole thread Raw
In response to Re: Offline enabling/disabling of data checksums  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Offline enabling/disabling of data checksums
List pgsql-hackers
Michaël-san,

> Now the set of patches is:
> - 0001, add --enable and --disable.  I have tweaked a bit the patch so
> as "action" is replaced by "mode" which is more consistent with other
> tools like pg_ctl.  pg_indent was also complaining about one of the
> new enum structures.

Patch applies cleanly, compiles, various make check ok, doc build ok.

I'm still at odds with the exit(1) behavior when there is nothing to do.

If this behavior is kept, I think that the documentation needs to be 
improved because "failed" does not describe a no-op-was-needed to me.

"""
If enabling or disabling checksums, the exit status is nonzero if the 
operation failed.
"""

Maybe: "... if the operation failed or the requested setting is already 
active." would at least describe clearly the implemented behavior.

  +       printf(_("  -c, --check            check data checksums\n"));
  +       printf(_("                         This is the default mode if nothing is specified.\n"));

I'm not sure of the punctuation logic on the help line: the first sentence 
does not end with a ".". I could not find an instance of this style in 
other help on pg commands. I'd suggest "check data checksums (default)" 
would work around and be more in line with other commands help.

I see a significant locking issue, which I discussed on other threads 
without convincing anyone. I could do the following things:

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new 
file, then started a "pg_checksums --enable" on a stopped cluster, then 
started the cluster while the enabling was in progress, then connected and 
updated data. Hmmm. Then I stopped while the slow enabling was still in 
progress. Then I could also run a fast pg_checksums --enable in parallel, 
overtaking the first one... then when the fast one finished, I started the 
cluster again. When the slow one finished, it overwrote the control file, 
I had a running cluster with a control file which did not say so, so I 
could disable the checksum. Hmmm again. Ok, I could not generate a 
inconsistent state because on stopping the cluster the cluster file is 
overwritten with the initial state from the point of view of postmater, 
but it does not look good.

I do not think it is a good thing that two commands can write to the data 
directory at the same time, really.

About fsync-ing: ISTM that it is possible that the control file is written 
to disk while data are still not written, so a failure in between would 
leave the cluster with an inconsistent state. I think that it should fsync 
the data *then* update the control file and fsync again on that one.

> - 0002, add --no-sync.

Patch applies cleanly, compiles, various make checks are ok, doc build ok.

Fine with me.

-- 
Fabien.

pgsql-hackers by date:

Previous
From: Georgios Kokolatos
Date:
Subject: Re: CPU costs of random_zipfian in pgbench
Next
From: Peter Eisentraut
Date:
Subject: Re: Fix volatile vs. pointer confusion