Home > mailing lists

Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: Offline enabling/disabling of data checksums
Date	March 13, 2019 09:08:33
Msg-id	alpine.DEB.2.21.1903130805330.4059@lancre Whole thread Raw
In response to	Re: Offline enabling/disabling of data checksums (Michael Paquier <michael@paquier.xyz>)
Responses	Re: Offline enabling/disabling of data checksums
List	pgsql-hackers

Tree view

Michaël-san,

> Now the set of patches is:
> - 0001, add --enable and --disable.  I have tweaked a bit the patch so
> as "action" is replaced by "mode" which is more consistent with other
> tools like pg_ctl.  pg_indent was also complaining about one of the
> new enum structures.

Patch applies cleanly, compiles, various make check ok, doc build ok.

I'm still at odds with the exit(1) behavior when there is nothing to do.

If this behavior is kept, I think that the documentation needs to be 
improved because "failed" does not describe a no-op-was-needed to me.

"""
If enabling or disabling checksums, the exit status is nonzero if the 
operation failed.
"""

Maybe: "... if the operation failed or the requested setting is already 
active." would at least describe clearly the implemented behavior.

  +       printf(_("  -c, --check            check data checksums\n"));
  +       printf(_("                         This is the default mode if nothing is specified.\n"));

I'm not sure of the punctuation logic on the help line: the first sentence 
does not end with a ".". I could not find an instance of this style in 
other help on pg commands. I'd suggest "check data checksums (default)" 
would work around and be more in line with other commands help.

I see a significant locking issue, which I discussed on other threads 
without convincing anyone. I could do the following things:

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new 
file, then started a "pg_checksums --enable" on a stopped cluster, then 
started the cluster while the enabling was in progress, then connected and 
updated data. Hmmm. Then I stopped while the slow enabling was still in 
progress. Then I could also run a fast pg_checksums --enable in parallel, 
overtaking the first one... then when the fast one finished, I started the 
cluster again. When the slow one finished, it overwrote the control file, 
I had a running cluster with a control file which did not say so, so I 
could disable the checksum. Hmmm again. Ok, I could not generate a 
inconsistent state because on stopping the cluster the cluster file is 
overwritten with the initial state from the point of view of postmater, 
but it does not look good.

I do not think it is a good thing that two commands can write to the data 
directory at the same time, really.

About fsync-ing: ISTM that it is possible that the control file is written 
to disk while data are still not written, so a failure in between would 
leave the cluster with an inconsistent state. I think that it should fsync 
the data *then* update the control file and fsync again on that one.

> - 0002, add --no-sync.

Patch applies cleanly, compiles, various make checks are ok, doc build ok.

Fine with me.

-- 
Fabien.

pgsql-hackers by date:

From: Georgios Kokolatos
Date: 13 March 2019, 08:55:10
Subject: Re: CPU costs of random_zipfian in pgbench

From: Peter Eisentraut
Date: 13 March 2019, 09:13:28
Subject: Re: Fix volatile vs. pointer confusion

Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

Previous

Next