Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Offline enabling/disabling of data checksums
Date
Msg-id 20190320125451.GA20192@paquier.xyz
Whole thread Raw
In response to Re: Offline enabling/disabling of data checksums  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: Offline enabling/disabling of data checksums
List pgsql-hackers
On Wed, Mar 20, 2019 at 10:38:36AM +0100, Fabien COELHO wrote:
> Hmmm… so nothing:-)

The core of the feature is still here, fortunately.

> I think that a clear warning not to run any cluster command in parallel,
> under pain of possible cluster corruption, and possibly other caveats about
> replication, should appear in the documentation.

I still have the following extra documentation in my notes:
+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.  When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  </para>
+ </refsect1>
This sounds kind of enough for me but..

> I also think that some mechanism should be used to prevent such occurence,
> whether in this patch or another.

I won't counter-argue on that.

> I still think that the control file update should be made *after* the data
> directory has been synced, otherwise the control file could be updated while
> some data files are not. It just means exchanging two lines.

The parent path of the control file needs also to be flushed to make
the change durable, just doing things the same way pg_rewind keeps the
code simple in my opinion.

> If the conditional sync is moved before the file update, the file update
> should pass do_sync to update_controlfile.

For the durability of the operation, the current order is
intentional.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: Add exclusive backup deprecation notes to documentation
Next
From: Michael Paquier
Date:
Subject: Re: Add exclusive backup deprecation notes to documentation