Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: Offline enabling/disabling of data checksums |
Date | |
Msg-id | alpine.DEB.2.21.1903201712370.24709@lancre Whole thread Raw |
In response to | Re: Offline enabling/disabling of data checksums (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Offline enabling/disabling of data checksums
|
List | pgsql-hackers |
Michaël-san, >> I think that a clear warning not to run any cluster command in parallel, >> under pain of possible cluster corruption, and possibly other caveats about >> replication, should appear in the documentation. > > I still have the following extra documentation in my notes: Ok, it should have been included in the patch. > + <refsect1> > + <title>Notes</title> > + <para> > + When disabling or enabling checksums in a cluster of multiple instances, ISTM that a "postgres cluster" was like an Oracle instance: See https://www.postgresql.org/docs/devel/creating-cluster.html So the vocabulary used above seems misleading. I'm not sure how to name an Oracle cluster in postgres lingo, though. > + it is recommended to stop all the instances of the cluster before doing > + the switch to all the instances consistently. I think that the motivation/risks should appear before the solution. "As xyz ..., ...", or there at least the logical link should be outlined. It is not clear for me whether the following sentences, which seems specific to "pg_rewind", are linked to the previous advice, which seems rather to refer to streaming replication? > + When using a cluster with > + tools which perform direct copies of relation file blocks (for example > + <xref linkend="app-pgrewind"/>), enabling or disabling checksums can > + lead to page corruptions in the shape of incorrect checksums if the > + operation is not done consistently across all nodes. Destroying all > + the standbys in a cluster first, enabling or disabling checksums on > + the primary and finally recreate the cluster nodes from scratch is > + also safe. > + </para> > + </refsect1> Should not disabling in reverse order be safe? the checksum are not checked afterwards? > This sounds kind of enough for me but.. ISTM that the risks could be stated more clearly. >> I also think that some mechanism should be used to prevent such occurence, >> whether in this patch or another. > > I won't counter-argue on that. This answer is ambiguous. >> I still think that the control file update should be made *after* the data >> directory has been synced, otherwise the control file could be updated while >> some data files are not. It just means exchanging two lines. > > The parent path of the control file needs also to be flushed to make > the change durable, just doing things the same way pg_rewind keeps the > code simple in my opinion. I do not understand. The issue I'm refering to is, when enabling: - data files are updated in fs cache - control file is updated in fs cache * fsync is called - updated control file gets written - data files are being written... but reboot occurs while fsyncing is still in progress After the reboot, some data files are not fully updated with their checksums, although the controlfiles tells that they are. It should then fail after a restart when a no-checksum page is loaded? What am I missing? Also, I do not see how exchanging two lines makes the code less simple. >> If the conditional sync is moved before the file update, the file update >> should pass do_sync to update_controlfile. > > For the durability of the operation, the current order is intentional. See my point above: I think that this order can lead to cluster corruption. If not, please be kind enough to try to explain in more details why I'm wrong. -- Fabien.
pgsql-hackers by date: