Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Offline enabling/disabling of data checksums
Date
Msg-id alpine.DEB.2.21.1903201712370.24709@lancre
Whole thread Raw
In response to Re: Offline enabling/disabling of data checksums  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Offline enabling/disabling of data checksums
List pgsql-hackers
Michaël-san,

>> I think that a clear warning not to run any cluster command in parallel,
>> under pain of possible cluster corruption, and possibly other caveats about
>> replication, should appear in the documentation.
>
> I still have the following extra documentation in my notes:

Ok, it should have been included in the patch.

> + <refsect1>
> +  <title>Notes</title>
> +  <para>
> +   When disabling or enabling checksums in a cluster of multiple instances,

ISTM that a "postgres cluster" was like an Oracle instance:

See https://www.postgresql.org/docs/devel/creating-cluster.html

So the vocabulary used above seems misleading. I'm not sure how to name an 
Oracle cluster in postgres lingo, though.

> +   it is recommended to stop all the instances of the cluster before doing
> +   the switch to all the instances consistently.

I think that the motivation/risks should appear before the solution. "As 
xyz ..., ...", or there at least the logical link should be outlined.

It is not clear for me whether the following sentences, which seems 
specific to "pg_rewind", are linked to the previous advice, which seems 
rather to refer to streaming replication?

> +   When using a cluster with
> +   tools which perform direct copies of relation file blocks (for example
> +   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
> +   lead to page corruptions in the shape of incorrect checksums if the
> +   operation is not done consistently across all nodes.  Destroying all
> +   the standbys in a cluster first, enabling or disabling checksums on
> +   the primary and finally recreate the cluster nodes from scratch is
> +   also safe.
> +  </para>
> + </refsect1>

Should not disabling in reverse order be safe? the checksum are not 
checked afterwards?

> This sounds kind of enough for me but..

ISTM that the risks could be stated more clearly.

>> I also think that some mechanism should be used to prevent such occurence,
>> whether in this patch or another.
>
> I won't counter-argue on that.

This answer is ambiguous.

>> I still think that the control file update should be made *after* the data
>> directory has been synced, otherwise the control file could be updated while
>> some data files are not. It just means exchanging two lines.
>
> The parent path of the control file needs also to be flushed to make
> the change durable, just doing things the same way pg_rewind keeps the
> code simple in my opinion.

I do not understand. The issue I'm refering to is, when enabling:

  - data files are updated in fs cache
  - control file is updated in fs cache
  * fsync is called
  - updated control file gets written
  - data files are being written...
    but reboot occurs while fsyncing is still in progress

After the reboot, some data files are not fully updated with their 
checksums, although the controlfiles tells that they are. It should then 
fail after a restart when a no-checksum page is loaded?

What am I missing?

Also, I do not see how exchanging two lines makes the code less simple.

>> If the conditional sync is moved before the file update, the file update
>> should pass do_sync to update_controlfile.
>
> For the durability of the operation, the current order is intentional.

See my point above: I think that this order can lead to cluster 
corruption. If not, please be kind enough to try to explain in more 
details why I'm wrong.

-- 
Fabien.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Automated way to find actual COMMIT LSN of subxact LSN
Next
From: Alexander Kuzmenkov
Date:
Subject: Re: Optimze usage of immutable functions as relation