Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Offline enabling/disabling of data checksums
Msg-id 1546945543.32387.8.camel@credativ.de
In response to Re: Offline enabling/disabling of data checksums  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: Offline enabling/disabling of data checksums  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hi,

On Thursday, 27.12.2018, at 12:26 +0100, Fabien COELHO wrote:
> > > For enable/disable, while the command is running, it should mark the 
> > > cluster as opened to prevent an unwanted database start. I do not see 
> > > where this is done.
> > > 
> > > You have pretty much the same class of problems if you attempt to
> > > start a cluster on which pg_rewind or the existing pg_verify_checksums
> > > is run after these have scanned the control file to make sure that
> > > they work on a cleanly-stopped instance. [...]
> > 
> > I think it comes down to what the outcome is. If we're going to end up with
> > a corrupt database (e.g. one where checksums aren't set everywhere but they
> > are marked as such in pg_control) then it's not acceptable. If the only
> > outcome is the tool gives an error that's not an error and if re-run it's
> > fine, then it's a different story.
> 
> ISTM that such an outcome is indeed a risk, as a starting postgres could 
> update already checksummed pages without putting a checksum. It could be 
> even worse, although with a (very) low probability, with updates 
> overwritten on a race condition between the processes. In any case, no 
> error would be reported before much later, with invalid checksums or 
> inconsistent data, or undetected forgotten committed data.

One difference between pg_rewind and pg_checksums is that the latter
can run for a much longer time (or at least a non-trivial amount of
time, compared to pg_rewind), so the window in which another DBA might
say "oh, that DB is down, let me start it again" is correspondingly
larger.

The question is how to do this reliably and in an acceptable way. Just
faking a postmaster.pid sounds pretty hackish to me; do you have any
suggestions here?
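For illustration only, a minimal sketch of what "faking a postmaster.pid" could amount to. The helpers `create_pid_lock()` and `release_pid_lock()` are hypothetical and not part of PostgreSQL or pg_checksums. The sketch assumes that creating the file atomically with `O_CREAT | O_EXCL` and writing the tool's own PID into it is enough to make a starting postmaster treat the data directory as in use; a real postmaster.pid carries more lines than just the PID, so writing only the PID is a simplifying assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): atomically create a lock
 * file at 'path' containing only our PID. Fails with errno == EEXIST
 * if the file already exists, e.g. because a postmaster is running.
 * Returns the open file descriptor on success, -1 on failure.
 */
static int create_pid_lock(const char *path)
{
    char buf[32];
    int len;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;              /* errno set by open() */

    /* Assumption: a PID-only file suffices to block a concurrent start. */
    len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
    if (write(fd, buf, (size_t) len) != (ssize_t) len)
    {
        int save_errno = errno;

        /* Do not leave a half-written lock file behind. */
        close(fd);
        unlink(path);
        errno = save_errno;
        return -1;
    }
    return fd;
}

/* Hypothetical counterpart: drop the lock once the tool is done. */
static void release_pid_lock(int fd, const char *path)
{
    close(fd);
    unlink(path);
}
```

A tool would call `create_pid_lock()` on the data directory's postmaster.pid before scanning and `release_pid_lock()` when finished. If the tool crashes, the stale file names a dead PID, which is the situation the postmaster's own stale-lock handling is presumably meant to cope with.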

The alternative would be to document that the user must make sure the
database is not started up while checksums are being enabled, which
leaves the door open to pilot error.


Michael

-- 
Michael Banck
Project Lead / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz