Re: Offline enabling/disabling of data checksums - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Offline enabling/disabling of data checksums
Msg-id 20181227232529.GA3210@paquier.xyz
In response to Re: Offline enabling/disabling of data checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Offline enabling/disabling of data checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:
> On 12/27/18 11:43 AM, Magnus Hagander wrote:
>> Should we double-check with packagers that this won't cause a problem?
>> Though the fact that it's done in a major release should make it
>> perfectly fine I think -- and it's a smaller change than when we did all
>> those xlog->wal changes...
>>
>
> I think it makes little sense to not rename the tool now. I'm pretty
> sure we'd end up doing that sooner or later anyway, and we'll just live
> with a misnamed tool until then.

Do you think that a thread on -packagers would be more appropriate then?

> I don't know, TBH. I agree that making the on/off change cheaper moves
> us closer to 'on' by default, because users can disable it if needed. But
> it's not the whole story.
>
> If we enable checksums by default, 99% of users will have them enabled.
> That means more people will actually observe data corruption cases that
> went unnoticed so far. What shall we do with that? We don't have very
> good answers to that (tooling, docs) and I'd say "disable checksums" is
> not a particularly amazing response in this case :-(

Enabling data checksums by default is still a couple of steps away;
we first need better ways to control them.

> FWIW I don't know what to do about that. We certainly can't prevent the
> data corruption, but maybe we could help with fixing it (although that's
> bound to be low-level work).

Yes, data checksums are extremely useful for showing people when the
problem does *not* come from Postgres, which can be really hard to
establish in a large organization.  Knowing which page is corrupted is
also useful, as you can inspect its bytes before it gets zeroed and
spot patterns that can help the other teams in charge of the lower
layers of the stack.
--
Michael
