Re: Enable data checksums by default - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Enable data checksums by default
Date
Msg-id 20190322170715.tafjitaldyhwfl2u@alap3.anarazel.de
In response to Re: Enable data checksums by default  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 2019-03-22 18:01:32 +0100, Tomas Vondra wrote:
> On 3/22/19 5:41 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
> >> On 3/22/19 5:10 PM, Andres Freund wrote:
> >>> IDK, being able to verify in some form that backups aren't corrupted on
> >>> an IO level is mighty nice. That often allows detecting the issue
> >>> while one still has older backups around.
> >>>
> >>
> >> Yeah, I agree that's a valuable capability. I think the question is how
> >> effective it actually is, considering how much storage has changed over
> >> the past few years (which necessarily affects the type of failures
> >> people have to deal with).
> > 
> > I'm not sure I understand? How do the changes around storage
> > meaningfully affect the need to have some trust in backups and
> > benefiting from earlier detection?
> > 
> 
> Having trust in backups is still desirable - nothing changes that,
> obviously. The question I was posing was rather "Are checksums still
> effective on current storage systems?"
> 
> I'm wondering if the storage systems people use nowadays may be failing
> in ways that are not reliably detectable by checksums. I don't have any
> data to either support or reject that hypothesis, though.

I don't think it's useful to paint unsubstantiated doom-and-gloom
pictures.


> >> It's not clear to me what checksums can do about zeroed pages (and/or
> >> truncated files) though.
> > 
> > Well, there's nothing fundamental about requiring newly added pages to
> > be zeroes. We could extend the file with pages that are initialized
> > with actual valid checksums instead of
> >         /* new buffers are zero-filled */
> >         MemSet((char *) bufBlock, 0, BLCKSZ);
> >         /* don't set checksum for all-zero page */
> >         smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
> > 
> > The problem is that it's hard to do so safely without adding a lot of
> > additional WAL logging. A lot of filesystems will journal metadata
> > changes (like the size of the file), but not contents. So after a crash
> > the tail end might appear zeroed out, even if we never wrote
> > zeroes. That's obviously solvable by WAL logging, but that's not cheap.
> > 
> 
> Hmmm. I'd say a filesystem that does not guarantee having all the data
> after an fsync is outright broken, but maybe that's what checksums are
> meant to protect against.

There's no fsync here. Consider smgrextend(with-valid-checksum); crash;
the OS will probably have journalled the file size change, but not the
contents. After a crash it's thus likely that the data page will appear
zeroed. That prevents us from erroring out when encountering a zeroed
page, even though doing so would be very good for error detection,
because storage systems will show corrupted data as zeroes in a number
of cases.
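
To make that ambiguity concrete, here is a minimal sketch of a page
check that has to let all-zero pages through. It is not PostgreSQL's
actual code: sketch_checksum(), page_set_checksum(), page_classify(),
the PageStatus enum, and the checksum field's offset and 32-bit width
are all assumptions made for illustration (the real page checksum is 16
bits wide and computed differently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ          8192              /* PostgreSQL's default block size */
#define CHECKSUM_OFFSET 8                 /* hypothetical checksum field offset */
#define CHECKSUM_SIZE   sizeof(uint32_t)  /* hypothetical checksum field width */

typedef enum { PAGE_VALID, PAGE_ALL_ZERO, PAGE_CORRUPT } PageStatus;

/* Stand-in checksum: 32-bit FNV-1a over the page, skipping the checksum field. */
uint32_t
sketch_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < BLCKSZ; i++)
    {
        if (i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_SIZE)
            continue;
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* True if every byte of the page, header included, is zero. */
bool
page_is_all_zero(const unsigned char *page)
{
    for (size_t i = 0; i < BLCKSZ; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* Stamp the page with its checksum, as a writer would before flushing it. */
void
page_set_checksum(unsigned char *page)
{
    uint32_t sum;

    assert(page != NULL);
    sum = sketch_checksum(page);
    memcpy(page + CHECKSUM_OFFSET, &sum, CHECKSUM_SIZE);
}

/*
 * Classify a page read back from storage.  An all-zero page cannot be
 * reported as corrupt: it may be a page whose file extension was
 * journalled while its contents never reached disk before a crash.
 */
PageStatus
page_classify(const unsigned char *page)
{
    uint32_t stored;

    assert(page != NULL);
    if (page_is_all_zero(page))
        return PAGE_ALL_ZERO;

    memcpy(&stored, page + CHECKSUM_OFFSET, CHECKSUM_SIZE);
    return stored == sketch_checksum(page) ? PAGE_VALID : PAGE_CORRUPT;
}
```

A page that was extended but never written and a page that the storage
layer returned as zeroes both come back as PAGE_ALL_ZERO here, so the
reader cannot tell them apart; only a non-zero page with a mismatching
checksum is reported as PAGE_CORRUPT.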

Greetings,

Andres Freund

