Re: Enable data checksums by default - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Enable data checksums by default
Msg-id 12110cc1-3729-9e5e-b6bb-62151a68af29@2ndquadrant.com
In response to Re: Enable data checksums by default  (Andres Freund <andres@anarazel.de>)
Responses Re: Enable data checksums by default
List pgsql-hackers

On 3/22/19 5:41 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
>> On 3/22/19 5:10 PM, Andres Freund wrote:
>>> IDK, being able to verify in some form that backups aren't corrupted on
>>> an IO level is mighty nice. That often does allow to detect the issue
>>> while one still has older backups around.
>>>
>>
>> Yeah, I agree that's a valuable capability. I think the question is how
>> effective it actually is considering how much the storage changed over
>> the past few years (which necessarily affects the type of failures
>> people have to deal with).
> 
> I'm not sure I understand? How do the changes around storage
> meaningfully affect the need to have some trust in backups and
> benefiting from earlier detection?
> 

Having trust in backups is still desirable - nothing changes that,
obviously. The question I was posing was rather "Are checksums still
effective on current storage systems?"

I'm wondering if the storage systems people use nowadays may be failing
in ways that are not reliably detectable by checksums. I don't have any
data to either support or reject that hypothesis, though.

> 
>> It's not clear to me what can checksums do about zeroed pages (and/or
>> truncated files) though.
> 
> Well, there's nothing fundamental about needing added pages be
> zeroes. We could expand them to be initialized with actual valid
> checksums instead of
>         /* new buffers are zero-filled */
>         MemSet((char *) bufBlock, 0, BLCKSZ);
>         /* don't set checksum for all-zero page */
>         smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
> 
> the problem is that it's hard to do so safely without adding a lot of
> additional WAL logging. A lot of filesystems will journal metadata
> changes (like the size of the file), but not contents. So after a crash
> the tail end might appear zeroed out, even if we never wrote
> zeroes. That's obviously solvable by WAL logging, but that's not cheap.
> 

Hmmm. I'd say a filesystem that does not guarantee having all the data
after an fsync is outright broken, but maybe that's what checksums are
meant to protect against.

> It might still be a good idea to just write a page with an initialized
> header / checksum at that point, as that ought to still detect a number
> of problems we can't detect right now.
> 

Sounds reasonable.
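
To make the idea concrete, here is a minimal sketch of the difference between extending with an all-zero page and extending with a page that has an initialized header and checksum. Everything in it is an assumption for illustration: the header offsets are a toy layout, toy_checksum() is a simple FNV-1a style hash folded to 16 bits (not the checksum algorithm PostgreSQL actually uses), and init_empty_page() / verify_page() are hypothetical helpers, not existing functions.

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ          8192
#define CHECKSUM_OFFSET 8    /* toy header: 16-bit checksum stored here */
#define UPPER_OFFSET    14   /* toy header: 16-bit "end of free space" */

typedef enum { PAGE_OK, PAGE_ALL_ZERO, PAGE_CORRUPT } PageStatus;

/* Returns 1 if every byte of the page is zero. */
static int page_is_all_zero(const unsigned char *page)
{
    for (size_t i = 0; i < BLCKSZ; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}

/*
 * Toy checksum (assumption, NOT PostgreSQL's algorithm): FNV-1a over the
 * page with the checksum field treated as zero, mixed with the block
 * number so a page stored at the wrong location is also flagged.
 * The result is never 0.
 */
static uint16_t toy_checksum(const unsigned char *page, uint32_t blkno)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < BLCKSZ; i++) {
        int in_checksum = (i == CHECKSUM_OFFSET || i == CHECKSUM_OFFSET + 1);
        h ^= in_checksum ? 0 : page[i];
        h *= 16777619u;
    }
    h ^= blkno;
    return (uint16_t) ((h ^ (h >> 16)) % 65535u + 1);
}

/* Hypothetical alternative to zero-filling: an empty but valid page. */
static void init_empty_page(unsigned char *page, uint32_t blkno)
{
    uint16_t upper = BLCKSZ;   /* whole page is free space */
    uint16_t sum;

    memset(page, 0, BLCKSZ);
    memcpy(page + UPPER_OFFSET, &upper, sizeof(upper));
    sum = toy_checksum(page, blkno);
    memcpy(page + CHECKSUM_OFFSET, &sum, sizeof(sum));
}

/* An all-zero page carries no checksum, so it cannot be verified at all. */
static PageStatus verify_page(const unsigned char *page, uint32_t blkno)
{
    uint16_t stored;

    if (page_is_all_zero(page))
        return PAGE_ALL_ZERO;
    memcpy(&stored, page + CHECKSUM_OFFSET, sizeof(stored));
    return stored == toy_checksum(page, blkno) ? PAGE_OK : PAGE_CORRUPT;
}
```

With zero-filled extension, PAGE_ALL_ZERO has to be accepted as "new page", so a page zeroed by the storage layer looks the same as a legitimate one. If extension wrote the output of init_empty_page() instead, a reader could at least treat an unexpected PAGE_ALL_ZERO as suspicious, subject to the crash-recovery caveat about the tail of the file discussed above.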

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

