Re: Enable data checksums by default - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Enable data checksums by default
Date
Msg-id 20ce4ea9-47f3-9903-01b5-00eaf349b09d@2ndquadrant.com
In response to Re: Enable data checksums by default  (Andres Freund <andres@anarazel.de>)
Responses Re: Enable data checksums by default
List pgsql-hackers

On 3/22/19 5:10 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-03-22 12:07:22 -0400, Tom Lane wrote:
>> Christoph Berg <myon@debian.org> writes:
>>> I think, the next step in that direction would be to enable data
>>> checksums by default. They make sense in most setups,
>>
>> Well, that is exactly the point that needs some proof, not just
>> an unfounded assertion.
>>
>> IMO, the main value of checksums is that they allow the Postgres
>> project to deflect blame.  That's nice for us but I'm not sure
>> that it's a benefit for users.  I've seen little if any data to
>> suggest that checksums actually catch enough problems to justify
>> the extra CPU costs and the risk of false positives.
> 

I'm not sure checksums are an effective tool for deflecting blame.
Consider the recent fsync retry issues: because we assumed we could
simply retry fsync, we might have lost some of the writes, resulting
in torn pages and checksum failures. I'm sure we could argue about how
much sense the fsync behavior makes, but I doubt checksum failures are
enough to deflect blame here.

> IDK, being able to verify in some form that backups aren't corrupted on
> an IO level is mighty nice. That often does allow to detect the issue
> while one still has older backups around.
> 

Yeah, I agree that's a valuable capability. I think the question is how
effective it actually is, considering how much storage has changed over
the past few years (which necessarily affects the types of failures
people have to deal with).

> My problem is more that I'm not confident the checks are mature
> enough. The basebackup checks are atm not able to detect random data,
> and neither basebackup nor backend checks detect zeroed out files/file
> ranges.
> 

Yep :-( The pg_basebackup vulnerability to random garbage in a page
header is unfortunate; we'd better improve that.

It's not clear to me what checksums can do about zeroed pages (and/or
truncated files), though.
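
To make the zeroed-page concern concrete, here is a toy sketch (not
PostgreSQL code). It assumes a simplified page layout with a CRC32 in the
first four bytes, whereas PostgreSQL keeps a 16-bit checksum in the page
header and computes it with a different algorithm. The names make_page,
verify_page and scan_relation are hypothetical. The one behavior it is
meant to mirror, as far as I understand the page verification code, is that
an all-zero page is accepted as a valid, never-initialized page.

```python
import zlib
from typing import List

PAGE_SIZE = 8192        # default PostgreSQL block size
CHECKSUM_BYTES = 4      # assumption: toy layout, CRC32 stored up front


def make_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte CRC32 followed by a zero-padded body."""
    body_size = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "little") + body


def verify_page(page: bytes) -> str:
    """Classify one page as 'ok', 'corrupt', 'new' or 'short'."""
    if len(page) != PAGE_SIZE:
        return "short"
    if page == bytes(PAGE_SIZE):
        # An all-zero page carries no checksum to compare against, so it
        # is treated as a legitimately empty page rather than as damage.
        return "new"
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return "ok" if stored == zlib.crc32(page[CHECKSUM_BYTES:]) else "corrupt"


def scan_relation(data: bytes) -> List[str]:
    """Verify every page-sized chunk of a relation file image."""
    return [verify_page(data[off:off + PAGE_SIZE])
            for off in range(0, len(data), PAGE_SIZE)]
```

In this sketch a flipped byte is reported as 'corrupt', but a page that was
zeroed out is reported as 'new', and a file that lost its last page simply
yields a shorter list with no failure at all. Catching those cases would
need information that lives outside the page itself, such as the expected
relation length.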

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

