Re: Enable data checksums by default - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Enable data checksums by default
Date
Msg-id CA+Tgmoa_RUVrROy_iOQmWfsHA-YD6J3BsSurkH6t1qT9QLg1kw@mail.gmail.com
In response to Re: Enable data checksums by default  (Greg Sabino Mullane <htamfids@gmail.com>)
List pgsql-hackers
On Tue, Aug 13, 2024 at 10:42 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> Fair enough. I think the performance impact is acceptable, as evidenced by
> the large number of people that turn it on. And it is easy enough to turn it
> off again, either via --no-data-checksums or pg_checksums --disable.
> When I did some measurements some time ago, I found numbers much less than
> 5%, but of course it depends on a lot of factors.

I think the bad case is when you have a write workload that is
significantly bigger than shared_buffers but still small enough to fit
comfortably in the OS cache. When everything fits in shared_buffers,
you only need to write dirty buffers once per checkpoint cycle, so
making it more expensive isn't necessarily a big deal. When you're
constantly going to disk, that's so expensive that you don't notice
the computational overhead. But when you're in that middle zone where
you keep evicting buffers from PG but not actually having to write
them down to the disk, then I think it's pretty noticeable.
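
To make the three zones concrete, here is a rough sketch that sorts a
workload into them by comparing sizes. The function name, the labels, and
the plain size comparisons are illustrative assumptions; "significantly
bigger" and "fit comfortably" are not quantified above, so a real
assessment would need margins and actual measurement.

```python
GiB = 1024 ** 3

def checksum_overhead_zone(write_set_bytes, shared_buffers_bytes, os_cache_bytes):
    """Classify a write workload into the three zones described above.

    Hypothetical helper: the thresholds are simple size comparisons,
    not anything PostgreSQL itself computes or reports.
    """
    if write_set_bytes <= shared_buffers_bytes:
        # Dirty buffers are written about once per checkpoint cycle,
        # so checksum computation is amortized.
        return "fits in shared_buffers"
    if write_set_bytes <= os_cache_bytes:
        # Buffers keep getting evicted from PG, but the writes land in the
        # OS cache, so checksum CPU cost is the most visible here.
        return "middle zone"
    # Constantly going to disk: I/O cost hides the computational overhead.
    return "disk bound"

zones = [
    checksum_overhead_zone(4 * GiB, 8 * GiB, 64 * GiB),
    checksum_overhead_zone(32 * GiB, 8 * GiB, 64 * GiB),
    checksum_overhead_zone(500 * GiB, 8 * GiB, 64 * GiB),
]
```

Only the second case lands in the zone where the overhead is likely to be
noticeable.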

> I've come across people who have regretted not throwing a -k into their
> initial initdb, but have not yet come across someone who has the opposite
> regret.

I don't think this is really a fair comparison, because everything
being a little slower all the time is not something that people are
likely to "regret" in the same way that they regret it when a data
corruption issue goes undetected. An undetected data corruption issue
is a single, very painful event that people are likely to notice,
whereas a small performance loss over time kind of blends into the
background. You don't really regret that kind of thing in the same way
that you regret a bad event that happens at a particular moment in
time.

And it's not like we have statistics anywhere that you can look at to
see how much CPU time you spent computing checksums, so if a user DOES
have a performance problem that would not have occurred if checksums
had been disabled, they'll probably never know it.

>> For those uses, this change would render pg_upgrade useless for upgrades
>> from an old instance with default settings to a new instance with default
>> settings.  And then users would either need to re-initdb with checksums
>> turned back off, or I suppose run pg_checksums on the old instance before
>> upgrading?  This is significant additional complication.
> Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.

I tend to agree with that, but I would also like to see the sort of
improvements that Peter mentions. It's a lot less work to say "let's
just change the default" and then get mad at anyone who disagrees than
it is to do the engineering to make changing the default less of a
problem. But that kind of engineering really adds a lot of value
compared to just changing the default.

None of that is to say that I'm totally hostile to this change.
Checksums don't actually prevent your data from getting corrupted, or
let you recover it after it does. They just tell you about the
problem, and very often you would have found out anyway. However, they
do have peace-of-mind value. If you've got checksums turned on, you
can verify your checksums regularly and see that they're OK, and
people like that. Whether that's worth the overhead for everyone, I'm
not quite sure.
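
The detect-only nature of a checksum can be shown with a toy example. This
is a sketch under loud assumptions: it uses zlib's CRC-32 as a stand-in
(PostgreSQL uses its own page checksum algorithm, not this one), and the
helper names are made up for illustration. The point is only that
verification reports a mismatch; nothing in it repairs the page.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size

def page_checksum(page: bytes) -> int:
    # Stand-in checksum for illustration; not PostgreSQL's algorithm.
    return zlib.crc32(page)

def verify_page(page: bytes, stored_checksum: int) -> bool:
    # Returns True if the page still matches the checksum recorded at write time.
    return page_checksum(page) == stored_checksum

# Build a fake page and record its checksum, as if at write time.
page = bytes(range(256)) * (PAGE_SIZE // 256)
stored = page_checksum(page)

# Simulate corruption: flip a single bit somewhere in the page.
corrupted = bytearray(page)
corrupted[100] ^= 0x01

intact_ok = verify_page(page, stored)               # page is unchanged
corrupt_ok = verify_page(bytes(corrupted), stored)  # mismatch is detected, not fixed
```

The corrupted copy fails verification, but recovering the original bytes
still requires a backup or a replica.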

--
Robert Haas
EDB: http://www.enterprisedb.com


