Re: Enable data checksums by default - Mailing list pgsql-hackers

From Greg Sabino Mullane
Subject Re: Enable data checksums by default
Msg-id CAKAnmmKsJJ6FkGrLLuZ7qi1gjA2NVuy5i1FN+QKk-pU1ksTJgw@mail.gmail.com
In response to Re: Enable data checksums by default  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: Enable data checksums by default
List pgsql-hackers
On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
 
> My understanding was that the reason for some hesitation about adopting data checksums was the performance impact.  Not the checksumming itself, but the overhead from hint bit logging.  The last time I looked into that, you could get performance impacts on the order of 5% tps.  Maybe that's acceptable, and you of course can turn it off if you want the extra performance.  But I think this should be discussed in this thread.

Fair enough. I think the performance impact is acceptable, as evidenced by the large number of people who turn it on. And it is easy enough to turn it off again, either via --no-data-checksums or pg_checksums --disable. I've come across people who have regretted not throwing a -k into their initial initdb, but have not yet come across someone who has the opposite regret. When I did some measurements some time ago, I found numbers much less than 5%, but of course it depends on a lot of factors.
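
For reference, the switches mentioned above can be sketched as a small Python helper that only builds the command lines and runs nothing. The function names are hypothetical. The flags are initdb's --data-checksums (the long form of -k), the --no-data-checksums opt-out discussed in this thread, and pg_checksums --check/--enable/--disable. Note that pg_checksums needs the cluster to be cleanly shut down before it will touch it.

```python
from typing import List


def initdb_command(datadir: str, checksums: bool) -> List[str]:
    """Build an initdb argv (hypothetical helper; nothing is executed)."""
    # --data-checksums is the long form of -k; --no-data-checksums is the
    # opt-out discussed in this thread.
    flag = "--data-checksums" if checksums else "--no-data-checksums"
    return ["initdb", flag, "--pgdata", datadir]


def pg_checksums_command(datadir: str, action: str) -> List[str]:
    """Build a pg_checksums argv for an existing, cleanly stopped cluster."""
    flags = {"check": "--check", "enable": "--enable", "disable": "--disable"}
    if action not in flags:
        raise ValueError(f"unknown action: {action!r}")
    return ["pg_checksums", flags[action], "--pgdata", datadir]


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(" ".join(initdb_command("/path/to/data", checksums=False)))
    print(" ".join(pg_checksums_command("/path/to/data", "disable")))
```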

> About the claim that it's already the de-facto standard.  Maybe that is approximately true for "serious" installations.  But AFAICT, the popular packagings don't enable checksums by default, so there is likely a significant middle tier between "just trying it out" and serious production use that doesn't have it turned on.

I would push back on that "significant" a good bit. The number of Postgres installations in the cloud is very likely to dwarf the total package installations. Maybe not 10 years ago, but now? Maybe someone from Amazon can share some numbers. Not that we have any way to compare against package installs :) But anecdotally the number of people who mention RDS etc. on the various fora has exploded.
 
> For those uses, this change would render pg_upgrade useless for upgrades from an old instance with default settings to a new instance with default settings.  And then users would either need to re-initdb with checksums turned back off, or I suppose run pg_checksums on the old instance before upgrading?  This is significant additional complication.

Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
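
To make the two ways out concrete, here is a minimal Python sketch of the decision, assuming (as the quoted text implies) that pg_upgrade requires the old and new clusters to agree on the checksum setting. The function name and the wording of the suggestions are made up for illustration and are not anything pg_upgrade prints.

```python
from typing import List


def upgrade_options(old_has_checksums: bool, new_has_checksums: bool) -> List[str]:
    """Return possible fixes when the two clusters disagree on checksums.

    Hypothetical helper: an empty list means the settings already match.
    """
    if old_has_checksums == new_has_checksums:
        return []
    if new_has_checksums:
        # Old cluster predates the new default: either opt the new cluster
        # out, or convert the old one before upgrading.
        return [
            "re-run initdb for the new cluster with --no-data-checksums",
            "run pg_checksums --enable on the stopped old cluster first",
        ]
    return [
        "re-run initdb for the new cluster with --data-checksums",
        "run pg_checksums --disable on the stopped old cluster first",
    ]


if __name__ == "__main__":
    # Old instance without checksums, new instance created with the new default.
    for option in upgrade_options(old_has_checksums=False, new_has_checksums=True):
        print("-", option)
```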
 
> And packagers who have built abstractions on top of pg_upgrade (such as Debian pg_upgradecluster) would also need to implement something to manage this somehow.

How does it deal with clusters with checksums enabled now?
 
> I'm thinking pg_upgrade could have a mode where it adds the checksum during the upgrade as it copies the files (essentially a subset of pg_checksums).  I think that would be useful for that middle tier of users who just want a good default experience.

Hm...might be a bad experience if it forces a switch out of --link mode. Perhaps a warning at the end of pg_upgrade that suggests running pg_checksums on your new cluster if you want to enable checksums?

Cheers,
Greg
 
