Re: Enable data checksums by default - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Enable data checksums by default
Date
Msg-id 8f5b725d-1a6c-4ba6-a9ba-a67106fa2054@eisentraut.org
In response to Enable data checksums by default  (Greg Sabino Mullane <htamfids@gmail.com>)
Responses Re: Enable data checksums by default
List pgsql-hackers
On 07.08.24 00:46, Greg Sabino Mullane wrote:
> Currently, initdb only enables data checksums if passed the 
> --data-checksums or -k argument. There was some hesitation years ago 
> when this feature was first added, leading to the current situation 
> where the default is off. However, many years later, there is wide 
> consensus that this is an extraordinarily safe, desirable setting. 
> Indeed, most (if not all) of the major commercial and open source 
> Postgres systems currently turn this on by default. I posit you would be 
> hard-pressed to find many systems these days in which it has NOT been 
> turned on. So basically we have a de-facto standard, and I think it's 
> time we flipped the switch to make it on by default.

I'm sympathetic to this proposal, but I want to raise some concerns.

My understanding was that the reason for some hesitation about adopting 
data checksums was the performance impact.  Not the checksumming itself, 
but the overhead from hint bit logging.  The last time I looked into 
that, you could see a performance impact on the order of 5% in tps.  Maybe 
that's acceptable, and you of course can turn it off if you want the 
extra performance.  But I think this should be discussed in this thread.

About the claim that it's already the de-facto standard.  Maybe that is 
approximately true for "serious" installations.  But AFAICT, the popular 
packagings don't enable checksums by default, so there is likely a 
significant middle tier between "just trying it out" and serious 
production use that doesn't have it turned on.

For those uses, this change would render pg_upgrade useless for upgrades 
from an old instance with default settings to a new instance with 
default settings.  And then users would either need to re-initdb with 
checksums turned back off, or I suppose run pg_checksums on the old 
instance before upgrading?  This is a significant additional 
complication.  And packagers who have built abstractions on top of 
pg_upgrade (such as Debian pg_upgradecluster) would also need to 
implement something to 
manage this somehow.
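
To make the mismatch concrete, here is a minimal sketch of the check a 
wrapper script might run before calling pg_upgrade.  It assumes the 
"Data page checksum version" line printed by pg_controldata (0 meaning 
checksums are off); the function names checksums_enabled and 
upgrade_options are hypothetical and not part of any PostgreSQL tool.

```python
import re


def checksums_enabled(controldata_output: str) -> bool:
    """Return True if pg_controldata output reports data checksums as on.

    Assumes the line "Data page checksum version: N", where N == 0
    means checksums are disabled.
    """
    match = re.search(
        r"^Data page checksum version:\s*(\d+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'Data page checksum version' line found")
    return int(match.group(1)) != 0


def upgrade_options(old_enabled: bool, new_enabled: bool) -> list[str]:
    """List the manual remedies when old and new clusters disagree.

    An empty list means the checksum settings already match.
    """
    if old_enabled == new_enabled:
        return []
    if new_enabled:
        # Old cluster has no checksums, new cluster was created with them.
        return [
            "re-initdb the new cluster with checksums turned off",
            "run pg_checksums --enable on the cleanly shut down old cluster",
        ]
    # Old cluster has checksums, new cluster was created without them.
    return ["re-initdb the new cluster with checksums turned on"]
```

Feeding it the pg_controldata output of both clusters gives a wrapper 
the information it needs to stop early with a useful message instead 
of letting pg_upgrade fail on the mismatch.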

So I think we need to think through the upgrade experience a bit more. 
Unfortunately, pg_checksums hasn't gotten to the point we were 
perhaps once hoping for, where you could enable checksums on a live 
system.  I'm thinking pg_upgrade could have a mode where it adds the 
checksum during the upgrade as it copies the files (essentially a subset 
of pg_checksums).  I think that would be useful for that middle tier of 
users who just want a good default experience.
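
As a rough illustration of the "add the checksum while copying" idea, 
the sketch below copies a relation file block by block and stamps a 
checksum into each page header on the way.  It is not how pg_upgrade 
or pg_checksums is implemented.  Assumptions: the default 8192-byte 
block size, the 2-byte pd_checksum field at byte offset 8 of the page 
header, and little-endian byte order.  placeholder_checksum is a 
hypothetical stand-in and is NOT PostgreSQL's real page checksum 
algorithm.

```python
import io
import zlib

BLCKSZ = 8192            # assumed default PostgreSQL block size
PD_CHECKSUM_OFFSET = 8   # pd_checksum follows the 8-byte pd_lsn field
PD_CHECKSUM_SIZE = 2


def placeholder_checksum(page: bytes, blkno: int) -> int:
    """Stand-in 16-bit checksum; NOT PostgreSQL's real algorithm."""
    end = PD_CHECKSUM_OFFSET + PD_CHECKSUM_SIZE
    # Compute over the page with the checksum field itself zeroed.
    body = page[:PD_CHECKSUM_OFFSET] + bytes(PD_CHECKSUM_SIZE) + page[end:]
    return (zlib.crc32(body) ^ blkno) & 0xFFFF


def copy_with_checksums(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy src to dst one block at a time, stamping a checksum per page.

    Returns the number of blocks copied.  All-zero (never initialized)
    pages are copied unchanged.
    """
    blkno = 0
    while True:
        page = src.read(BLCKSZ)
        if not page:
            break
        if len(page) != BLCKSZ:
            raise ValueError(f"short read at block {blkno}")
        if any(page):  # skip pages that are entirely zero
            checksum = placeholder_checksum(page, blkno)
            end = PD_CHECKSUM_OFFSET + PD_CHECKSUM_SIZE
            page = (
                page[:PD_CHECKSUM_OFFSET]
                + checksum.to_bytes(PD_CHECKSUM_SIZE, "little")
                + page[end:]
            )
        dst.write(page)
        blkno += 1
    return blkno
```

The point of the sketch is only that the copy mode already reads and 
writes every block once, so stamping the checksum there adds no extra 
pass over the data.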



