Home > mailing lists

Thread: Enable data checksums by default

Enable data checksums by default

From

Greg Sabino Mullane

Date:

06 August 2024, 22:46:52

Please find attached a patch to enable data checksums by default.

Currently, initdb only enables data checksums if passed the --data-checksums or -k argument. There was some hesitation years ago when this feature was first added, leading to the current situation where the default is off. However, many years later, there is wide consensus that this is an extraordinarily safe, desirable setting. Indeed, most (if not all) of the major commercial and open source Postgres systems currently turn this on by default. I posit you would be hard-pressed to find many systems these days in which it has NOT been turned on. So basically we have a de-facto standard, and I think it's time we flipped the switch to make it on by default.

The patch is simple enough: literally flipping the boolean inside of initdb.c, and adding a new argument '--no-data-checksums' for those instances that truly want the old behavior. One place that still needs the old behavior is our internal tests for pg_checksums and pg_amcheck, so I added a new argument to init() in PostgreSQL/Test/Cluster.pm to allow those to still pass their tests.

This is just the default - people are still more than welcome to turn it off with the new flag. The pg_checksums program is another option that actually argues for having the default "on", as switching to "off" once initdb has been run is trivial.

Yes, I am aware of the previous discussions on this, but the world moves fast - wal compression is better than in the past, vacuum is better now, and data-checksums being on is such a complete default in the wild, it feels weird and a disservice that we are not running all our tests like that.

Cheers,

Greg

Attachment

0001-Make-initdb-enable-data-checksums-by-default.patch

Re: Enable data checksums by default

From

Michael Banck

Date:

07 August 2024, 08:43:41

Hi,

On Tue, Aug 06, 2024 at 06:46:52PM -0400, Greg Sabino Mullane wrote:
> Please find attached a patch to enable data checksums by default.
> 
> Currently, initdb only enables data checksums if passed the
> --data-checksums or -k argument. There was some hesitation years ago when
> this feature was first added, leading to the current situation where the
> default is off. However, many years later, there is wide consensus that
> this is an extraordinarily safe, desirable setting. Indeed, most (if not
> all) of the major commercial and open source Postgres systems currently
> turn this on by default. I posit you would be hard-pressed to find many
> systems these days in which it has NOT been turned on. So basically we have
> a de-facto standard, and I think it's time we flipped the switch to make it
> on by default.

[...]
 
> Yes, I am aware of the previous discussions on this, but the world moves
> fast - wal compression is better than in the past, vacuum is better now,
> and data-checksums being on is such a complete default in the wild, it
> feels weird and a disservice that we are not running all our tests like
> that.

I agree.

Some review on the patch:

> diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
> index bdd613e77f..511f489d34 100644
> --- a/doc/src/sgml/ref/initdb.sgml
> +++ b/doc/src/sgml/ref/initdb.sgml
> @@ -267,12 +267,14 @@ PostgreSQL documentation
>         <para>
>          Use checksums on data pages to help detect corruption by the
>          I/O system that would otherwise be silent. Enabling checksums
> -        may incur a noticeable performance penalty. If set, checksums
> +        may incur a small performance penalty. If set, checksums
>          are calculated for all objects, in all databases. All checksum

I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is
covered elsewhere in the documentation (I just looked at the patch), but
if not, it probably should be added here as a word of caution.

>          failures will be reported in the
>          <link linkend="monitoring-pg-stat-database-view">
>          <structname>pg_stat_database</structname></link> view.
>          See <xref linkend="checksums" /> for details.
> +        As of version 18, checksums are enabled by default. They can be
> +        disabled by use of <option>--no-data-checksums</option>.

I think we usually do not mention when a feature was added/changed, do
we? So I'd just write "(default: enabled)" or whatever is the style of
the surrounding options.

>         </para>
>        </listitem>
>       </varlistentry>
> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> index f00718a015..ce7d3e99e5 100644
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
> @@ -164,7 +164,7 @@ static bool noinstructions = false;
>  static bool do_sync = true;
>  static bool sync_only = false;
>  static bool show_setting = false;
> -static bool data_checksums = false;
> +static bool data_checksums = true;
>  static char *xlog_dir = NULL;
>  static int    wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
>  static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
> @@ -3121,6 +3121,7 @@ main(int argc, char *argv[])
>          {"waldir", required_argument, NULL, 'X'},
>          {"wal-segsize", required_argument, NULL, 12},
>          {"data-checksums", no_argument, NULL, 'k'},
> +        {"no-data-checksums", no_argument, NULL, 20},

Does it make sense to add -K (capital k) as a short-cut for this? I
think this is how we distinguish on/off for pg_dump (-t/-T etc.) but
maybe that is not wider project policy.


Michael

Re: Enable data checksums by default

From

Greg Sabino Mullane

Date:

07 August 2024, 14:17:30

On Wed, Aug 7, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:

I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is
covered elsewhere in the documentation (I just looked at the patch), but
if not, it probably should be added here as a word of caution.

Yeah, that seems something beyond this patch? Certainly we should mention wal_compression in the release notes if the default changes. I mean, I feel wal_log_hints should probably default to on as well, but I've honestly never really given it much thought because my fingers are trained to type "initdb -k". I've been using data checksums for roughly a decade now. I think the only time I've NOT used checksums was when I was doing checksum overhead measurements, or hacking on the pg_checksums program.

I think we usually do not mention when a feature was added/changed, do
we? So I'd just write "(default: enabled)" or whatever is the style of
the surrounding options.

+1

> + {"no-data-checksums", no_argument, NULL, 20},

Does it make sense to add -K (capital k) as a short-cut for this? I
think this is how we distinguish on/off for pg_dump (-t/-T etc.) but
maybe that is not wider project policy.

I'd rather not. Better to keep it explicit rather than some other weird letter that has no mnemonic value.

Cheers,

Greg

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

08 August 2024, 10:11:38

On 07.08.24 00:46, Greg Sabino Mullane wrote:
> Currently, initdb only enables data checksums if passed the 
> --data-checksums or -k argument. There was some hesitation years ago 
> when this feature was first added, leading to the current situation 
> where the default is off. However, many years later, there is wide 
> consensus that this is an extraordinarily safe, desirable setting. 
> Indeed, most (if not all) of the major commercial and open source 
> Postgres systems currently turn this on by default. I posit you would be 
> hard-pressed to find many systems these days in which it has NOT been 
> turned on. So basically we have a de-facto standard, and I think it's 
> time we flipped the switch to make it on by default.

I'm sympathetic to this proposal, but I want to raise some concerns.

My understanding was that the reason for some hesitation about adopting 
data checksums was the performance impact.  Not the checksumming itself, 
but the overhead from hint bit logging.  The last time I looked into 
that, you could get performance impacts on the order of 5% tps.  Maybe 
that's acceptable, and you of course can turn it off if you want the 
extra performance.  But I think this should be discussed in this thread.

About the claim that it's already the de-facto standard.  Maybe that is 
approximately true for "serious" installations.  But AFAICT, the popular 
packagings don't enable checksums by default, so there is likely a 
significant middle tier between "just trying it out" and serious 
production use that don't have it turned on.

For those uses, this change would render pg_upgrade useless for upgrades 
from an old instance with default settings to a new instance with 
default settings.  And then users would either need to re-initdb with 
checksums turned back off, or I suppose run pg_checksums on the old 
instance before upgrading?  This is significant additional complication. 
  And packagers who have built abstractions on top of pg_upgrade (such 
as Debian pg_upgradecluster) would also need to implement something to 
manage this somehow.

So I think we need to think through the upgrade experience a bit more. 
Unfortunately, pg_checksums hasn't gotten to the point that we were 
perhaps once hoping for that you could enable checksums on a live 
system.  I'm thinking pg_upgrade could have a mode where it adds the 
checksum during the upgrade as it copies the files (essentially a subset 
of pg_checksums).  I think that would be useful for that middle tier of 
users who just want a good default experience.

Re: Enable data checksums by default

From

Michael Banck

Date:

08 August 2024, 12:54:55

On Thu, Aug 08, 2024 at 12:11:38PM +0200, Peter Eisentraut wrote:
> So I think we need to think through the upgrade experience a bit more.
> Unfortunately, pg_checksums hasn't gotten to the point that we were perhaps
> once hoping for that you could enable checksums on a live system.  I'm
> thinking pg_upgrade could have a mode where it adds the checksum during the
> upgrade as it copies the files (essentially a subset of pg_checksums).  I
> think that would be useful for that middle tier of users who just want a
> good default experience.

Well that, or, as a first less ambitious step, pg_upgrade could carry
over the data_checksums setting from the old to the new instance by
essentially disabling it via pg_checksums -d (which is fast) if it the
current default (off) is set on the old instance and the new instance
was created with the new onw (checksums on).

Probably should include a warning or something in that case, though I
guess a lot of users will read just past it. But at least they are not
worse off than before.

Michael

Re: Enable data checksums by default

From

Daniel Gustafsson

Date:

08 August 2024, 13:01:57

> On 8 Aug 2024, at 12:11, Peter Eisentraut <peter@eisentraut.org> wrote:

> My understanding was that the reason for some hesitation about adopting data checksums was the performance impact.
Notthe checksumming itself, but the overhead from hint bit logging.  The last time I looked into that, you could get
performanceimpacts on the order of 5% tps.  Maybe that's acceptable, and you of course can turn it off if you want the
extraperformance.  But I think this should be discussed in this thread. 

That's been my experience as well, the overhead of the checksumming is
negligible but the overhead in WAL can be (having hint bits WAL logged does
carry other benefits as well to be fair).

> I think we need to think through the upgrade experience a bit more.

+1

> Unfortunately, pg_checksums hasn't gotten to the point that we were perhaps once hoping for that you could enable
checksumson a live system.   

I don't recall there being any work done (or plans for) using pg_checksums on a
live system.  Anyone interested in enabling checksums on a live cluster can
however review the patch for that in:

  https://postgr.es/m/E07A611B-9CF3-4FDB-8CE8-A221E39040EC@yesql.se

> I'm thinking pg_upgrade could have a mode where it adds the checksum during the upgrade as it copies the files
(essentiallya subset of pg_checksums).  I think that would be useful for that middle tier of users who just want a good
defaultexperience. 

As a side-note, I implemented this in pg_upgrade at Greenplum (IIRC it was
submitted to -hackers at the time as well) and it worked well with not a lot of
code.

--
Daniel Gustafsson

Re: Enable data checksums by default

From

Greg Sabino Mullane

Date:

08 August 2024, 17:19:14

Thank you for the feedback. Please find attached three separate patches. One to add a new flag to initdb (--no-data-checksums), one to adjust the tests to use this flag as needed, and the final to make the actual switch of the default value (along with tests and docs).

Cheers,

Greg

Attachment

Re: Enable data checksums by default

From

Robert Haas

Date:

08 August 2024, 17:42:55

On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> About the claim that it's already the de-facto standard.  Maybe that is
> approximately true for "serious" installations.  But AFAICT, the popular
> packagings don't enable checksums by default, so there is likely a
> significant middle tier between "just trying it out" and serious
> production use that don't have it turned on.

+1.

> I'm thinking pg_upgrade could have a mode where it adds the
> checksum during the upgrade as it copies the files (essentially a subset
> of pg_checksums).  I think that would be useful for that middle tier of
> users who just want a good default experience.

That would be very nice.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Enable data checksums by default

From

Tomas Vondra

Date:

08 August 2024, 18:49:36

On 8/8/24 19:42, Robert Haas wrote:
> On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>> About the claim that it's already the de-facto standard.  Maybe that is
>> approximately true for "serious" installations.  But AFAICT, the popular
>> packagings don't enable checksums by default, so there is likely a
>> significant middle tier between "just trying it out" and serious
>> production use that don't have it turned on.
> 
> +1.
> 
>> I'm thinking pg_upgrade could have a mode where it adds the
>> checksum during the upgrade as it copies the files (essentially a subset
>> of pg_checksums).  I think that would be useful for that middle tier of
>> users who just want a good default experience.
> 
> That would be very nice.
> 

Yeah, but it might also disable checksums on the new cluster, which
would work for link mode too. So we'd probably want multiple modes, one
to enable checksums during file copy, one to disable checksums, and one
to just fail for incompatible clusters.


-- 
Tomas Vondra

Re: Enable data checksums by default

From

Greg Sabino Mullane

Date:

13 August 2024, 14:41:44

On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:

My understanding was that the reason for some hesitation about adopting data checksums was the performance impact. Not the checksumming itself, but the overhead from hint bit logging. The last time I looked into that, you could get performance impacts on the order of 5% tps. Maybe that's acceptable, and you of course can turn it off if you want the extra performance. But I think this should be discussed in this thread.

Fair enough. I think the performance impact is acceptable, as evidenced by the large number of people that turn it on. And it is easy enough to turn it off again, either via --no-data-checksums or pg_checksums --disable. I've come across people who have regretted not throwing a -k into their initial initdb, but have not yet come across someone who has the opposite regret. When I did some measurements some time ago, I found numbers much less than 5%, but of course it depends on a lot of factors.

About the claim that it's already the de-facto standard. Maybe that is approximately true for "serious" installations. But AFAICT, the popular packagings don't enable checksums by default, so there is likely a significant middle tier between "just trying it out" and serious
production use that don't have it turned on.

I would push back on that "significant" a good bit. The number of Postgres installations in the cloud is very likely to dwarf the total package installations. Maybe not 10 years ago, but now? Maybe someone from Amazon can share some numbers. Not that we have any way to compare against package installs :) But anecdotally the number of people who mention RDS etc. on the various fora has exploded.

For those uses, this change would render pg_upgrade useless for upgrades from an old instance with default settings to a new instance with default settings. And then users would either need to re-initdb with checksums turned back off, or I suppose run pg_checksums on the old instance before upgrading? This is significant additional complication.

Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.

And packagers who have built abstractions on top of pg_upgrade (such as Debian pg_upgradecluster) would also need to implement something to manage this somehow.

How does it deal with clusters with checksums enabled now?

I'm thinking pg_upgrade could have a mode where it adds the checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of users who just want a good default experience.

Hm...might be a bad experience if it forces a switch out of --link mode. Perhaps a warning at the end of pg_upgrade that suggests running pg_checksums on your new cluster if you want to enable checksums?

Cheers,

Greg

Re: Enable data checksums by default

From

Robert Haas

Date:

13 August 2024, 20:07:40

On Tue, Aug 13, 2024 at 10:42 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> Fair enough. I think the performance impact is acceptable, as evidenced by the large number of people that turn it
on.And it is easy enough to turn it off again, either via --no-data-checksums or pg_checksums --disable. 
> When I did some measurements some time ago, I found numbers much less than 5%, but of course it depends on a lot of
factors.

I think the bad case is when you have a write workload that is
significantly bigger than shared_buffers but still small enough to fit
comfortably in the OS cache. When everything fits in shared_buffers,
you only need to write dirty buffers once per checkpoint cycle, so
making it more expensive isn't necessarily a big deal. When you're
constantly going to disk, that's so expensive that you don't notice
the computational overhead. But when you're in that middle zone where
you keep evicting buffers from PG but not actually having to write
them down to the disk, then I think it's pretty noticeable.

> I've come across people who have regretted not throwing a -k into their initial initdb, but have not yet come across
someonewho has the opposite regret. 

I don't think this is really a fair comparison, because everything
being a little slower all the time is not something that people are
likely to "regret" in the same way that they regret it when a data
corruption issue goes undetected. An undetected data corruption issue
is a single, very painful event that people are likely to notice,
whereas a small performance loss over time kind of blends into the
background. You don't really regret that kind of thing in the same way
that you regret a bad event that happens at a particular moment in
time.

And it's not like we have statistics anywhere that you can look at to
see how much CPU time you spent computing checksums, so if a user DOES
have a performance problem that would not have occurred if checksums
had been disabled, they'll probably never know it.

>> For those uses, this change would render pg_upgrade useless for upgrades from an old instance with default settings
toa new instance with default settings.  And then users would either need to re-initdb with checksums turned back off,
orI suppose run pg_checksums on the old instance before upgrading?  This is significant additional complication. 
> Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.

I tend to agree with that, but I would also like to see the sort of
improvements that Peter mentions. It's a lot less work to say "let's
just change the default" and then get mad at anyone who disagrees than
it is to do the engineering to make changing the default less of a
problem. But that kind of engineering really adds a lot of value
compared to just changing the default.

None of that is to say that I'm totally hostile to this change.
Checksums don't actually prevent your data from getting corrupted, or
let you recover it after it does. They just tell you about the
problem, and very often you would have found out anyway. However, they
do have peace-of-mind value. If you've got checksums turned on, you
can verify your checksums regularly and see that they're OK, and
people like that. Whether that's worth the overhead for everyone, I'm
not quite sure.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Enable data checksums by default

From

Jakub Wartak

Date:

15 August 2024, 10:49:04

On Wed, Aug 7, 2024 at 4:18 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:
>>
>> I think the last time we dicussed this the consensus was that
>> computational overhead of computing the checksums is pretty small for
>> most systems (so the above change seems warranted regardless of whether
>> we switch the default), but turning on wal_compression also turns on
>> wal_log_hints, which can increase WAL by quite a lot. Maybe this is
[..]
>
>
> Yeah, that seems something beyond this patch? Certainly we should mention wal_compression in the release notes if the
defaultchanges. I mean, I feel wal_log_hints should probably default to on as well, but I've honestly never really
givenit much thought because my fingers are trained to type "initdb -k". I've been using data checksums for roughly a
decadenow. I think the only time I've NOT used checksums was when I was doing checksum overhead measurements, or
hackingon the pg_checksums program. 

Maybe I don't understand something, but just to be clear:
wal_compression (mentioned above) is not turning wal_log_hints on,
just the wal_log_hints needs to be on when using data checksums
(implicitly, by the XLogHintBitIsNeeded() macro). I suppose Michael
was thinking about the wal_log_hints earlier (?)

-J.

Re: Enable data checksums by default

From

Jakub Wartak

Date:

15 August 2024, 10:49:26

Hi Greg and others

On Tue, Aug 13, 2024 at 4:42 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
>>
>> My understanding was that the reason for some hesitation about adopting data checksums was the performance impact.
Notthe checksumming itself, but the overhead from hint bit logging.  The last time I looked into that, you could get
performanceimpacts on the order of 5% tps.  Maybe that's acceptable, and you of course can turn it off if you want the
extraperformance.  But I think this should be discussed in this thread. 
>
>
> Fair enough. I think the performance impact is acceptable, as evidenced by the large number of people that turn it
on.And it is easy enough to turn it off again, either via --no-data-checksums or pg_checksums --disable. I've come
acrosspeople who have regretted not throwing a -k into their initial initdb, but have not yet come across someone who
hasthe opposite regret. When I did some measurements some time ago, I found numbers much less than 5%, but of course it
dependson a lot of factors. 

Same here, and +1 to data_checksums=on by default for new installations.

The best public measurement of the impact was posted in [1] in 2019 by
Tomas to the best of my knowledge, where he explicitly mentioned the
problem with more WAL with hints/checksums: SATA disks (low IOPS). My
take: now we have 2024, and most people are using at least SSDs or
slow-SATA (but in cloud they could just change the class of I/O if
required to get IOPS to avoid too much throttling), therefore the
price of IOPS dropped significantly.

>> About the claim that it's already the de-facto standard.  Maybe that is approximately true for "serious"
installations. But AFAICT, the popular packagings don't enable checksums by default, so there is likely a significant
middletier between "just trying it out" and serious 
>> production use that don't have it turned on.
>
>
> I would push back on that "significant" a good bit. The number of Postgres installations in the cloud is very likely
todwarf the total package installations. Maybe not 10 years ago, but now? Maybe someone from Amazon can share some
numbers.Not that we have any way to compare against package installs :) But anecdotally the number of people who
mentionRDS etc. on the various fora has exploded. 

Same here. If it helps the case the: 43% of all PostgreSQL DBs
involved in any support case or incident in EDB within last year had
data_checksums=on (at least if they had collected the data using our )
. That's a surprisingly high number (for something that's off by
default), and it makes me think this is because plenty of customers
are either managed by DBAs who care, or assisted by consultants when
deploying, or simply using TPAexec [2] which has this on by default.

Another thing is plenty of people run with wal_log_hints=on (without
data_checksums=off) just to have pg_rewind working. As this is a
strictly standby related tool it means they don't have WAL/network
bandwidth problems, so the WAL rate is not that high in the wild to
cause problems. I found 1 or 2 cases within last year where we would
mention that high WAL generation was attributed to
wal_log_hints=on/XLOG_FPI  and they still didn't disable it apparently
(we have plenty of cases related to too much WAL, but it's mostly due
to other basic reasons)

-J.

[1] - https://www.postgresql.org/message-id/20190330192543.GH4719%40development
[2] - https://www.enterprisedb.com/docs/pgd/4/deployments/tpaexec/

Re: Enable data checksums by default

From

Michael Banck

Date:

15 August 2024, 11:03:30

On Thu, Aug 15, 2024 at 09:49:04AM +0200, Jakub Wartak wrote:
> On Wed, Aug 7, 2024 at 4:18 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:
> >>
> >> I think the last time we dicussed this the consensus was that
> >> computational overhead of computing the checksums is pretty small for
> >> most systems (so the above change seems warranted regardless of whether
> >> we switch the default), but turning on wal_compression also turns on
> >> wal_log_hints, which can increase WAL by quite a lot. Maybe this is
> [..]
> >
> >
> > Yeah, that seems something beyond this patch? Certainly we should
> > mention wal_compression in the release notes if the default changes.
> > I mean, I feel wal_log_hints should probably default to on as well,
> > but I've honestly never really given it much thought because my
> > fingers are trained to type "initdb -k". I've been using data
> > checksums for roughly a decade now. I think the only time I've NOT
> > used checksums was when I was doing checksum overhead measurements,
> > or hacking on the pg_checksums program.
> 
> Maybe I don't understand something, but just to be clear:
> wal_compression (mentioned above) is not turning wal_log_hints on,
> just the wal_log_hints needs to be on when using data checksums
> (implicitly, by the XLogHintBitIsNeeded() macro). I suppose Michael
> was thinking about the wal_log_hints earlier (?)

Uh, I am pretty sure I meant to say "turning on data_checksums als turns
on wal_log_hints", sorry about the confusion.

I guess the connection is that if you turn on wal_lot_hints (either
directly or via data_checksums) then the number FPIs goes up (possibly
signficantly), and enabling wal_compression could (partly) remedy that.
But I agree with Greg that such a discussion is probably out-of-scope
for this default change.


Michael

Re: Enable data checksums by default

From

Jakub Wartak

Date:

15 August 2024, 15:02:04

Hi all,

On Tue, Aug 13, 2024 at 10:08 PM Robert Haas <robertmhaas@gmail.com> wrote:

> And it's not like we have statistics anywhere that you can look at to
> see how much CPU time you spent computing checksums, so if a user DOES
> have a performance problem that would not have occurred if checksums
> had been disabled, they'll probably never know it.

In worst case, per second and per-pid CPU time consumption could be
quantified by having eBPF which is the standard on distros now
(requires kernel headers and bpfcc-tools installed), e.g. here 7918
was PID doing pgbench-related -c 4 workload with checksum=on (sorry
for formatting, but I don't want to use HTML here):

# funclatency-bpfcc --microseconds -i 1 -p 7918
/usr/lib/postgresql/16/bin/postgres:pg_checksum_page
Tracing 1 functions for
"/usr/lib/postgresql/16/bin/postgres:pg_checksum_page"... Hit Ctrl-C
to end.

     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 238      |*************                           |
         4 -> 7          : 714      |****************************************|
         8 -> 15         : 2        |                                        |
        16 -> 31         : 5        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |
       512 -> 1023       : 1        |                                        |

avg = 6 usecs, total: 6617 usecs, count: 962


     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 241      |*************                           |
         4 -> 7          : 706      |****************************************|
         8 -> 15         : 11       |                                        |
        16 -> 31         : 10       |                                        |
        32 -> 63         : 1        |                                        |

avg = 5 usecs, total: 5639 usecs, count: 969

[..refreshes every 1s here..]

So the above can tell us e.g. that this pg_checksum_page() took 5639
us out of 1s full sample time (and with 100% CPU pegged core so that's
gives again ~5% CPU util per this routine; I'm ignoring the WAL/log
hint impact for sure). One could also write a small script using
bpftrace instead, too. Disassembly on Debian version and stock PGDG is
telling me it's ful SSE2 instruction-set, so that's nice and optimal
too.

> >> For those uses, this change would render pg_upgrade useless for upgrades from an old instance with default
settingsto a new instance with default settings.  And then users would either need to re-initdb with checksums turned
backoff, or I suppose run pg_checksums on the old instance before upgrading?  This is significant additional
complication.
> > Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
>
> I tend to agree with that, but I would also like to see the sort of
> improvements that Peter mentions.
[..]
> None of that is to say that I'm totally hostile to this change.
[.,.]
> Whether that's worth the overhead for everyone, I'm not quite sure.

Without data checksums there's a risk that someone receives silent-bit
corruption and no one will notice. Shouldn't integrity of data stand
above performance by default, in this case? (and performance could be
opt-in, if someone really wants it).

-J.

Re: Enable data checksums by default

From

Jakub Wartak

Date:

22 August 2024, 14:10:15

On Thu, Aug 22, 2024 at 8:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 15.08.24 08:38, Peter Eisentraut wrote:
> > On 08.08.24 19:42, Robert Haas wrote:
> >>> I'm thinking pg_upgrade could have a mode where it adds the
> >>> checksum during the upgrade as it copies the files (essentially a subset
> >>> of pg_checksums).  I think that would be useful for that middle tier of
> >>> users who just want a good default experience.
> >> That would be very nice.
> >
> > Here is a demo patch for that.  It turned out to be quite simple.
> >
> > I wrote above about a separate mode for that (like
> > --copy-and-make-adjustments), but it was just as easy to stick it into
> > the existing --copy mode.
> >
> > It would be useful to check what the performance overhead of this is
> > versus a copy that does not have to make adjustments.  I expect it's
> > very little.
> >
> > A drawback is that as written this does not work on Windows, because
> > Windows uses a different code path in copyFile().  I don't know the
> > reasons for that.  But it would need to be figured out.
>
> Here is an updated patch for this.  I simplified the logic a bit and
> also handle the case where the read() reads less than a round number of
> blocks.  I did some performance testing.  The overhead of computing the
> checksums versus a straight --copy without checksum adjustments appears
> to be around 5% wall clock time, which seems ok to me.  I also looked
> around the documentation to see if there is anything to update, but
> didn't find anything.
>
> I think if we can work out what to do on Windows, this could be a useful
> little feature for facilitating $subject.

My take:
1. I wonder if we should or should not by default calculate/enable the
checksums when doing pg_upgrade --copy from cluster with
checksums=off. Maybe we should error on that like we are doing now.
There might be still people want to have them off, but they would use
the proposed-new-defaults-of-initdb with checksums on blindly (so this
should be opt-in via some switch like with let's say
--copy-and-enable-checksums; so the user is in full control).
2. WIN32's copyFile() could then stay as it is, and then that new
--copy-and-enable-checksums on WIN32 would have to fallback to classic
loop.

-J.

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

23 August 2024, 16:17:17

On 08.08.24 19:19, Greg Sabino Mullane wrote:
> Thank you for the feedback. Please find attached three separate patches. 
> One to add a new flag to initdb (--no-data-checksums), one to adjust the 
> tests to use this flag as needed, and the final to make the actual 
> switch of the default value (along with tests and docs).

I think we can get started with the initdb --no-data-checksums option.

The 0001 patch is missing documentation and --help output for this 
option.  Also, some of the tests for the option that are in patch 0003 
might be better in patch 0001.

Separately, this

-        may incur a noticeable performance penalty. If set, checksums
+        may incur a small performance penalty. If set, checksums

should perhaps be committed separately.  I don't think the patch 0003 
really changes the performance penalty. ;-)

Re: Enable data checksums by default

From

Bruce Momjian

Date:

23 August 2024, 17:42:07

On Fri, Aug 23, 2024 at 03:17:17PM +0200, Peter Eisentraut wrote:
> On 08.08.24 19:19, Greg Sabino Mullane wrote:
> > Thank you for the feedback. Please find attached three separate patches.
> > One to add a new flag to initdb (--no-data-checksums), one to adjust the
> > tests to use this flag as needed, and the final to make the actual
> > switch of the default value (along with tests and docs).
> 
> I think we can get started with the initdb --no-data-checksums option.
> 
> The 0001 patch is missing documentation and --help output for this option.
> Also, some of the tests for the option that are in patch 0003 might be
> better in patch 0001.
> 
> Separately, this
> 
> -        may incur a noticeable performance penalty. If set, checksums
> +        may incur a small performance penalty. If set, checksums
> 
> should perhaps be committed separately.  I don't think the patch 0003 really
> changes the performance penalty. ;-)

I think "might" would be more precise than "may" above.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: Enable data checksums by default

From

Nathan Bossart

Date:

26 August 2024, 22:46:39

In general, +1 for $SUBJECT.

     printf(_("  -k, --data-checksums      use data page checksums\n"));
+    printf(_("      --no-data-checksums   do not use data page checksums\n"));

Should we error if both --data-checksum and --no-data-checksums are
specified?  IIUC with 0001, we'll use whichever is specified last.

nitpick: these 4 patches are small enough that they could likely be
combined and committed together.

I think it's fair to say we should make the pg_upgrade experience nicer
once the default changes, but IMHO that needn't block actually changing the
default.

-- 
nathan

Re: Enable data checksums by default

From

Greg Sabino Mullane

Date:

27 August 2024, 16:44:39

On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

Should we error if both --data-checksum and --no-data-checksums are
specified? IIUC with 0001, we'll use whichever is specified last.

Hmmm, that is a good question. We have never (to my recollection) flipped a default quite like this before. I'm inclined to leave it as "last one wins", as I can see automated systems appending their desired selection to the end of the arg list, and expecting it to work.

nitpick: these 4 patches are small enough that they could likely be combined and committed together.

This was split per request upthread, which I do agree with.

I think it's fair to say we should make the pg_upgrade experience nicer once the default changes, but IMHO that needn't block actually changing the default.

+1

Cheers,

Greg

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

27 August 2024, 18:16:51

On 27.08.24 15:44, Greg Sabino Mullane wrote:
> On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com 
> <mailto:nathandbossart@gmail.com>> wrote:
> 
>     Should we error if both --data-checksum and --no-data-checksums are
>     specified?  IIUC with 0001, we'll use whichever is specified last.
> 
> 
> Hmmm, that is a good question. We have never (to my recollection) 
> flipped a default quite like this before. I'm inclined to leave it as 
> "last one wins", as I can see automated systems appending their desired 
> selection to the end of the arg list, and expecting it to work.

Yes, last option wins is the normal expected behavior.

Re: Enable data checksums by default

From

Nathan Bossart

Date:

27 August 2024, 18:26:59

On Tue, Aug 27, 2024 at 05:16:51PM +0200, Peter Eisentraut wrote:
> On 27.08.24 15:44, Greg Sabino Mullane wrote:
>> On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com
>> <mailto:nathandbossart@gmail.com>> wrote:
>> 
>>     Should we error if both --data-checksum and --no-data-checksums are
>>     specified?  IIUC with 0001, we'll use whichever is specified last.
>> 
>> 
>> Hmmm, that is a good question. We have never (to my recollection)
>> flipped a default quite like this before. I'm inclined to leave it as
>> "last one wins", as I can see automated systems appending their desired
>> selection to the end of the arg list, and expecting it to work.
> 
> Yes, last option wins is the normal expected behavior.

WFM

001_verify_heapam fails with this patch set.  I think you may need to use
--no-data-checksums in that test, too.  Otherwise, it looks pretty good to
me.

-- 
nathan

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

01 October 2024, 18:15:02

On 27.08.24 17:26, Nathan Bossart wrote:
> On Tue, Aug 27, 2024 at 05:16:51PM +0200, Peter Eisentraut wrote:
>> On 27.08.24 15:44, Greg Sabino Mullane wrote:
>>> On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com
>>> <mailto:nathandbossart@gmail.com>> wrote:
>>>
>>>      Should we error if both --data-checksum and --no-data-checksums are
>>>      specified?  IIUC with 0001, we'll use whichever is specified last.
>>>
>>>
>>> Hmmm, that is a good question. We have never (to my recollection)
>>> flipped a default quite like this before. I'm inclined to leave it as
>>> "last one wins", as I can see automated systems appending their desired
>>> selection to the end of the arg list, and expecting it to work.
>>
>> Yes, last option wins is the normal expected behavior.
> 
> WFM
> 
> 001_verify_heapam fails with this patch set.  I think you may need to use
> --no-data-checksums in that test, too.  Otherwise, it looks pretty good to
> me.

I have committed 0001 (the new option) and 0004 (the docs tweak).  I 
think there is consensus for the rest, too, but I'll leave it for a few 
more days to think about.  I guess the test failure has to be addressed.

Re: Enable data checksums by default

From

Daniel Gustafsson

Date:

07 October 2024, 12:37:04

> On 3 Oct 2024, at 23:13, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>> I have committed 0001 (the new option) and 0004 (the docs tweak).  I think
>> there is consensus for the rest, too, but I'll leave it for a few more days
>> to think about.  I guess the test failure has to be addressed.
>
> Here is a rebased patch with the test fix (for cfbot).  I have made no
> other changes.

+        Data checksums are enabled by default. They can be
+        disabled by use of <option>--no-data-checksums</option>.

Nitpick, but I think this should be an xref like how we link to --no-locale in
the -E docs: <xref linkend="app-initdb-no-data-checksums"/> instead.

LGTM otherwise.

--
Daniel Gustafsson

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

14 October 2024, 12:28:41

On 03.10.24 23:13, Nathan Bossart wrote:
> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>> I have committed 0001 (the new option) and 0004 (the docs tweak).  I think
>> there is consensus for the rest, too, but I'll leave it for a few more days
>> to think about.  I guess the test failure has to be addressed.
> 
> Here is a rebased patch with the test fix (for cfbot).  I have made no
> other changes.

I have committed the test changes (patch 0002).  (I renamed the option 
to no_data_checksums to keep the wording consistent with the initdb option.)

Right now, with checksums off by default, this doesn't do much, but you 
can test this like

     PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...

and everything will pass.  To make that work, I had to adjust the order 
of how the initdb options are assembled in Cluster.pm a bit.

I will work on the patch that flips the default next.

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

16 October 2024, 09:54:54

On 14.10.24 11:28, Peter Eisentraut wrote:
> On 03.10.24 23:13, Nathan Bossart wrote:
>> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>>> I have committed 0001 (the new option) and 0004 (the docs tweak).  I 
>>> think
>>> there is consensus for the rest, too, but I'll leave it for a few 
>>> more days
>>> to think about.  I guess the test failure has to be addressed.
>>
>> Here is a rebased patch with the test fix (for cfbot).  I have made no
>> other changes.
> 
> I have committed the test changes (patch 0002).  (I renamed the option 
> to no_data_checksums to keep the wording consistent with the initdb 
> option.)
> 
> Right now, with checksums off by default, this doesn't do much, but you 
> can test this like
> 
>      PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
> 
> and everything will pass.  To make that work, I had to adjust the order 
> of how the initdb options are assembled in Cluster.pm a bit.
> 
> I will work on the patch that flips the default next.

The patch that flips the default has been committed.

I also started a PG18 open items page and made a note that we follow up 
on the upgrade experience, as was discussed in this thread.

https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

16 October 2024, 10:49:48

On 16.10.24 08:54, Peter Eisentraut wrote:
> On 14.10.24 11:28, Peter Eisentraut wrote:
>> On 03.10.24 23:13, Nathan Bossart wrote:
>>> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>>>> I have committed 0001 (the new option) and 0004 (the docs tweak).  I 
>>>> think
>>>> there is consensus for the rest, too, but I'll leave it for a few 
>>>> more days
>>>> to think about.  I guess the test failure has to be addressed.
>>>
>>> Here is a rebased patch with the test fix (for cfbot).  I have made no
>>> other changes.
>>
>> I have committed the test changes (patch 0002).  (I renamed the option 
>> to no_data_checksums to keep the wording consistent with the initdb 
>> option.)
>>
>> Right now, with checksums off by default, this doesn't do much, but 
>> you can test this like
>>
>>      PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
>>
>> and everything will pass.  To make that work, I had to adjust the 
>> order of how the initdb options are assembled in Cluster.pm a bit.
>>
>> I will work on the patch that flips the default next.
> 
> The patch that flips the default has been committed.
> 
> I also started a PG18 open items page and made a note that we follow up 
> on the upgrade experience, as was discussed in this thread.

Ah yes, and the upgrade tests on the buildfarm don't like this.  What 
shall we do about this?  Maybe adjust the buildfarm scripts to use 
--no-data-checksums?

Re: Enable data checksums by default

From

Daniel Gustafsson

Date:

16 October 2024, 11:01:37

> On 16 Oct 2024, at 09:49, Peter Eisentraut <peter@eisentraut.org> wrote:

> Ah yes, and the upgrade tests on the buildfarm don't like this.  What shall we do about this?  Maybe adjust the
buildfarmscripts to use --no-data-checksums? 

I can't see many other options, and it represents a reasonable use-case, so +1.

--
Daniel Gustafsson

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

13 November 2024, 11:31:16

On 07.11.24 19:38, Alvaro Herrera wrote:
> I have just noticed that since this patch was committed as 04bec894a04c,
> pg_upgrade's "make check" action is unusable when given the
> "olddump/oldinstall" options.  We now need to inject '-k' to the initdb
> line for old servers, and we don't, so all upgrade tests fail.  I think
> this patch should be enough to fix it.

Yes, this fix looks correct.  (Or the other way around: Disable 
checksums on the new node.)

Is this not being covered by the build farm?  Are the upgrade tests 
there not using this?

Re: Enable data checksums by default

From

Alvaro Herrera

Date:

13 November 2024, 14:03:05

On 2024-Nov-13, Peter Eisentraut wrote:

> On 07.11.24 19:38, Alvaro Herrera wrote:
> > I have just noticed that since this patch was committed as 04bec894a04c,
> > pg_upgrade's "make check" action is unusable when given the
> > "olddump/oldinstall" options.  We now need to inject '-k' to the initdb
> > line for old servers, and we don't, so all upgrade tests fail.  I think
> > this patch should be enough to fix it.
> 
> Yes, this fix looks correct.

Thanks, pushed.

> (Or the other way around: Disable checksums on the new node.)

Yeah, I thought about that too, but, I think it'd be less realistic,
because the world is become one where checksums are enabled, not the
other way around.

> Is this not being covered by the build farm?  Are the upgrade tests there
> not using this?

Nope, the buildfarm has separate code to test cross-version upgrades,
https://github.com/PGBuildFarm/client-code/blob/main/PGBuild/Modules/TestUpgradeXversion.pm
This predates our in-core test support.  Maybe buildfarm's could be
simplified, but I'm not sure if it's worth it.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)

Re: Enable data checksums by default

From

Tomas Vondra

Date:

23 April, 01:24:39

On 10/16/24 08:54, Peter Eisentraut wrote:
> On 14.10.24 11:28, Peter Eisentraut wrote:
>> On 03.10.24 23:13, Nathan Bossart wrote:
>>> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
>>>> I have committed 0001 (the new option) and 0004 (the docs tweak).  I
>>>> think
>>>> there is consensus for the rest, too, but I'll leave it for a few
>>>> more days
>>>> to think about.  I guess the test failure has to be addressed.
>>>
>>> Here is a rebased patch with the test fix (for cfbot).  I have made no
>>> other changes.
>>
>> I have committed the test changes (patch 0002).  (I renamed the option
>> to no_data_checksums to keep the wording consistent with the initdb
>> option.)
>>
>> Right now, with checksums off by default, this doesn't do much, but
>> you can test this like
>>
>>      PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
>>
>> and everything will pass.  To make that work, I had to adjust the
>> order of how the initdb options are assembled in Cluster.pm a bit.
>>
>> I will work on the patch that flips the default next.
> 
> The patch that flips the default has been committed.
> 
> I also started a PG18 open items page and made a note that we follow up
> on the upgrade experience, as was discussed in this thread.
> 
> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
> 

Regarding the open item, can someone explain what exactly are we
planning to evaluate mid-beta?

We went through the open items on the RMT team meeting today, and my
impression was the questions are mostly about performance of having
checksums by default, but now I realize the thread talks about "upgrade
experience" which seems fairly wide.

So, what kind of data we expect to gather in order to evaluate this?
Who's expected to collect it and evaluate this?

regards

-- 
Tomas Vondra

Re: Enable data checksums by default

From

Christoph Berg

Date:

23 April, 18:28:22

Re: Tomas Vondra
> We went through the open items on the RMT team meeting today, and my
> impression was the questions are mostly about performance of having
> checksums by default, but now I realize the thread talks about "upgrade
> experience" which seems fairly wide.

Fwiw, Debian's pg_upgradecluster already knows about that change. With
the default "dump" upgrade method, clusters will be transitioned to
checksums enabled, while "upgrade" (= pg_upgrade) upgrades will stick
with whatever was there before.

Christoph

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

24 April, 14:26:04

On 23.04.25 00:24, Tomas Vondra wrote:
>> The patch that flips the default has been committed.
>>
>> I also started a PG18 open items page and made a note that we follow up
>> on the upgrade experience, as was discussed in this thread.
>>
>> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
>>
> Regarding the open item, can someone explain what exactly are we
> planning to evaluate mid-beta?

If you have a PG <=17 installation without checksums (the default), and 
you do the usual upgrade procedure to PG 18 involving initdb + 
pg_upgrade, then pg_upgrade will reject the upgrade, because the 
checksum settings don't match.  The workaround is to run initdb with 
--no-data-checksums and try again.

That's probably not all that bad, but if this is all below a bunch of 
layers of scripts, users will have to do some work on their end to get 
this working smoothly.

The point of the open item was (a) to make sure this is adequately 
documented, for instance in the release notes, (b) to think about 
technological solutions to simplify this, such as [0], and (c) to just 
check the general feedback.

Nothing from [0] ended up being committed, so that part of obsolete. 
The action for beta1 is (a).  And then for (c) perhaps monitor the 
feedback between beta1 and beta2.

[0]: 
https://www.postgresql.org/message-id/flat/57957aca-3eae-4106-afb2-3008122b9950%40eisentraut.org

Re: Enable data checksums by default

From

Heikki Linnakangas

Date:

23 May, 11:10:47

On 24/04/2025 14:26, Peter Eisentraut wrote:
> On 23.04.25 00:24, Tomas Vondra wrote:
>>> The patch that flips the default has been committed.
>>>
>>> I also started a PG18 open items page and made a note that we follow up
>>> on the upgrade experience, as was discussed in this thread.
>>>
>>> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
>>>
>> Regarding the open item, can someone explain what exactly are we
>> planning to evaluate mid-beta?
> 
> If you have a PG <=17 installation without checksums (the default), and 
> you do the usual upgrade procedure to PG 18 involving initdb + 
> pg_upgrade, then pg_upgrade will reject the upgrade, because the 
> checksum settings don't match.  The workaround is to run initdb with -- 
> no-data-checksums and try again.
> 
> That's probably not all that bad, but if this is all below a bunch of 
> layers of scripts, users will have to do some work on their end to get 
> this working smoothly.
> 
> The point of the open item was (a) to make sure this is adequately 
> documented, for instance in the release notes, (b) to think about 
> technological solutions to simplify this, such as [0], and (c) to just 
> check the general feedback.
> 
> Nothing from [0] ended up being committed, so that part of obsolete. The 
> action for beta1 is (a).  And then for (c) perhaps monitor the feedback 
> between beta1 and beta2.
> 
> 
> [0]: https://www.postgresql.org/message-id/flat/57957aca-3eae-4106- 
> afb2-3008122b9950%40eisentraut.org

Ping: It's time to do something about this open item. (Or decide to do 
nothing I guess). We're already in beta, but at the same time, we're 
still early in the beta and now is the last chance for code changes 
before 18 is shipped.

Aside from just documenting it, I see two things we could do:

1. Have pg_upgrade run initdb for you. It's always felt silly that you 
need to run initdb with the new version yourself, when there's really 
only one correct way to do it. pg_upgrade has all the checks to verify 
that you did it right, so why doesn't it just do it itself? I think 
that'd be a good long-term solution. Might be too late for 18, but I'm 
not sure. If someone wrote the patch we could evaluate it. To use that 
mode, the scripts calling pg_upgrade would need to be changed, though, 
so we'd perhaps want to do #2 or something else in addition to this.

2. If the new cluster has checksums enabled, but the old one has them 
disabled, have pg_upgrade disable checksums in the new cluster.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Enable data checksums by default

From

Daniel Gustafsson

Date:

23 May, 12:25:56

> On 23 May 2025, at 10:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> Aside from just documenting it, I see two things we could do:
>
> 1. Have pg_upgrade run initdb for you. It's always felt silly that you need to run initdb with the new version
yourself,when there's really only one correct way to do it. pg_upgrade has all the checks to verify that you did it
right,so why doesn't it just do it itself? I think that'd be a good long-term solution. Might be too late for 18, but
I'mnot sure. If someone wrote the patch we could evaluate it. To use that mode, the scripts calling pg_upgrade would
needto be changed, though, so we'd perhaps want to do #2 or something else in addition to this. 

I can see this being desired longer term, but as you mention there is likely to
be many moving parts outside of our immediate control making it much harder
than just adding the call to initdb.  It doesn't seem like a post-beta patch to
me given the implications for packagers and others in the ecosystem.

> 2. If the new cluster has checksums enabled, but the old one has them disabled, have pg_upgrade disable checksums in
thenew cluster. 

IF we do this it should be Very visible, since a user otherwise might think
that their upgraded cluster will have checksums since they added them in
initdb.

I think we should document how to deal with checksums in upgrades, and perhaps
even tweak the errormessage in the pg_upgrade check with explanatory comments
if needed, and leave the functionality as is today.

--
Daniel Gustafsson

Re: Enable data checksums by default

From

Bruce Momjian

Date:

24 May, 00:08:59

On Fri, May 23, 2025 at 11:10:47AM +0300, Heikki Linnakangas wrote:
> On 24/04/2025 14:26, Peter Eisentraut wrote:
> > action for beta1 is (a).  And then for (c) perhaps monitor the feedback
> > between beta1 and beta2.
> > 
> 
> Ping: It's time to do something about this open item. (Or decide to do
> nothing I guess). We're already in beta, but at the same time, we're still
> early in the beta and now is the last chance for code changes before 18 is
> shipped.
> 
> Aside from just documenting it, I see two things we could do:
> 
> 1. Have pg_upgrade run initdb for you. It's always felt silly that you need
> to run initdb with the new version yourself, when there's really only one
> correct way to do it. pg_upgrade has all the checks to verify that you did
> it right, so why doesn't it just do it itself? I think that'd be a good
> long-term solution. Might be too late for 18, but I'm not sure. If someone
> wrote the patch we could evaluate it. To use that mode, the scripts calling
> pg_upgrade would need to be changed, though, so we'd perhaps want to do #2
> or something else in addition to this.

The pg_upgrade docs suggest why having pg_upgrade run initdb could be
limiting:

    5. Install extension shared object files

    Many extensions and custom modules, whether from contrib or another
    source, use shared object files (or DLLs), e.g., pgcrypto.so. If
    the old cluster used these, shared object files matching the new
    server binary must be installed in the new cluster, usually via
    operating system commands. Do not load the schema definitions,
    e.g., CREATE EXTENSION pgcrypto, because these will be duplicated
    from the old cluster. If extension updates are available,
    pg_upgrade will report this and create a script that can be run
    later to update them.

    6. Copy custom full-text search files

    Copy any custom full text search files (dictionary, synonym,
    thesaurus, stop words) from the old to the new cluster.

    7. Adjust authentication

    pg_upgrade will connect to the old and new servers several times,
    so you might want to set authentication to peer in pg_hba.conf
    or use a ~/.pgpass file (see Section 32.16).


I have also heard of cases where postgresql.conf must be modified for
pg_upgrade to succeed, and of course initdb installs postgresql.conf.

Should #5 be after #6?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

04 June, 13:16:09

On 23.05.25 11:43, Tomas Vondra wrote:
>> We don't currently have anything in the release notes that calls this
>> out as a potential upgrading issue, so I propose the attached patch.
>>
> Seems reasonable, although maybe it should say
> 
>    ... so if the old cluster does not have checksums enabled ...
> 
> instead of
> 
>    ... so if the old cluster was initialized without checksums ...
> 
> I mean, what matters is the current state, not how it was initialized.

committed with that change

Re: Enable data checksums by default

From

Christoph Berg

Date:

05 June, 13:37:02

Re: Heikki Linnakangas
> 1. Have pg_upgrade run initdb for you. It's always felt silly that you need
> to run initdb with the new version yourself, when there's really only one
> correct way to do it.

Fwiw, Debian's pg_upgradecluster script would be happy if that problem
would be solved. Currently, we have to do all sorts of inspection of
the old cluster to figure out the locale used, data checksums, and we
are likely missing some more properties that could carried over (like
wal segment size etc). Some of the info isn't even available from the
control file but needs a connection to the running server.

The problem exists for both the pg_upgrade and dump-restore upgrade
methods, so an "initdb --like /path/to/other/cluster" mode would be
handy to have. Perhaps including a "initdb --like --print-command"
mode.

The pg_{backup,restore}cluster scripts have the same problem. It's
unnecessarily complex to recreate an existing setup.

Maybe it would be enough if the initdb options used to create a
cluster would be stored in some file in the data dir. (Perhaps in a
new line in PG_VERSION? As part of the control file?)

Christoph

Re: Enable data checksums by default

From

Peter Eisentraut

Date:

06 June, 14:40:40

On 05.06.25 12:37, Christoph Berg wrote:
> Maybe it would be enough if the initdb options used to create a
> cluster would be stored in some file in the data dir.

That probably wouldn't help if the default behavior changed, as in the 
current case.

Re: Enable data checksums by default

From

Bruce Momjian

Date:

06 June, 17:22:17

On Fri, Jun  6, 2025 at 01:40:40PM +0200, Peter Eisentraut wrote:
> On 05.06.25 12:37, Christoph Berg wrote:
> > Maybe it would be enough if the initdb options used to create a
> > cluster would be stored in some file in the data dir.
> 
> That probably wouldn't help if the default behavior changed, as in the
> current case.

Well, technically we would know the default from the old version, so we
could infer the desired behavior.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.