Re: pg_upgrade: Support for upgrading to checksums enabled - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: pg_upgrade: Support for upgrading to checksums enabled
Date
Msg-id 0ae64f7c-ab36-431c-9c0d-ec398c12da87@eisentraut.org
Whole thread Raw
In response to Re: pg_upgrade: Support for upgrading to checksums enabled  (Robert Treat <rob@xzilla.net>)
Responses Re: pg_upgrade: Support for upgrading to checksums enabled
List pgsql-hackers
On 21.02.25 00:41, Robert Treat wrote:
> On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
>>> The purpose of this patch is to allow using pg_upgrade between clusters that
>>> have different checksum settings.  When upgrading between instances with
>>> different checksum settings, the --copy (default) mode automatically sets
>>> (or unsets) the checksum on the fly.
>>>
>>> This would be particularly useful if we switched to enabling checksums by
>>> default, as [0] proposes, but it's also useful without that.
>>
>> Given enabling checksums can be rather expensive, I think it makes sense to
>> add a way to do it during pg_upgrade versus asking folks to run
>> pg_checksums separately.  I'd anticipate arguments against enabling
>> checksums automatically, but as you noted, we can move it to a separate
>> option (e.g., --copy --enable-checksums).  Disabling checksums with
>> pg_checksums is fast because it just updates pg_control, so I don't see any
>> need for --disable-checkums in pg_upgrade.
>>
> 
> The repercussions of either enabling or disabling checksums on
> accident is quite high (not for pg_upgrade, but for $futureDBA), so
> ISTM an explicit flag for BOTH is the right way to go. In that
> scenario, pg_upgrade would check to make sure the clusters match and
> then make the appropriate suggestion. In the case someone did
> something like --enable-checksums and --link, again, we'd toss an
> error that --copy mode is required to --enable-checksums.

Here is an updated patch that works more along those lines.  It adds a 
pg_upgrade option --update-checksums, which activates the code to 
rewrite the checksums.  You must specify this option if the source and 
target clusters have different checksum settings.

Note that this also works to hypothetically upgrade between future 
different checksum versions (hence "--update-*", not "--enable-*"). 
Also, as the patch is currently written, it is also required to specify 
this option to downgrade from checksums to no-checksums.  (It will then 
write a zero into the checksum place, as it would look if you had never 
used checksums.)  Also, you can optionally specify this option even if 
the checksum settings are the same, then it will recalculate the 
checksums.  Probably not all of this is useful, but perhaps a subset of 
it.  Thoughts?

Also, I still don't know what to do about the Windows code path in 
copyFile().  We could just not support this feature on Windows?  Or 
maybe the notionally correct thing to do would be to move that code into 
copyFileByRange().  But then we'd need a different default on Windows 
and it would require more documentation.  I don't know what to do here 
and I don't have enough context to make a suggestion.  But if we don't 
answer this, I don't think we can move ahead with this feature.

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Statistics import and export: difference in statistics of materialized view dumped
Next
From: Ashutosh Bapat
Date:
Subject: Re: Test to dump and restore objects left behind by regression