Thread: Re: pg_upgrade: Support for upgrading to checksums enabled

Re: pg_upgrade: Support for upgrading to checksums enabled

From
Nathan Bossart
Date:
On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
> The purpose of this patch is to allow using pg_upgrade between clusters that
> have different checksum settings.  When upgrading between instances with
> different checksum settings, the --copy (default) mode automatically sets
> (or unsets) the checksum on the fly.
> 
> This would be particularly useful if we switched to enabling checksums by
> default, as [0] proposes, but it's also useful without that.

Given enabling checksums can be rather expensive, I think it makes sense to
add a way to do it during pg_upgrade versus asking folks to run
pg_checksums separately.  I'd anticipate arguments against enabling
checksums automatically, but as you noted, we can move it to a separate
option (e.g., --copy --enable-checksums).  Disabling checksums with
pg_checksums is fast because it just updates pg_control, so I don't see any
need for --disable-checkums in pg_upgrade.

> - Windows has a separate code path in the --copy mode.  I don't know the
> reasons or advantages of that.  So it's not clear how the checksum rewriting
> mode should be handled in that case.  We could switch to the non-Windows
> code path in that case, but then the performance difference between the
> normal path and the checksum-rewriting path is even more unclear.

AFAICT the separate Windows path dates back to before pg_upgrade was first
added to the Postgres tree, and unfortunately I couldn't find any
discussion about it.

-- 
nathan



Re: pg_upgrade: Support for upgrading to checksums enabled

From
Robert Treat
Date:
On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
> > The purpose of this patch is to allow using pg_upgrade between clusters that
> > have different checksum settings.  When upgrading between instances with
> > different checksum settings, the --copy (default) mode automatically sets
> > (or unsets) the checksum on the fly.
> >
> > This would be particularly useful if we switched to enabling checksums by
> > default, as [0] proposes, but it's also useful without that.
>
> Given enabling checksums can be rather expensive, I think it makes sense to
> add a way to do it during pg_upgrade versus asking folks to run
> pg_checksums separately.  I'd anticipate arguments against enabling
> checksums automatically, but as you noted, we can move it to a separate
> option (e.g., --copy --enable-checksums).  Disabling checksums with
> pg_checksums is fast because it just updates pg_control, so I don't see any
> need for --disable-checkums in pg_upgrade.
>

The repercussions of either enabling or disabling checksums on
accident is quite high (not for pg_upgrade, but for $futureDBA), so
ISTM an explicit flag for BOTH is the right way to go. In that
scenario, pg_upgrade would check to make sure the clusters match and
then make the appropriate suggestion. In the case someone did
something like --enable-checksums and --link, again, we'd toss an
error that --copy mode is required to --enable-checksums.

Robert Treat
https://xzilla.net



Re: pg_upgrade: Support for upgrading to checksums enabled

From
Peter Eisentraut
Date:
On 21.02.25 00:41, Robert Treat wrote:
> On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
>>> The purpose of this patch is to allow using pg_upgrade between clusters that
>>> have different checksum settings.  When upgrading between instances with
>>> different checksum settings, the --copy (default) mode automatically sets
>>> (or unsets) the checksum on the fly.
>>>
>>> This would be particularly useful if we switched to enabling checksums by
>>> default, as [0] proposes, but it's also useful without that.
>>
>> Given enabling checksums can be rather expensive, I think it makes sense to
>> add a way to do it during pg_upgrade versus asking folks to run
>> pg_checksums separately.  I'd anticipate arguments against enabling
>> checksums automatically, but as you noted, we can move it to a separate
>> option (e.g., --copy --enable-checksums).  Disabling checksums with
>> pg_checksums is fast because it just updates pg_control, so I don't see any
>> need for --disable-checkums in pg_upgrade.
>>
> 
> The repercussions of either enabling or disabling checksums on
> accident is quite high (not for pg_upgrade, but for $futureDBA), so
> ISTM an explicit flag for BOTH is the right way to go. In that
> scenario, pg_upgrade would check to make sure the clusters match and
> then make the appropriate suggestion. In the case someone did
> something like --enable-checksums and --link, again, we'd toss an
> error that --copy mode is required to --enable-checksums.

Here is an updated patch that works more along those lines.  It adds a 
pg_upgrade option --update-checksums, which activates the code to 
rewrite the checksums.  You must specify this option if the source and 
target clusters have different checksum settings.

Note that this also works to hypothetically upgrade between future 
different checksum versions (hence "--update-*", not "--enable-*"). 
Also, as the patch is currently written, it is also required to specify 
this option to downgrade from checksums to no-checksums.  (It will then 
write a zero into the checksum place, as it would look if you had never 
used checksums.)  Also, you can optionally specify this option even if 
the checksum settings are the same, then it will recalculate the 
checksums.  Probably not all of this is useful, but perhaps a subset of 
it.  Thoughts?

Also, I still don't know what to do about the Windows code path in 
copyFile().  We could just not support this feature on Windows?  Or 
maybe the notionally correct thing to do would be to move that code into 
copyFileByRange().  But then we'd need a different default on Windows 
and it would require more documentation.  I don't know what to do here 
and I don't have enough context to make a suggestion.  But if we don't 
answer this, I don't think we can move ahead with this feature.

Attachment

Re: pg_upgrade: Support for upgrading to checksums enabled

From
Peter Eisentraut
Date:
On 11.03.25 11:42, Peter Eisentraut wrote:
> Here is an updated patch that works more along those lines.  It adds a 
> pg_upgrade option --update-checksums, which activates the code to 
> rewrite the checksums.  You must specify this option if the source and 
> target clusters have different checksum settings.
> 
> Note that this also works to hypothetically upgrade between future 
> different checksum versions (hence "--update-*", not "--enable-*"). 
> Also, as the patch is currently written, it is also required to specify 
> this option to downgrade from checksums to no-checksums.  (It will then 
> write a zero into the checksum place, as it would look if you had never 
> used checksums.)  Also, you can optionally specify this option even if 
> the checksum settings are the same, then it will recalculate the 
> checksums.  Probably not all of this is useful, but perhaps a subset of 
> it.  Thoughts?
> 
> Also, I still don't know what to do about the Windows code path in 
> copyFile().  We could just not support this feature on Windows?  Or 
> maybe the notionally correct thing to do would be to move that code into 
> copyFileByRange().  But then we'd need a different default on Windows 
> and it would require more documentation.  I don't know what to do here 
> and I don't have enough context to make a suggestion.  But if we don't 
> answer this, I don't think we can move ahead with this feature.

I'm not sensing much enthusiasm for this feature or for working out the 
remaining problems, so I'm closing this commitfest entry.




Re: pg_upgrade: Support for upgrading to checksums enabled

From
Robert Treat
Date:
On Thu, Apr 3, 2025 at 3:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 11.03.25 11:42, Peter Eisentraut wrote:
> > Here is an updated patch that works more along those lines.  It adds a
> > pg_upgrade option --update-checksums, which activates the code to
> > rewrite the checksums.  You must specify this option if the source and
> > target clusters have different checksum settings.
> >
> > Note that this also works to hypothetically upgrade between future
> > different checksum versions (hence "--update-*", not "--enable-*").
> > Also, as the patch is currently written, it is also required to specify
> > this option to downgrade from checksums to no-checksums.  (It will then
> > write a zero into the checksum place, as it would look if you had never
> > used checksums.)  Also, you can optionally specify this option even if
> > the checksum settings are the same, then it will recalculate the
> > checksums.  Probably not all of this is useful, but perhaps a subset of
> > it.  Thoughts?
> >
> > Also, I still don't know what to do about the Windows code path in
> > copyFile().  We could just not support this feature on Windows?  Or
> > maybe the notionally correct thing to do would be to move that code into
> > copyFileByRange().  But then we'd need a different default on Windows
> > and it would require more documentation.  I don't know what to do here
> > and I don't have enough context to make a suggestion.  But if we don't
> > answer this, I don't think we can move ahead with this feature.
>
> I'm not sensing much enthusiasm for this feature or for working out the
> remaining problems, so I'm closing this commitfest entry.
>

That's unfortunate; I think there is enthusiasm for the feature, just
not enough to overcome the questions around Windows support.


Robert Treat
https://xzilla.net