Postgres 14 commit 5b861baa55 added hardening to nbtree page deletion.
This had the effect of making nbtree VACUUM robust against misbehaving
operator classes -- we just LOG the problem and move on, without
throwing an error. In practice a "misbehaving operator class" is often
a problem with collation versioning.

I think that this should be backpatched now, to protect users from the
particularly nasty problems that hitting the error eventually leads
to.

An error ends the whole VACUUM operation. If VACUUM cannot delete the
page the first time, there is no reason to think that it'll be any
different on the second or the tenth attempt. The eventual result
(absent user/DBA intervention) is that no antiwraparound autovacuum
will ever complete, leading to an outage when the system hits
xidStopLimit. (Where the failsafe is available, this scenario won't
actually end with the system hitting xidStopLimit, but the failsafe is
also only in 14, so it is no help on the back branches.)

This seems low risk. The commit in question is very simple. It just
downgrades an old 9.4-era ereport() from ERROR to LOG, and adds a
"return false;" immediately after that. The function in question is
fundamentally structured in a way that allows it to back out of page
deletion because of problems that are far removed from where the
caller starts from. When and why we back out of page deletion is
already opaque to the caller, so it's very hard to imagine a new
problem caused by backpatching. Besides all this, 14 has been out for
a while now.
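
To make the shape of the change concrete, here is a minimal standalone
sketch. It is not the nbtree code: DemoPage, demo_delete_page() and
demo_vacuum() are hypothetical names made up for illustration, the log
text is invented, and fprintf() stands in for ereport(LOG, ...). It
only shows the control flow, where a page that cannot be deleted is
logged and skipped rather than ending the whole pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an index page; not a PostgreSQL structure. */
typedef struct DemoPage
{
    int  blkno;
    bool parent_found;  /* false models an inconsistent index */
} DemoPage;

/*
 * Hypothetical stand-in for the page deletion step. Returns true if the
 * page was deleted, false if deletion was abandoned for this page.
 */
bool
demo_delete_page(const DemoPage *page)
{
    if (!page->parent_found)
    {
        /*
         * With an ERROR here the caller's whole pass would end. Instead
         * we log the problem and back out of deleting this one page.
         */
        fprintf(stderr, "LOG: could not delete block %d, skipping\n",
                page->blkno);
        return false;
    }

    return true;
}

/* Hypothetical stand-in for one VACUUM pass; returns pages deleted. */
int
demo_vacuum(const DemoPage *pages, int npages)
{
    int ndeleted = 0;

    assert(npages >= 0);

    for (int i = 0; i < npages; i++)
    {
        /* A failed deletion is not fatal; move on to the next page. */
        if (demo_delete_page(&pages[i]))
            ndeleted++;
    }

    return ndeleted;
}
```

The caller already has to cope with a "false" result for other reasons,
which is why the extra early return adds no new case for it to handle.
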
--
Peter Geoghegan