Re: pg_amcheck contrib application - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: pg_amcheck contrib application
Date
Msg-id CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com
Whole thread Raw
In response to Re: pg_amcheck contrib application  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_amcheck contrib application
List pgsql-hackers
On Tue, Mar 23, 2021 at 12:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Right, good point. But... does that really apply to
> 005_opclass_damage.pl? I feel like with the kind of physical damage
> we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
> from crashing headlong into that table. But, 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

FWIW that is only 99.9% true (contrary to what README.HOT says). This
is the case because nbtree page deletion will in fact search the tree
to find a downlink to the target page, which must be removed at the
same time -- see the call to _bt_search() made within nbtpage.c.

This is much less of a problem than you'd think, though, even with an
opclass that gives wrong answers all the time. Because it's also true
that _bt_getstackbuf() is remarkably tolerant when it actually goes to
locate the downlink -- because that happens via a linear search that
matches on downlink block number (it doesn't use the opclass for that
part). This means that we'll accidentally fail to fail if the page is
*somewhere* to the right in the "true" key space. Which probably means
that it has a greater than 50% chance of not failing with a 100%
broken opclass. Which probably makes our odds better with more
plausible levels of misbehavior (e.g. collation changes).

That being said, I should make _bt_lock_subtree_parent() return false
and back out of page deletion without raising an error in the case
where we really cannot locate a valid downlink. We really ought to
soldier on when that happens, since we'll do that for a bunch of other
reasons already. I believe that the only reason we throw an error
today is for parity with the page split case (the main
_bt_getstackbuf() call). But this isn't the same situation at all --
this is VACUUM.

I will make this change to HEAD soon, barring objections.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Next
From: Tom Lane
Date:
Subject: Re: pg_amcheck contrib application