Re: VACUUM can finish an interrupted nbtree page split -- is that okay? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: VACUUM can finish an interrupted nbtree page split -- is that okay?
Date
Msg-id CAH2-Wzm_ntmqJjWLRyKzimFmFvk+BnVAvUpaA4s1h9Ja58woaQ@mail.gmail.com
Whole thread Raw
In response to VACUUM can finish an interrupted nbtree page split -- is that okay?  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: VACUUM can finish an interrupted nbtree page split -- is thatokay?  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Fri, Mar 1, 2019 at 3:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
>             /*
>              * Perform the same check on this internal level that
>              * _bt_mark_page_halfdead performed on the leaf level.
>              */
>             if (_bt_is_page_halfdead(rel, *rightsib))

> I thought that internal pages were never half-dead after Postgres 9.4.
> If that happens, then the check within _bt_pagedel() will throw an
> ERRCODE_INDEX_CORRUPTED error, and tell the DBA to REINDEX. Shouldn't
> this internal level _bt_is_page_halfdead() check contain a "can't
> happen" error, or even a simple assertion?

I think that we should get rid of this code on HEAD shortly, because
it's effectively dead code. You don't have to know anything about
B-Trees to see why this must be true: VACUUM is specifically checking
if an internal page is half-dead here, even though it's already
treating half-dead internal pages as ipso facto corrupt in higher
level code (it's the first thing we check in _bt_pagedel()). This is
clearly contradictory. If there is a half-dead internal page, then
there is no danger of VACUUM not complaining loudly that you need to
REINDEX. This has been true since the page deletion overhaul that went
into 9.4.

Attached patch removes the internal page check, and adds a comment
that explains why it's sufficient to check on the leaf level alone.
Admittedly, it's much easier to understand why the
_bt_is_page_halfdead() internal page check is useless than it is to
understand why replacing it with this comment is helpful. My
observation that a half-dead leaf page is representative of the
subtree whose root the leaf page has stored as its "top parent" is
hardly controversial, though -- that's the whole basis of multi-level
page deletion. If you *visualize* how multi-level deletion works, and
consider its rightmost-in-subtree restriction, then it isn't hard to
see why everything works out with just the leaf level
right-sibling-is-half-dead check:

We can only have two adjacent "skinny" pending-deletion subtrees in
cases where the removed check might seem to be helpful -- each page is
both the leftmost and the rightmost on its level in its subtree. It's
okay to just check if the leaf is half-dead because it "owns" exactly
the same range in the keyspace as the internal pages up to and
including its top parent, if any, and because it is marked half-dead
by the same atomic operation that does initial removal of downlinks in
an ancestor page.

I'm fine with waiting until we branch-off v12 before pushing the
patch, even though it seems low risk.

--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: range_agg
Next
From: John Naylor
Date:
Subject: Re: Unhappy about API changes in the no-fsm-for-small-rels patch