Re: post-recovery amcheck expectations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: post-recovery amcheck expectations
Date
Msg-id CAH2-WzmQJoLRSN6xTtK6ROyhDEX-m0+z8WJ8hSDtgrc4moVsRg@mail.gmail.com
Whole thread Raw
In response to Re: post-recovery amcheck expectations  (Noah Misch <noah@leadboat.com>)
Responses Re: post-recovery amcheck expectations
List pgsql-hackers
On Mon, Oct 23, 2023 at 7:28 PM Noah Misch <noah@leadboat.com> wrote:
> > That makes sense to me. I believe that it's not possible to have a
> > string of consecutive sibling pages that are all half-dead (regardless
> > of the BlockNumber order of sibling pages, even). But I'd probably
> > have written the fix in roughly the same way. Although...maybe you
> > should try to detect a string of half-dead pages? Hard to say if it's
> > worth the trouble.
>
> I imagined a string of half-dead siblings could arise in structure like this:
>
>  *               1
>  *          /    |    \
>  *         4 <-> 2 <-> 3
>
> With events like this:
>
> - DELETE renders blk 4 deletable.
> - Crash with concurrent VACUUM, leaving 4 half-dead after having visited 1-4.
> - DELETE renders blk 2 deletable.
> - Crash with concurrent VACUUM, leaving 2 half-dead after having visited 1-2.
>
> I didn't try to reproduce that, and something may well prevent it.

FWIW a couple of factors prevent it (in the absence of corruption). These are:

1. Only VACUUM can delete pages, and in general the only possible
source of half-dead pages is an unfortunately timed crash/error within
VACUUM. Each interrupted VACUUM can leave behind at most one half-dead
page.

2. One thing that makes VACUUM back out of deleting an empty page is
the presence of a half-dead right sibling leaf page left behind by
some VACUUM that was interrupted at some point in the past -- see
_bt_rightsib_halfdeadflag() for details.

Obviously, factors 1 and 2 together make three consecutive half-dead
sibling pages impossible. I'm not quite prepared to say that even two
neighboring half-dead sibling pages are an impossibility right now,
but I think that it might well be. Possibly for reasons that are more
accidental than anything else (is the _bt_rightsib_halfdeadflag thing
a "real invariant" or just something we do because we don't want to
add additional complicated handling to nbtpage.c?), so I'll avoid
going into further detail for now.

I'm pointing this out because it argues for softening the wording
about "accept[ing] an arbitrarily-long chain of half-dead,
sibling-linked pages to the left" from your patch.

I was also wondering (mostly to myself) about the relationship (if
any) between the _bt_rightsib_halfdeadflag/_bt_leftsib_splitflag
"invariants" and the bt_child_highkey_check() check. But I don't think
that it makes sense to put that in scope -- your fix seems like a
strict improvement. This relationship is perhaps in scope here to the
limited extent that talking about strings of consecutive half-dead
pages might make it even harder to understand the design of the
bt_child_highkey_check() check. On the other hand...I'm not sure that
I understand every nuance of it myself.

> > Suggest adding a CHECK_FOR_INTERRUPTS() call to the loop, too, just
> > for good luck.
>
> Added.  That gave me the idea to check for circular links, like other parts of
> amcheck do.

Good idea.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: "David E. Wheeler"
Date:
Subject: Re: Patch: Improve Boolean Predicate JSON Path Docs