Re: Improve search for missing parent downlinks in amcheck - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Improve search for missing parent downlinks in amcheck
Date
Msg-id CAH2-WzmRtYu-HXEb0o_zMzaHWbft7Lv=F25B=Cba200hQcsm3w@mail.gmail.com
Whole thread Raw
In response to Re: Improve search for missing parent downlinks in amcheck  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Improve search for missing parent downlinks in amcheck
List pgsql-hackers
On Mon, Aug 12, 2019 at 12:01 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> BTW, there is next revision of patch I'm proposing for v13.

Cool.

> In this revision check for missing downlinks is combined with
> bt_downlink_check().  So, pages visited by bt_downlink_check() patch
> doesn't cause extra accessed.  It only causes following additional
> page accesses:
> 1) Downlinks corresponding to "negative infinity" keys,
> 2) Pages of child level, which are not referenced by downlinks.
>
> But I think these two kinds are very minority, and those accesses
> could be trade off with more precise missing downlink check without
> bloom filter.  What do you think?

I am generally in favor of making the downlink check absolutely
reliable, and am not too worried about the modest additional overhead.
After all, bt_index_parent_check() is supposed to be thorough though
expensive. The only reason that I used a Bloom filter for
fingerprinting downlink blocks was that it seemed important to quickly
get amcheck coverage for subtle multi-level page deletion bugs just
after v11 feature freeze. We can now come up with a better design for
that.

I was confused about how this patch worked at first. But then I
remembered that Lehman and Yao consider downlinks to be distinct
things to separator keys in internal pages. The high key of an
internal page in the final separator key, so you have n downlinks and
n + 1 separator keys per internal page -- two distinct things that
appear in alternating order (the negative infinity item is not
considered to have any separator key here). So, while internal page
items are explicitly "(downlink, separator)" pairs that are grouped
into a single tuple in nbtree, with a separate tuple just for the high
key, Lehman and Yao would find it just as natural to treat them as
"(separator, downlink)" pairs. You have to skip the first downlink on
each internal level to make that work, but this makes our
bt_downlink_check() check have something from the target page (child's
parent page) that is like the high key in the child.

It's already really confusing that we don't quite mean the same thing
as Lehman and Yao when we say downlink (See also my long "why a
highkey is never truly a copy of another item" comment block within
bt_target_page_check()), and that is not your patch's fault. But maybe
we need to fix that to make your patch easier to understand. (i.e.
maybe we need to go over every use of the word "downlink" in nbtree,
and make it say something more precise, to make everything less
confusing.)

Other feedback:

* Did you miss the opportunity to verify that every high key matches
its right sibling page's downlink tuple in parent page? We talked
about this already, but you don't seem to match the key (you only
match the downlink block).

* You are decoupling the new check from the bt_index_parent_check()
"heapallindexed" option. That's probably a good thing, but you need to
update the sgml docs.

* Didn't you forget to remove the BtreeCheckState.rightsplit flag?

* You've significantly changed the behavior of bt_downlink_check() --
I would note this in its header comments. This is where ~99% of the
new work happens.

* I don't like that you use the loaded term "target" here -- anything
called "target" should be BtreeCheckState.target:

>  static void
> -bt_downlink_missing_check(BtreeCheckState *state)
> +bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
> +                         BlockNumber targetblock, Page target)
>  {

If it's unclear what I mean, this old comment should make it clearer:

/*
 * State associated with verifying a B-Tree index
 *
 * target is the point of reference for a verification operation.
 *
 * Other B-Tree pages may be allocated, but those are always auxiliary (e.g.,
 * they are current target's child pages).  Conceptually, problems are only
 * ever found in the current target page (or for a particular heap tuple during
 * heapallindexed verification).  Each page found by verification's left/right,
 * top/bottom scan becomes the target exactly once.
 */

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Dmitry Igrishin
Date:
Subject: Re: Small patch to fix build on Windows
Next
From: Ibrar Ahmed
Date:
Subject: Re: block-level incremental backup