Re: Improve search for missing parent downlinks in amcheck - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Improve search for missing parent downlinks in amcheck
Date
Msg-id CAPpHfduTWj00vPXTBfGd=apEiqyO3Hf+PEXgHC=6wDuJ8RBZHg@mail.gmail.com
Whole thread Raw
In response to Re: Improve search for missing parent downlinks in amcheck  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Improve search for missing parent downlinks in amcheck
List pgsql-hackers
On Tue, Aug 13, 2019 at 11:44 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > In this revision check for missing downlinks is combined with
> > bt_downlink_check().  So, pages visited by bt_downlink_check() patch
> > doesn't cause extra accessed.  It only causes following additional
> > page accesses:
> > 1) Downlinks corresponding to "negative infinity" keys,
> > 2) Pages of child level, which are not referenced by downlinks.
> >
> > But I think these two kinds are very minority, and those accesses
> > could be trade off with more precise missing downlink check without
> > bloom filter.  What do you think?
>
> I am generally in favor of making the downlink check absolutely
> reliable, and am not too worried about the modest additional overhead.
> After all, bt_index_parent_check() is supposed to be thorough though
> expensive. The only reason that I used a Bloom filter for
> fingerprinting downlink blocks was that it seemed important to quickly
> get amcheck coverage for subtle multi-level page deletion bugs just
> after v11 feature freeze. We can now come up with a better design for
> that.

Great!

> I was confused about how this patch worked at first. But then I
> remembered that Lehman and Yao consider downlinks to be distinct
> things to separator keys in internal pages. The high key of an
> internal page in the final separator key, so you have n downlinks and
> n + 1 separator keys per internal page -- two distinct things that
> appear in alternating order (the negative infinity item is not
> considered to have any separator key here). So, while internal page
> items are explicitly "(downlink, separator)" pairs that are grouped
> into a single tuple in nbtree, with a separate tuple just for the high
> key, Lehman and Yao would find it just as natural to treat them as
> "(separator, downlink)" pairs. You have to skip the first downlink on
> each internal level to make that work, but this makes our
> bt_downlink_check() check have something from the target page (child's
> parent page) that is like the high key in the child.
>
> It's already really confusing that we don't quite mean the same thing
> as Lehman and Yao when we say downlink (See also my long "why a
> highkey is never truly a copy of another item" comment block within
> bt_target_page_check()), and that is not your patch's fault. But maybe
> we need to fix that to make your patch easier to understand. (i.e.
> maybe we need to go over every use of the word "downlink" in nbtree,
> and make it say something more precise, to make everything less
> confusing.)

I agree that current terms nbtree use to describe downlinks and
separator keys may be confusing.  I'll try to fix this and come up
with patch if succeed.

> Other feedback:
>
> * Did you miss the opportunity to verify that every high key matches
> its right sibling page's downlink tuple in parent page? We talked
> about this already, but you don't seem to match the key (you only
> match the downlink block).
>
> * You are decoupling the new check from the bt_index_parent_check()
> "heapallindexed" option. That's probably a good thing, but you need to
> update the sgml docs.
>
> * Didn't you forget to remove the BtreeCheckState.rightsplit flag?
>
> * You've significantly changed the behavior of bt_downlink_check() --
> I would note this in its header comments. This is where ~99% of the
> new work happens.
>
> * I don't like that you use the loaded term "target" here -- anything
> called "target" should be BtreeCheckState.target:
>
> >  static void
> > -bt_downlink_missing_check(BtreeCheckState *state)
> > +bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
> > +                         BlockNumber targetblock, Page target)
> >  {
>
> If it's unclear what I mean, this old comment should make it clearer:
>
> /*
>  * State associated with verifying a B-Tree index
>  *
>  * target is the point of reference for a verification operation.
>  *
>  * Other B-Tree pages may be allocated, but those are always auxiliary (e.g.,
>  * they are current target's child pages).  Conceptually, problems are only
>  * ever found in the current target page (or for a particular heap tuple during
>  * heapallindexed verification).  Each page found by verification's left/right,
>  * top/bottom scan becomes the target exactly once.
>  */

The revised patch seems to fix all of above.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Zedstore - compressed in-core columnar storage
Next
From: Alexander Korotkov
Date:
Subject: Re: Support for jsonpath .datetime() method