Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date
Msg-id CAH2-Wz=pTfuTwcYcgVz2nhaynVxQBww=BdPyinxQGYORBMCbEw@mail.gmail.com
In response to Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List pgsql-hackers
On Sat, Mar 16, 2019 at 1:33 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> AFAICS, there is a copy of every page's high key in its immediate
> parent. Either in the downlink of the right sibling, or as the high key
> of the parent page itself. Cross-checking those would catch any
> corruption in high keys.

I agree that it's always true that the high key is also in the parent,
and we could cross-check that within the child. Actually, I should
probably figure out a way of arranging for the Bloom filter used
within bt_relocate_from_root() (which has been around since PG v11) to
include the key itself when fingerprinting. That would probably
necessitate that we don't truncate "negative infinity" items (it was
actually that way about 18 years ago), just for the benefit of
verification. This is almost the same thing as what Graefe argues for
(he doesn't think you need a low key at the leaf level, since you can
cross a single level there). I wonder if truncating the negative
infinity item in internal pages to zero attributes is actually worth
it, since a low key might be useful for a number of reasons.

> Note that this would potentially catch some corruption that the
> descend-from-root would not. If you have a mismatch between the high key
> of a leaf page and its parent or grandparent, all the current tuples
> might pass the rootdescend check. But a tuple might get inserted into
> the wrong location later.

I also agree with this. However, the rootdescend check will always
work better than this in some cases that you can at least imagine, for
as long as there are negative infinity items to worry about. (And,
even if we decided not to truncate to support easy verification, there
is still a good argument to be made for involving the authoritative
_bt_search() code at some point).

> > Maybe you could actually do something with the high key from leaf page
> > 5 to detect the stray value "20" in leaf page 6, but again, we don't
> > do anything like that right now.
>
> Hmm, yeah, to check for stray values, you could follow the left-link,
> get the high key of the left sibling, and compare against that.

Graefe argues for retaining a low key, even in leaf pages (the left
page's old high key becomes the new right page's high key during a
split, and the left page's new high key becomes the new right page's
low key at the same time). It's useful for what he calls "write-optimized
B-Trees", and maybe even for optional compression. As I said earlier,
I guess you can just go left on the leaf level if you need to, and you
have all you need. But I'd need to think about it some more.

Point taken; rootdescend isn't enough to make verification exactly
perfect. But it comes close to perfect, because you're going to get
right answers to queries for as long as it passes (I think). There
could be corruption that only affects some future insertion, which a
cross-check like yours could detect but rootdescend can't. But you'd
have to be extraordinarily unlucky to have that happen for any amount
of time. Unlucky even by my own extremely paranoid standard.

-- 
Peter Geoghegan

