Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Msg-id CAH2-Wz=AcYC5hNChERUQLsXHz4Fgf8fAqjUH1_JP68UEcrcn4Q@mail.gmail.com
In response to Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List pgsql-hackers
On Sat, Mar 16, 2019 at 1:44 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > It would be nice if you could take a look at the amcheck "relocate"
> > patch
> When I started looking at this, I thought that "relocate" means "move".
> So I thought that the new mode would actually move tuples, i.e. that it
> would modify the index. That sounded horrible. Of course, it doesn't
> actually do that. It just checks that each tuple can be re-found, or
> "relocated", by descending the tree from the root. I'd suggest changing
> the language to avoid that confusion.

Okay. What do you suggest? :-)

> It seems like a nice way to catch all kinds of index corruption issues.
> Although, we already check that the tuples are in order within the page.
> Is it really necessary to traverse the tree for every tuple, as well?
> Maybe do it just for the first and last item?

It's mainly intended as a developer option. I want it to be possible
to detect any form of corruption, however unlikely. It's an
adversarial mindset that will at least make me less nervous about the
patch.

> I don't understand this. Can you give an example of this kind of
> inconsistency?

The commit message gives an example, but I suggest trying it out for
yourself. Corrupt the least significant key byte of a root page of a
B-Tree using pg_hexedit. Say it's an index on a text column; you'd
corrupt the last byte of a key so that it's slightly wrong. The only
way to catch that is with "relocate" verification. Index scans will
only miss a few tuples on a cousin page at the leaf level (those on
either side of the high key that the corrupted separator key in the
root was originally copied from).

The regular checks won't catch this, because the keys are similar
enough one level down. The "minus infinity" item is a kind of blind
spot: we cannot do a parent check of its children because we don't
have its key (the key is truncated away when the item becomes the
right page's minus infinity item during an internal page split).
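
The scenario can be sketched with a toy three-level tree. This is an
illustration only, not amcheck's implementation: Node, page_order_ok,
refind and unreachable are hypothetical names, integer keys stand in
for text keys, and the only "regular" check modeled is within-page
ordering (amcheck's parent/child checks are not modeled). Each
separator key is assumed to be an inclusive upper bound for the child
to its left.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    # Leaf page: keys are the indexed values.
    # Internal page: keys are separators; keys[i] is an inclusive
    # upper bound for children[i] (an assumption of this toy model).
    keys: list
    children: list = field(default_factory=list)


def page_order_ok(node):
    """Within-page check: keys on every page are in sorted order."""
    in_order = all(a <= b for a, b in zip(node.keys, node.keys[1:]))
    return in_order and all(page_order_ok(c) for c in node.children)


def refind(root, key):
    """Descend from the root and report whether key is found on a leaf."""
    node = root
    while node.children:
        # Number of separators strictly less than key = child to follow.
        node = node.children[bisect_left(node.keys, key)]
    return key in node.keys


def leaf_keys(node):
    """Yield every key stored on a leaf page, left to right."""
    if not node.children:
        yield from node.keys
    else:
        for child in node.children:
            yield from leaf_keys(child)


def unreachable(root):
    """Leaf keys that cannot be re-found by a descent from the root."""
    return [k for k in leaf_keys(root) if not refind(root, k)]


def build_tree():
    left = Node([2], [Node([1, 2]), Node([3, 4])])
    right = Node([6], [Node([5, 6]), Node([7, 8])])
    return Node([4], [left, right])


root = build_tree()
root.keys[0] = 3            # "corrupt" the root separator: 4 becomes 3
print(page_order_ok(root))  # True: every page still looks sorted
print(unreachable(root))    # [4]: only the re-find pass notices
```

Only key 4, which sits next to the boundary that the root separator
was copied from, becomes unreachable in this toy; every page still
passes the ordering check on its own.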

-- 
Peter Geoghegan

