Re: Making all nbtree entries unique by having heap TIDs participatein comparisons - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
Date
Msg-id c248f8a3-7054-3338-73ec-5cd5df403c8a@iki.fi
Whole thread Raw
In response to Re: Making all nbtree entries unique by having heap TIDs participatein comparisons  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
List pgsql-hackers
On 16/03/2019 06:16, Peter Geoghegan wrote:
> On Thu, Mar 14, 2019 at 2:21 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> It doesn't matter how often it happens, the code still needs to deal
>> with it. So let's try to make it as readable as possible.
> 
>> Well, IMHO holding the buffer and the bounds in the new struct is more
>> clean than the savebinsrc/restorebinsrch flags. That's exactly why I
>> suggested it. I don't know what else to suggest. I haven't done any
>> benchmarking, but I doubt there's any measurable difference.
> 
> Fair enough. Attached is v17, which does it using the approach taken
> in your earlier prototype. I even came around to your view on
> _bt_binsrch_insert() -- I kept that part, too. Note, however, that I
> still pass checkingunique to _bt_findinsertloc(), because that's a
> distinct condition to whether or not bounds were cached (they happen
> to be the same thing right now, but I don't want to assume that).
> 
> This revision also integrates your approach to the "continuescan"
> optimization patch, with the small tweak I mentioned yesterday (we
> also pass ntupatts). I also prefer this approach.

Great, thank you!

> It would be nice if you could take a look at the amcheck "relocate"
> patch
When I started looking at this, I thought that "relocate" means "move". 
So I thought that the new mode would actually move tuples, i.e. that it 
would modify the index. That sounded horrible. Of course, it doesn't 
actually do that. It just checks that each tuple can be re-found, or 
"relocated", by descending the tree from the root. I'd suggest changing 
the language to avoid that confusion.

It seems like a nice way to catch all kinds of index corruption issues. 
Although, we already check that the tuples are in order within the page. 
Is it really necessary to traverse the tree for every tuple, as well? 
Maybe do it just for the first and last item?

> + * This routine can detect very subtle transitive consistency issues across
> + * more than one level of the tree.  Leaf pages all have a high key (even the
> + * rightmost page has a conceptual positive infinity high key), but not a low
> + * key.  Their downlink in parent is a lower bound, which along with the high
> + * key is almost enough to detect every possible inconsistency.  A downlink
> + * separator key value won't always be available from parent, though, because
> + * the first items of internal pages are negative infinity items, truncated
> + * down to zero attributes during internal page splits.  While it's true that
> + * bt_downlink_check() and the high key check can detect most imaginable key
> + * space problems, there are remaining problems it won't detect with non-pivot
> + * tuples in cousin leaf pages.  Starting a search from the root for every
> + * existing leaf tuple detects small inconsistencies in upper levels of the
> + * tree that cannot be detected any other way.  (Besides all this, it's
> + * probably a useful testing strategy to exhaustively verify that all
> + * non-pivot tuples can be relocated in the index using the same code paths as
> + * those used by index scans.)

I don't understand this. Can you give an example of this kind of 
inconsistency?

- Heikki


pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: RE: Timeout parameters
Next
From: Peter Geoghegan
Date:
Subject: Re: Making all nbtree entries unique by having heap TIDs participatein comparisons