Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date
Msg-id CAH2-Wzms0bjLAig=VfkXaOxKczXMS5Y3Yes8_XC3DU+t002xfA@mail.gmail.com
Whole thread Raw
In response to Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List pgsql-hackers
On Wed, Mar 6, 2019 at 10:15 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I made a copy of the _bt_binsrch, _bt_binsrch_insert. It does the binary
> search like _bt_binsrch does, but the bounds caching is only done in
> _bt_binsrch_insert. Seems more clear to have separate functions for them
> now, even though there's some duplication.

> /*
>   * Do the insertion. First move right to find the correct page to
>   * insert to, if necessary. If we're inserting to a non-unique index,
>   * _bt_search() already did this when it checked if a move to the
>   * right was required for leaf page.  Insertion scankey's scantid
>   * would have been filled out at the time. On a unique index, the
>   * current buffer is the first buffer containing duplicates, however,
>   * so we may need to move right to the correct location for this
>   * tuple.
>   */
> if (checkingunique || itup_key->heapkeyspace)
>         _bt_findinsertpage(rel, &insertstate, stack, heapRel);
>
> newitemoff = _bt_binsrch_insert(rel, &insertstate);

> The attached new version simplifies this, IMHO. The bounds and the
> current buffer go together in the same struct, so it's easier to keep
> track whether the bounds are valid or not.

Now that you have a full understanding of how the negative infinity
sentinel values work, and how page deletion's leaf page search and
!heapkeyspace indexes need to be considered, I think that we should
come back to this _bt_binsrch()/_bt_findsplitloc() stuff. My sense is
that you now have a full understanding of all the subtleties of the
patch, including those that affect unique index insertion. That
will make it much easier to talk about these unresolved questions.

My current sense is that it isn't useful to store the current buffer
alongside the binary search bounds/hint. It'll hardly ever need to be
invalidated, because we'll hardly ever have to move right within
_bt_findsplitloc() when doing unique index insertion (as I said
before, the regression tests *never* have to do this according to
gcov). We're talking about a very specific set of conditions here, so
I'd like something that's lightweight and specialized. I agree that
the savebinsrch/restorebinsrch fields are a bit ugly, though. I can't
think of anything that's better offhand. Perhaps you can suggest
something that is both lightweight, and an improvement on
savebinsrch/restorebinsrch.
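To make the trade-off concrete, the "bounds travel with the buffer" idea could be pictured roughly like this (a minimal sketch with invented names -- BTInsertStateSketch, move_right -- not the actual patch code, and an int stands in for a real Buffer):

```c
#include <stdbool.h>

/* Hypothetical bundling of the current buffer with the cached
 * binary-search bounds, along the lines suggested upthread. */
typedef struct BTInsertStateSketch
{
    int     buf;            /* stand-in for the current Buffer */
    int     low;            /* cached lower bound from last binary search */
    int     stricthigh;     /* cached strict upper bound */
    bool    bounds_valid;   /* do low/stricthigh still describe buf? */
} BTInsertStateSketch;

/* Moving right to another page must invalidate the cached bounds,
 * since they only describe offsets on the page they were computed for. */
static void
move_right(BTInsertStateSketch *state, int newbuf)
{
    state->buf = newbuf;
    state->bounds_valid = false;
}
```

The point of contention above is how often that invalidation path is actually taken: if moving right during unique index insertion is as rare as the gcov data suggests, a heavier shared struct buys little over a narrow save/restore mechanism.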

I'm of the opinion that having a separate _bt_binsrch_insert() does
not make anything clearer. Actually, I think that saving the bounds
within the original _bt_binsrch() makes the design of that function
clearer, not less clear. It's all quite confusing at the moment, given
the rightmost/!leaf/page empty special cases. Seeing how the bounds
are reused (or not reused) outside of _bt_binsrch() helps with that.
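The reuse being discussed amounts to something like the following (a deliberately simplified sketch: the "page" is a sorted int array rather than IndexTuples, and the names CachedBounds/binsrch_insert are invented for illustration):

```c
#include <stdbool.h>

/* Hypothetical cache of a binary search's converged bounds. */
typedef struct
{
    int  low;       /* cached lower bound */
    int  high;      /* cached strict upper bound */
    bool valid;     /* do the cached bounds still apply to this page? */
} CachedBounds;

/* Return the first offset whose value is >= key.  On the first call
 * the whole page is searched and the converged bounds are cached; a
 * repeat search for the same key on the same page can then skip
 * straight to the cached answer instead of descending again. */
static int
binsrch_insert(const int *items, int nitems, int key, CachedBounds *b)
{
    int low, high;

    if (b->valid)
        return b->low;      /* reuse the earlier search's result */

    low = 0;
    high = nitems;
    while (low < high)
    {
        int mid = low + (high - low) / 2;

        if (items[mid] < key)
            low = mid + 1;
        else
            high = mid;
    }
    b->low = low;
    b->high = high;
    b->valid = true;
    return low;
}
```

Whether that caching lives inside the one general-purpose search function or in a dedicated insert-path copy is exactly the question at issue here.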

The first 3 patches seem committable now, but I think that it's
important to be sure that I've addressed everything you raised
satisfactorily before pushing. Or that everything works in a way that
you can live with, at least.

It would be great if you could take a look at the 'Add high key
"continuescan" optimization' patch, which is the only one you haven't
commented on so far (excluding the amcheck "relocate" patch, which is
less important). I can put that one off for a while after the first 3
go in. I will also put off the "split after new item" commit for at
least a week or two. I'm sure that the idea behind the "continuescan"
patch will now seem pretty obvious to you -- it's just taking
advantage of the high key when an index scan on the leaf level (which
uses a search style scankey, not an insertion style scankey) looks
like it may have to move to the next leaf page, but we'd like to avoid
it where possible. Checking the high key there is now much more likely
to result in the index scan not going to the next page, since we're
more careful when considering a leaf split point these days. The high
key often looks like the items on the page to the right, not the items
on the same page.
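The shape of the optimization can be sketched like this (again a hypothetical simplification to plain ints with an invented scan_page helper; the real code evaluates scan keys against the high key tuple):

```c
#include <stdbool.h>

/* Scan one "page" (a sorted int array) for values satisfying
 * "value <= scanbound", counting matches.  Before deciding to move
 * right, check the page's high key: it is a lower bound on everything
 * on the next page, so if the high key already fails the qual, the
 * next page cannot hold any more matches and the scan can stop. */
static int
scan_page(const int *items, int nitems, int highkey, int scanbound,
          bool *continuescan)
{
    int matches = 0;

    for (int i = 0; i < nitems && items[i] <= scanbound; i++)
        matches++;

    *continuescan = (highkey <= scanbound);
    return matches;
}
```

Notice that when the last item on the page matches, a scan without the high key check would have to visit the next page just to discover it holds nothing useful; the check avoids that extra page read whenever the high key already exceeds the scan's bound.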

Thanks
-- 
Peter Geoghegan

