Re: Index range search optimization - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Index range search optimization
Date
Msg-id CAH2-WzmwQHOFmvO2dAC-Z6DPE0eH5rDh_nFELuWXsKECExNniQ@mail.gmail.com
Whole thread Raw
In response to Re: Index range search optimization  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Index range search optimization
List pgsql-hackers
On Thu, Sep 14, 2023 at 3:23 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> The attached patch allows Postgres to skip scan keys required for directional scans (even when other keys are present
inthe scan).  I'll soon post the testing results and a more polished version of this patch. 

This is very interesting to me, partly because it seems related to my
ongoing work on SAOP execution within nbtree.

My patch gives _bt_readpage and particularly _bt_checkkeys more
high-level context, which they use to intelligently control the scan.
That enables us to dynamically decide whether we should now perform
another descent of the index via another call to _bt_first, or if we
should prefer to continue on the leaf level for now. Maybe we will
match many distinct sets of array keys on the same leaf page, in the
same call to _bt_readpage. We don't want to miss out on such
opportunities, but we also want to quickly notice when we're on a page
where matching any more array keys is just hopeless.

There is a need to keep these two things in balance. We need to notice
the hopeless cases before wasting too many cycles on it. That creates
a practical need to do an early precheck of the high key (roughly the
same check that we do already). If the high key indicates that
continuing on this page is truly hopeless, then we should give up and
do another primitive index scan -- _bt_first will reposition us onto
the leaf page that we need to go to next, which is (hopefully) far
away from the leaf page we started on.

Your patch therefore has the potential to help my own patch. But, it
also has some potential to conflict with it, because my patch makes
the meaning of SK_BT_REQFWD and SK_BT_REQBKWD more complicated (though
only in cases where we have SK_SEARCHARRAY scan keys). I'm sure that
this can be managed sensibly, though.

Some feedback on your patch:

* I notice that you're not using the high key for this, even in a
forward scan -- you're using the last non-pivot tuple on the page. Why
is that? (I have some idea why, actually, but I'd like to hear your
thoughts first.)

* Separately, I don't think that it makes sense to use the same
requiredDirMatched value (which came from the last non-pivot tuple on
the page) when the special _bt_checkkeys call for the high key takes
place. I don't think that this will lead to wrong answers, but it's
weird, and is likely to defeat the existing optimization in some
important cases.

Due to the influence of suffix truncation, it's relatively likely that
the most significant column in the high key will be different to the
corresponding value from the last few non-pivot tuples on the page --
the high key tends to be "aligned with natural boundaries in the key
space", and so "gives us a good preview of the right sibling page". We
don't want to treat it the same as non-pivot tuples here, because it's
quite different, in ways that are subtle but still important.

* I would avoid using the terminology "preprocess scan keys" for this.
That exact terminology is already used to describe what
_bt_preprocess_keys() does.

That function is actually involved in Konstantin's patch, so that
could be very confusing. When we "preprocess" a scan key, it outputs a
processed scan key with markings such as the required markings that
you're using in the patch -- it's something that acts on/changes the
scan keys themselves. Whereas your patch is exploiting information
from already-processed scan keys, by applying it to the key space of a
page.

I suggest calling it "prechecking the page", or something like that. I
don't feel very strongly about what you call it, provided it isn't
confusing or ambiguous.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Pierre Ducroquet
Date:
Subject: Re: Improvements in pg_dump/pg_restore toc format and performances
Next
From: Andres Freund
Date:
Subject: Re: Sync scan & regression tests