Re: Index range search optimization - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Index range search optimization
Date
Msg-id CAH2-WzkE-V_Ap5w42hx3X=DRk_L9DgUPhNWC6APYDdPQ1KODYg@mail.gmail.com
Whole thread Raw
In response to Re: Index range search optimization  (Pavel Borisov <pashkin.elfe@gmail.com>)
Responses Re: Index range search optimization
Re: Index range search optimization
List pgsql-hackers
On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I looked at the patch code and I agree with this optimization.
> Implementation also looks good to me except change :
> + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
> + !(key->sk_flags & SK_ROW_HEADER))
> + requiredDir = true;
> ...
> - if ((key->sk_flags & SK_BT_REQFWD) &&
> - ScanDirectionIsForward(dir))
> - *continuescan = false;
> - else if ((key->sk_flags & SK_BT_REQBKWD) &&
> - ScanDirectionIsBackward(dir))
> + if (requiredDir)
>   *continuescan = false;
>
> looks like changing behavior in the case when key->sk_flags &
> SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
> (!requiredDirMatched)
> Originally it doesn't set *continuescan = false; and with the patch it will set.

I agree that this is a problem. Inequality strategy scan keys are used
when the initial positioning strategy used by _bt_first (for its
_bt_search call) is based on an operator other than the "=" operator
for the opclass. These scan keys are required in one direction only
(Konstantin's original patch just focussed on these cases, actually).
Obviously, that difference matters. I don't think that this patch
should do anything that even looks like it might be revising the
formal definition of "required in the current scan direction".

Why is SK_ROW_HEADER treated as a special case by the patch? Could it
be related to the issues with required-ness and scan direction? Note
that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row
comparisons, so they're only ever required for one scan direction.
(Equality-type row constructor syntax can of course be used without
preventing the system from using an index scan, but the nbtree code
will not see that case as a row comparison in the first place. This is
due to preprocessing by the planner -- nbtree just sees conventional
scan keys with multiple simple equality scan keys with = row
comparisons.)

Also, what about NULLs? While "key IS NULL" is classified as an
equality check (see _bt_preprocess_keys comments), the same isn't true
with "key IS NOT NULL". The latter case usually has scan key flags
"SK_ISNULL | SK_SEARCHNOTNULL | SK_BT_REQFWD" -- there is no
SK_BT_REQBKWD here.

> This may be relevant for the first page when requiredDirMatched is
> intentionally skipped to be set and for call
> _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false);

Also, requiredDirMatched isn't initialized by _bt_readpage() when
"so->firstPage". Shouldn't it be initialized to false?

Also, don't we need to take more care with a fully empty page? The "if
(!so->firstPage) ... " block should be gated using a condition such as
"if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff"
test would also work, but then the optimization will get applied on
pages with only one non-pivot tuple. That would be harmless, but a
waste of cycles.)

> Also naming of requiredDirMatched and requiredDir seems semantically
> hard to understand the meaning without looking at the patch commit
> message. But I don't have better proposals yet, so maybe it's
> acceptable.

I agree. How about "requiredMatchedByPrecheck" instead of
"requiredDirMatched", and "required" instead of "requiredDir"?

It would be nice if this patch worked in a way that could be verified
by an assertion. Under this scheme, the optimization would only really
be used in release builds (builds without assertions enabled, really).
We'd only verify that the optimized case agreed with the slow path in
assert-enabled builds. It might also make sense to always "apply the
optimization" on assert-enabled builds, even for the first page seen
by _bt_readpage by any _bt_first-wise scan. Maybe this sort of
approach is impractical here for some reason, but I don't see why it
should be.

Obviously, the optimization should lower the amount of work in some
calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys
gives. Ideally, it should be done in a way that makes that very
obvious. There are some very subtle interactions between _bt_checkkeys
and other, distant code -- which makes me feel paranoid. Notably, with
required equality strategy scan keys, we're crucially dependent on
_bt_first using an equality strategy for its initial positioning call
to _bt_search. This is described by comments in both _bt_checkkeys and
in _bt_first.

Note, in particular, that it is essential that the initial offnum
passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take
place that could cause it to become confused by a required equality
strategy scan key, leading to _bt_checkkeys terminating the whole scan
"early" -- causing wrong answers. For a query "WHERE foo = 5" (and a
forward scan), we had better not pass _bt_readpage an offset number
for a tuple with "foo" value 4. If that is ever allowed then
_bt_checkkeys will terminate the scan immediately, leading to wrong
answers. All because _bt_checkkeys can't tell if 4 comes before 5 or
comes after 5 -- it only has an "=" operator to work with, so it can't
actually make this distinction, so it likes to assume that anything !=
5 must come after 5 (or before 5 during a backwards scan).

I added a very similar _bt_compare()-based assertion in
_bt_check_unique(), which went on to catch a very subtle bug in the
Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I
have put this particular idea about asserting agreement between a fast
path and a slow comparison path into practice already.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Next
From: Jeff Davis
Date:
Subject: Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }