Re: Index range search optimization - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Index range search optimization
Date
Msg-id CAPpHfdt9JNftLfB1zfSB-kj6BtL5yxSnztSUyQ76DzOoDKzPPw@mail.gmail.com
Whole thread Raw
In response to Re: Index range search optimization  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Index range search optimization
Re: Index range search optimization
List pgsql-hackers
Hi Peter,
Hi Pavel,

The v4 of the patch is attached.

On Thu, Sep 21, 2023 at 11:48 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > I looked at the patch code and I agree with this optimization.
> > Implementation also looks good to me except change :
> > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) &&
> > + !(key->sk_flags & SK_ROW_HEADER))
> > + requiredDir = true;
> > ...
> > - if ((key->sk_flags & SK_BT_REQFWD) &&
> > - ScanDirectionIsForward(dir))
> > - *continuescan = false;
> > - else if ((key->sk_flags & SK_BT_REQBKWD) &&
> > - ScanDirectionIsBackward(dir))
> > + if (requiredDir)
> >   *continuescan = false;
> >
> > looks like changing behavior in the case when key->sk_flags &
> > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) &&
> > (!requiredDirMatched)
> > Originally it doesn't set *continuescan = false; and with the patch it will set.
>
> I agree that this is a problem. Inequality strategy scan keys are used
> when the initial positioning strategy used by _bt_first (for its
> _bt_search call) is based on an operator other than the "=" operator
> for the opclass. These scan keys are required in one direction only
> (Konstantin's original patch just focussed on these cases, actually).
> Obviously, that difference matters. I don't think that this patch
> should do anything that even looks like it might be revising the
> formal definition of "required in the current scan direction".

Sorry, that was messed up from various attempts to write the patch.
Actually, I end up with two boolean variables indicating whether the
current key is required for the same direction or opposite direction
scan.  I believe that the key required for the opposite direction scan
should be already satisfied by _bt_first() except for NULLs case.
I've implemented a skip of calling the key function for this case
(with assert that result is the same).

> Why is SK_ROW_HEADER treated as a special case by the patch? Could it
> be related to the issues with required-ness and scan direction? Note
> that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row
> comparisons, so they're only ever required for one scan direction.
> (Equality-type row constructor syntax can of course be used without
> preventing the system from using an index scan, but the nbtree code
> will not see that case as a row comparison in the first place. This is
> due to preprocessing by the planner -- nbtree just sees conventional
> scan keys with multiple simple equality scan keys with = row
> comparisons.)

The thing is that NULLs could appear in the middle of matching values.

# WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a'))
SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b);
 a |  b   | ?column?
---+------+----------
 a | b    | t
 a | NULL | NULL
 b | a    | t
(3 rows)

So we can't just skip the row comparison operator, because we can meet
NULL at any place.

> > This may be relevant for the first page when requiredDirMatched is
> > intentionally skipped to be set and for call
> > _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false);
>
> Also, requiredDirMatched isn't initialized by _bt_readpage() when
> "so->firstPage". Shouldn't it be initialized to false?
>
> Also, don't we need to take more care with a fully empty page? The "if
> (!so->firstPage) ... " block should be gated using a condition such as
> "if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff"
> test would also work, but then the optimization will get applied on
> pages with only one non-pivot tuple. That would be harmless, but a
> waste of cycles.)

This makes sense.  I've added (minoff < maxoff) to the condition.

> > Also naming of requiredDirMatched and requiredDir seems semantically
> > hard to understand the meaning without looking at the patch commit
> > message. But I don't have better proposals yet, so maybe it's
> > acceptable.
>
> I agree. How about "requiredMatchedByPrecheck" instead of
> "requiredDirMatched", and "required" instead of "requiredDir"?
>
> It would be nice if this patch worked in a way that could be verified
> by an assertion. Under this scheme, the optimization would only really
> be used in release builds (builds without assertions enabled, really).
> We'd only verify that the optimized case agreed with the slow path in
> assert-enabled builds. It might also make sense to always "apply the
> optimization" on assert-enabled builds, even for the first page seen
> by _bt_readpage by any _bt_first-wise scan. Maybe this sort of
> approach is impractical here for some reason, but I don't see why it
> should be.

Yes, this makes sense.  I've added an assert check that results are
the same as with requiredMatchedByPrecheck == false.

> Obviously, the optimization should lower the amount of work in some
> calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys
> gives. Ideally, it should be done in a way that makes that very
> obvious. There are some very subtle interactions between _bt_checkkeys
> and other, distant code -- which makes me feel paranoid. Notably, with
> required equality strategy scan keys, we're crucially dependent on
> _bt_first using an equality strategy for its initial positioning call
> to _bt_search. This is described by comments in both _bt_checkkeys and
> in _bt_first.
>
> Note, in particular, that it is essential that the initial offnum
> passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take
> place that could cause it to become confused by a required equality
> strategy scan key, leading to _bt_checkkeys terminating the whole scan
> "early" -- causing wrong answers. For a query "WHERE foo = 5" (and a
> forward scan), we had better not pass _bt_readpage an offset number
> for a tuple with "foo" value 4. If that is ever allowed then
> _bt_checkkeys will terminate the scan immediately, leading to wrong
> answers. All because _bt_checkkeys can't tell if 4 comes before 5 or
> comes after 5 -- it only has an "=" operator to work with, so it can't
> actually make this distinction, so it likes to assume that anything !=
> 5 must come after 5 (or before 5 during a backwards scan).
>
> I added a very similar _bt_compare()-based assertion in
> _bt_check_unique(), which went on to catch a very subtle bug in the
> Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I
> have put this particular idea about asserting agreement between a fast
> path and a slow comparison path into practice already.

Good, thank you for the detailed clarification.

------
Regards,
Alexander Korotkov

Attachment

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: Index range search optimization
Next
From: Jacob Champion
Date:
Subject: Re: Row pattern recognition