Re: Index range search optimization - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Index range search optimization |
Date | |
Msg-id | CAH2-WzkE-V_Ap5w42hx3X=DRk_L9DgUPhNWC6APYDdPQ1KODYg@mail.gmail.com Whole thread Raw |
In response to | Re: Index range search optimization (Pavel Borisov <pashkin.elfe@gmail.com>) |
Responses |
Re: Index range search optimization
Re: Index range search optimization |
List | pgsql-hackers |
On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > I looked at the patch code and I agree with this optimization. > Implementation also looks good to me except change : > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) && > + !(key->sk_flags & SK_ROW_HEADER)) > + requiredDir = true; > ... > - if ((key->sk_flags & SK_BT_REQFWD) && > - ScanDirectionIsForward(dir)) > - *continuescan = false; > - else if ((key->sk_flags & SK_BT_REQBKWD) && > - ScanDirectionIsBackward(dir)) > + if (requiredDir) > *continuescan = false; > > looks like changing behavior in the case when key->sk_flags & > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) && > (!requiredDirMatched) > Originally it doesn't set *continuescan = false; and with the patch it will set. I agree that this is a problem. Inequality strategy scan keys are used when the initial positioning strategy used by _bt_first (for its _bt_search call) is based on an operator other than the "=" operator for the opclass. These scan keys are required in one direction only (Konstantin's original patch just focussed on these cases, actually). Obviously, that difference matters. I don't think that this patch should do anything that even looks like it might be revising the formal definition of "required in the current scan direction". Why is SK_ROW_HEADER treated as a special case by the patch? Could it be related to the issues with required-ness and scan direction? Note that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row comparisons, so they're only ever required for one scan direction. (Equality-type row constructor syntax can of course be used without preventing the system from using an index scan, but the nbtree code will not see that case as a row comparison in the first place. This is due to preprocessing by the planner -- nbtree just sees conventional scan keys with multiple simple equality scan keys with = row comparisons.) Also, what about NULLs? While "key IS NULL" is classified as an equality check (see _bt_preprocess_keys comments), the same isn't true with "key IS NOT NULL". The latter case usually has scan key flags "SK_ISNULL | SK_SEARCHNOTNULL | SK_BT_REQFWD" -- there is no SK_BT_REQBKWD here. > This may be relevant for the first page when requiredDirMatched is > intentionally skipped to be set and for call > _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false); Also, requiredDirMatched isn't initialized by _bt_readpage() when "so->firstPage". Shouldn't it be initialized to false? Also, don't we need to take more care with a fully empty page? The "if (!so->firstPage) ... " block should be gated using a condition such as "if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff" test would also work, but then the optimization will get applied on pages with only one non-pivot tuple. That would be harmless, but a waste of cycles.) > Also naming of requiredDirMatched and requiredDir seems semantically > hard to understand the meaning without looking at the patch commit > message. But I don't have better proposals yet, so maybe it's > acceptable. I agree. How about "requiredMatchedByPrecheck" instead of "requiredDirMatched", and "required" instead of "requiredDir"? It would be nice if this patch worked in a way that could be verified by an assertion. Under this scheme, the optimization would only really be used in release builds (builds without assertions enabled, really). We'd only verify that the optimized case agreed with the slow path in assert-enabled builds. It might also make sense to always "apply the optimization" on assert-enabled builds, even for the first page seen by _bt_readpage by any _bt_first-wise scan. Maybe this sort of approach is impractical here for some reason, but I don't see why it should be. Obviously, the optimization should lower the amount of work in some calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys gives. Ideally, it should be done in a way that makes that very obvious. There are some very subtle interactions between _bt_checkkeys and other, distant code -- which makes me feel paranoid. Notably, with required equality strategy scan keys, we're crucially dependent on _bt_first using an equality strategy for its initial positioning call to _bt_search. This is described by comments in both _bt_checkkeys and in _bt_first. Note, in particular, that it is essential that the initial offnum passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take place that could cause it to become confused by a required equality strategy scan key, leading to _bt_checkkeys terminating the whole scan "early" -- causing wrong answers. For a query "WHERE foo = 5" (and a forward scan), we had better not pass _bt_readpage an offset number for a tuple with "foo" value 4. If that is ever allowed then _bt_checkkeys will terminate the scan immediately, leading to wrong answers. All because _bt_checkkeys can't tell if 4 comes before 5 or comes after 5 -- it only has an "=" operator to work with, so it can't actually make this distinction, so it likes to assume that anything != 5 must come after 5 (or before 5 during a backwards scan). I added a very similar _bt_compare()-based assertion in _bt_check_unique(), which went on to catch a very subtle bug in the Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I have put this particular idea about asserting agreement between a fast path and a slow comparison path into practice already. -- Peter Geoghegan
pgsql-hackers by date: