Re: Index range search optimization - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Index range search optimization |
Date | |
Msg-id | CAPpHfdt9JNftLfB1zfSB-kj6BtL5yxSnztSUyQ76DzOoDKzPPw@mail.gmail.com Whole thread Raw |
In response to | Re: Index range search optimization (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Index range search optimization
Re: Index range search optimization |
List | pgsql-hackers |
Hi Peter, Hi Pavel, The v4 of the patch is attached. On Thu, Sep 21, 2023 at 11:48 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Sep 21, 2023 at 5:11 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > I looked at the patch code and I agree with this optimization. > > Implementation also looks good to me except change : > > + if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD) && > > + !(key->sk_flags & SK_ROW_HEADER)) > > + requiredDir = true; > > ... > > - if ((key->sk_flags & SK_BT_REQFWD) && > > - ScanDirectionIsForward(dir)) > > - *continuescan = false; > > - else if ((key->sk_flags & SK_BT_REQBKWD) && > > - ScanDirectionIsBackward(dir)) > > + if (requiredDir) > > *continuescan = false; > > > > looks like changing behavior in the case when key->sk_flags & > > SK_BT_REQFWD && (! ScanDirectionIsForward(dir)) && > > (!requiredDirMatched) > > Originally it doesn't set *continuescan = false; and with the patch it will set. > > I agree that this is a problem. Inequality strategy scan keys are used > when the initial positioning strategy used by _bt_first (for its > _bt_search call) is based on an operator other than the "=" operator > for the opclass. These scan keys are required in one direction only > (Konstantin's original patch just focussed on these cases, actually). > Obviously, that difference matters. I don't think that this patch > should do anything that even looks like it might be revising the > formal definition of "required in the current scan direction". Sorry, that was messed up from various attempts to write the patch. Actually, I end up with two boolean variables indicating whether the current key is required for the same direction or opposite direction scan. I believe that the key required for the opposite direction scan should be already satisfied by _bt_first() except for NULLs case. I've implemented a skip of calling the key function for this case (with assert that result is the same). > Why is SK_ROW_HEADER treated as a special case by the patch? Could it > be related to the issues with required-ness and scan direction? Note > that we never use BTEqualStrategyNumber for SK_ROW_HEADER scan key row > comparisons, so they're only ever required for one scan direction. > (Equality-type row constructor syntax can of course be used without > preventing the system from using an index scan, but the nbtree code > will not see that case as a row comparison in the first place. This is > due to preprocessing by the planner -- nbtree just sees conventional > scan keys with multiple simple equality scan keys with = row > comparisons.) The thing is that NULLs could appear in the middle of matching values. # WITH t (a, b) AS (VALUES ('a', 'b'), ('a', NULL), ('b', 'a')) SELECT a, b, (a, b) > ('a', 'a') FROM t ORDER BY (a, b); a | b | ?column? ---+------+---------- a | b | t a | NULL | NULL b | a | t (3 rows) So we can't just skip the row comparison operator, because we can meet NULL at any place. > > This may be relevant for the first page when requiredDirMatched is > > intentionally skipped to be set and for call > > _bt_checkkeys(scan, itup, truncatt, dir, &continuescan, false); > > Also, requiredDirMatched isn't initialized by _bt_readpage() when > "so->firstPage". Shouldn't it be initialized to false? > > Also, don't we need to take more care with a fully empty page? The "if > (!so->firstPage) ... " block should be gated using a condition such as > "if (!so->firstPage && minoff < maxoff)". (Adding a "minoff <= maxoff" > test would also work, but then the optimization will get applied on > pages with only one non-pivot tuple. That would be harmless, but a > waste of cycles.) This makes sense. I've added (minoff < maxoff) to the condition. > > Also naming of requiredDirMatched and requiredDir seems semantically > > hard to understand the meaning without looking at the patch commit > > message. But I don't have better proposals yet, so maybe it's > > acceptable. > > I agree. How about "requiredMatchedByPrecheck" instead of > "requiredDirMatched", and "required" instead of "requiredDir"? > > It would be nice if this patch worked in a way that could be verified > by an assertion. Under this scheme, the optimization would only really > be used in release builds (builds without assertions enabled, really). > We'd only verify that the optimized case agreed with the slow path in > assert-enabled builds. It might also make sense to always "apply the > optimization" on assert-enabled builds, even for the first page seen > by _bt_readpage by any _bt_first-wise scan. Maybe this sort of > approach is impractical here for some reason, but I don't see why it > should be. Yes, this makes sense. I've added an assert check that results are the same as with requiredMatchedByPrecheck == false. > Obviously, the optimization should lower the amount of work in some > calls to _bt_checkkeys, without ever changing the answer _bt_checkkeys > gives. Ideally, it should be done in a way that makes that very > obvious. There are some very subtle interactions between _bt_checkkeys > and other, distant code -- which makes me feel paranoid. Notably, with > required equality strategy scan keys, we're crucially dependent on > _bt_first using an equality strategy for its initial positioning call > to _bt_search. This is described by comments in both _bt_checkkeys and > in _bt_first. > > Note, in particular, that it is essential that the initial offnum > passed to _bt_readpage doesn't allow a call to _bt_checkkeys to take > place that could cause it to become confused by a required equality > strategy scan key, leading to _bt_checkkeys terminating the whole scan > "early" -- causing wrong answers. For a query "WHERE foo = 5" (and a > forward scan), we had better not pass _bt_readpage an offset number > for a tuple with "foo" value 4. If that is ever allowed then > _bt_checkkeys will terminate the scan immediately, leading to wrong > answers. All because _bt_checkkeys can't tell if 4 comes before 5 or > comes after 5 -- it only has an "=" operator to work with, so it can't > actually make this distinction, so it likes to assume that anything != > 5 must come after 5 (or before 5 during a backwards scan). > > I added a very similar _bt_compare()-based assertion in > _bt_check_unique(), which went on to catch a very subtle bug in the > Postgres 12 nbtree work -- the bug fixed by commit 74eb2176bf. So I > have put this particular idea about asserting agreement between a fast > path and a slow comparison path into practice already. Good, thank you for the detailed clarification. ------ Regards, Alexander Korotkov
Attachment
pgsql-hackers by date: