Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers
From: Heikki Linnakangas
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Msg-id: aa55adf3-6466-4324-92e6-5ef54e7c3918@iki.fi
In response to: Adding skip scan (including MDAM style range skip scan) to nbtree (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
List: pgsql-hackers
On 03/01/2025 21:43, Peter Geoghegan wrote:
> The newly revised "skipskip" optimization seems to get the regressions
> down to only a 5% - 10% increase in runtime across a wide variety of
> unsympathetic cases -- I'm now validating performance against a test
> suite based on the adversarial cases presented by Masahiro Ikeda on
> this thread. Although I think that I'll end up tuning the "skipskip"
> mechanism some more (I may have been too conservative in marginal
> cases that actually do benefit from skipping), I deem these
> regressions to be acceptable. They're only seen in the most
> unsympathetic cases, where an omitted leading column has groupings of
> no more than about 50 index tuples, making skipping pretty hopeless.

On my laptop, this is the worst case I could come up with:

create table skiptest as select g / 10 as a, g%10 as b
  from generate_series(1, 10000000) g;
vacuum freeze skiptest;
create index on skiptest (a, b);
set enable_seqscan=off;
set max_parallel_workers_per_gather=0;
\timing on

After repeating a few times, to warm the cache:

postgres=# select count(*) from skiptest where b=1;
  count
---------
 1000000
(1 row)

Time: 202.501 ms

And after 'set skipscan_prefix_cols=0':

postgres=# select count(*) from skiptest where b=1;
  count
---------
 1000000
(1 row)

Time: 164.762 ms

EXPLAIN ANALYZE confirms that it uses an Index Only Scan in both cases.

> I knew from the outset that the hardest part of this project would be
> avoiding regressions in highly unsympathetic cases. The regressions
> that are still there seem very difficult to minimize any further; the
> overhead that remains comes from the simple need to maintain the
> scan's skip arrays once per page, before leaving the page. Once a scan
> decides to apply the "skipskip" optimization, it tends to stick with
> it for all future leaf pages -- leaving only the overhead of checking
> the high key while advancing the scan's arrays. I've cut just about
> all that that I can reasonably cut from the hot code paths that are at
> issue with the regressed cases.

Hmm, looking at the code and profiling it with perf, a lot of code is
executed per tuple: function calls, argument passing, flag checks, and
so on. I suspect you could shave off some cycles by structuring the
code in a more compiler- and branch-prediction-friendly way, or just
with more aggressive inlining. I'm not sure exactly what to suggest,
because the code of _bt_readpage() and all the subroutines it calls is
complicated, but I've put a rough sketch of the kind of thing I mean
further below.

Aside from the performance of those routines, they don't feel very
intuitive: _bt_checkkeys() not only checks the current keys, but can
also read ahead to later tuples on the same page and update the array
keys.

One little thing I noticed by stepping through with a debugger is that
it calls index_getattr() twice for the same tuple and attribute: first
in _bt_check_compare(), and if that sets *continuescan=false, again in
_bt_tuple_before_array_skeys(). The first index_getattr() call is
certainly pretty expensive, because that's where you take the cache
miss on the tuple when scanning. I'm not sure the second call matters
much, but it feels like a bad smell. The comment on
_bt_tuple_before_array_skeys() says:

> * readpagetup callers must only call here when _bt_check_compare already set
> * continuescan=false. We help these callers deal with _bt_check_compare's
> * inability to distinguishing between the < and > cases (it uses equality
> * operator scan keys, whereas we use 3-way ORDER procs)

That raises the question: why doesn't _bt_check_compare() call the
3-way ORDER proc in the first place? That would avoid the overhead of
another call here.
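To make the 3-way idea concrete, here's a rough standalone sketch (all
the names are mine, just for illustration -- this is not the actual
nbtree code): a single ORDER-proc-style comparison answers the =, <,
and > questions at once, so the expensive attribute fetch happens
exactly once per tuple:

#include <stdbool.h>
#include <stdio.h>

typedef int Datum;              /* stand-in for the real Datum */

/* stand-in for index_getattr(): the call that takes the cache miss */
static Datum
fetch_attr(const Datum *tuple, int attnum)
{
    return tuple[attnum];
}

/* 3-way ORDER proc: returns <0, 0, >0, like memcmp() */
static int
order_proc(Datum a, Datum b)
{
    return (a > b) - (a < b);
}

/*
 * Returns true if the tuple matches the current array key.  On a
 * non-match, *tuple_before_key says which side we are on, with no
 * second fetch_attr() call -- unlike a design that first tests
 * equality and then redoes the fetch to learn the direction.
 */
static bool
check_compare_3way(const Datum *tuple, int attnum, Datum arraykey,
                   bool *tuple_before_key)
{
    Datum value = fetch_attr(tuple, attnum);    /* fetched once */
    int cmp = order_proc(value, arraykey);

    if (cmp == 0)
        return true;
    *tuple_before_key = (cmp < 0);
    return false;
}

int
main(void)
{
    Datum tuple[] = {42, 7};
    bool before = false;

    if (!check_compare_3way(tuple, 1, 10, &before))
        printf("no match, tuple is %s the array key\n",
               before ? "before" : "after");
    return 0;
}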
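And coming back to the inlining and branch-prediction point above,
this is the kind of shape I mean: likely()/unlikely() hints (c.h
already provides those macros) plus a forced-inline per-tuple fast
path. Everything else here is made up, plain C for illustration only:

#include <stdbool.h>
#include <stdio.h>

#define likely(x)   __builtin_expect((x) != 0, 1)
#define unlikely(x) __builtin_expect((x) != 0, 0)

/* force the per-tuple fast path to be inlined into the scan loop */
static inline __attribute__((always_inline)) bool
tuple_passes_quals(int key, int bound, bool *continuescan)
{
    if (likely(key < bound))    /* common case: tuple qualifies */
        return true;

    /* rare case: the scan has to stop (or advance its arrays) */
    *continuescan = false;
    return false;
}

int
main(void)
{
    int keys[] = {1, 2, 3, 4, 99, 5};
    bool continuescan = true;
    int nmatches = 0;

    for (int i = 0; i < 6 && continuescan; i++)
    {
        if (tuple_passes_quals(keys[i], 50, &continuescan))
            nmatches++;
    }
    printf("%d tuples matched\n", nmatches);    /* prints 4 */
    return 0;
}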
Aside from these micro-optimizations, I wonder about the "look-ahead"
logic in _bt_checkkeys_look_ahead(). It feels pretty simplistic. Could
you use exponential search
(https://en.wikipedia.org/wiki/Exponential_search) instead of a plain
binary search on the page?
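For what it's worth, the textbook version looks like this on a sorted
array (plain C, not tied to nbtree's page layout): gallop forward in
powers of two until you bracket the target, then binary-search only
inside that bracket. That costs O(log i) comparisons when the match is
i elements ahead, which is what makes it attractive for short
look-ahead distances:

#include <stdio.h>

/* returns the first index with arr[idx] >= target (n if none) */
static int
exponential_search(const int *arr, int n, int target)
{
    if (n == 0 || arr[0] >= target)
        return 0;

    /* gallop: double the step until we overshoot or run off the end */
    int hi = 1;
    while (hi < n && arr[hi] < target)
        hi *= 2;

    /* binary search within the bracket (hi/2, min(hi, n)] */
    int lo = hi / 2 + 1;
    if (hi > n)
        hi = n;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

int
main(void)
{
    int arr[] = {2, 3, 5, 7, 11, 13, 17, 19};
    printf("%d\n", exponential_search(arr, 8, 11));     /* prints 4 */
    return 0;
}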
--
Heikki Linnakangas
Neon (https://neon.tech)