Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
Date | |
Msg-id | CAEze2WhiSkqifvZqrJQNgqktejJMb_-7pTWQUETN1w8a=h_-Yg@mail.gmail.com |
In response to | Re: Adding skip scan (including MDAM style range skip scan) to nbtree (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
List | pgsql-hackers |
On Tue, 11 Mar 2025 at 16:53, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sat, Mar 8, 2025 at 11:43 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I plan on committing this one soon. It's obviously pretty pointless to
> > make the BTMaxItemSize operate off of a page header, and not requiring
> > it is more flexible.
>
> Committed. And committed a revised version of "Show index search count
> in EXPLAIN ANALYZE" that addresses the issues with non-parallel-aware
> index scan executor nodes that run from a parallel worker.
>
> Attached is v28. This is just to keep the patch series applying
> cleanly -- no real changes here.

You asked off-list for my review of 0003. I'd already reviewed 0001 before that, so that review is also included. I'll see if I can spend some time on the other patches too, but for 0003 I think I've got some good, consistent feedback.

0001:

> src/backend/access/nbtree/nbtsearch.c
> _bt_readpage

This hasn't changed meaningfully in this patch, but I noticed that pstate.finaltup is never set for the final page of the scan direction (i.e. P_RIGHTMOST or P_LEFTMOST for forward or backward scans, respectively). If it isn't used more than once after the first element of non-P_RIGHTMOST/LEFTMOST pages, why is it in pstate? Or, if it is used more than once, why shouldn't it be used in

Apart from that, 0001 looks good to me.

0003:

> _bt_readpage

In forward scan mode, recovery from forcenonrequired happens after the main loop over all page items. In backward mode, it happens inside the loop:

> +            if (offnum == minoff && pstate.forcenonrequired)
> +            {
> +                Assert(so->skipScan);

I think there's a comment missing that details _why_ we do this; probably something like:

/*
 * We're about to process the final item on the page.
 * Un-set forcenonrequired, so the next _bt_checkkeys will
 * evaluate required scankeys and signal an end to this
 * primitive scan if we've reached a stopping point.
 */

In line with that, could you explain a bit more about the pstate.forcenonrequired optimization?
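As a rough illustration of the backward-scan control flow in the quoted hunk, here is a minimal toy sketch in C. Everything in it is hypothetical (ToyPageState, toy_checkkeys, toy_readpage_backward, and a single integer "required" key of the form value >= lower_bound); it is not the nbtree code, and it does not model why forcenonrequired saves cycles. It only shows that deferring required-key handling to the page's final item (offnum == minoff) still lets the scan detect a stopping point there.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, not nbtree code: a leaf page holds ints sorted
 * ascending, and there is one "required" key, value >= lower_bound. In a
 * backward scan, once an item fails that key, every item to its left fails
 * too, so a required key may end the primitive scan.
 */
typedef struct ToyPageState
{
    bool        forcenonrequired;   /* treat the required key as nonrequired? */
    bool        continuescan;       /* cleared when a required key says stop */
} ToyPageState;

/* Returns true if value satisfies the key; may clear continuescan. */
static bool
toy_checkkeys(ToyPageState *pstate, int value, int lower_bound)
{
    if (value >= lower_bound)
        return true;
    if (!pstate->forcenonrequired)
        pstate->continuescan = false;   /* required key failed: stop here */
    return false;
}

/*
 * Scans items[maxoff] down to items[minoff], returning the match count.
 * *continuescan reports whether the scan may proceed to the left sibling.
 */
static int
toy_readpage_backward(const int *items, int minoff, int maxoff,
                      int lower_bound, bool forcenonrequired,
                      bool *continuescan)
{
    ToyPageState pstate = {forcenonrequired, true};
    int         nmatches = 0;

    assert(minoff <= maxoff);
    for (int offnum = maxoff; offnum >= minoff; offnum--)
    {
        if (offnum == minoff && pstate.forcenonrequired)
        {
            /*
             * We're about to process the final item on the page.
             * Un-set forcenonrequired, so the next toy_checkkeys call will
             * evaluate the required key and signal an end to this
             * primitive scan if we've reached a stopping point.
             */
            pstate.forcenonrequired = false;
        }
        if (toy_checkkeys(&pstate, items[offnum], lower_bound))
            nmatches++;
        if (!pstate.continuescan)
            break;
    }
    *continuescan = pstate.continuescan;
    return nmatches;
}
```

With items {1, 2, 3, 4, 5} and lower_bound 3, both modes report 3 matches and a cleared continuescan: the default mode stops as soon as it reaches 2, while the forcenonrequired mode checks every item and only clears continuescan at the final item (1).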
I _think_ it's got something to do with "required" scankeys adding some overhead per scankey, which can be significant with skip scan evaluations, so ignoring the requiredness can save some cycles; but the exact mechanism doesn't seem to be very well articulated.

> _bt_skip_ikeyprefix

I _think_ it's worth special-casing firstchangingattnum=1, as in that case we know in advance there is no (immediate) common ground between the index tuples, and thus any additional work we do towards parsing the scankeys would be wasted - except for matching inequality bounds for firstchangingatt, or matching "open" skip arrays for a prefix of attributes starting at firstchangingattnum (as per the array->null_elem case).

I also notice some other missed opportunities for optimizing page accesses:

> +        if (key->sk_strategy != BTEqualStrategyNumber)

The code stops optimizing "prefix prechecks" when we notice a non-equality key. It seems to me that we can do the precheck on shared prefixes with non-equality keys just the same as with equality keys, and it'd improve performance in those cases, too.

> +        if (!(key->sk_flags & SK_SEARCHARRAY))
> +            if (key->sk_attno < firstchangingattnum)
> +            {
> +                if (result == 0)
> +                    continue;   /* safe, = key satisfied by every tuple */
> +            }
> +            break;              /* pstate.ikey to be set to scalar key's ikey */

This code finds out that no tuple on the page can possibly match the scankey (the idxtup=scalar comparison returns a non-0 value) but doesn't (can't) use that to exit the scan. I think that's a missed opportunity for optimization; now we have to figure that out for every tuple in the scan. The same applies to the SAOP array case (i.e. non-skiparray).

Thank you for working on this.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)