Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan - Mailing list pgsql-hackers
From: Matthias van de Meent
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Msg-id: CAEze2WiR5eJgVqDHCEaO0=mDiUyXY_o-8aK-PTKVPCKba_E4Vg@mail.gmail.com
In response to: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Wed, 8 Nov 2023 at 02:53, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Nov 7, 2023 at 4:20 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > On Tue, 7 Nov 2023 at 00:03, Peter Geoghegan <pg@bowt.ie> wrote:
> > > I should be able to post v6 later this week. My current plan is to
> > > commit the other nbtree patch first (the backwards scan "boundary
> > > cases" one from the ongoing CF) -- since I saw your review earlier
> > > today. I think that you should probably wait for this v6 before
> > > starting your review.
> >
> > Okay, thanks for the update, then I'll wait for v6 to be posted.
>
> On second thought, I'll just post v6 now (there won't be conflicts
> against the master branch once the other patch is committed anyway).

Thanks. Here's my review of the btree-related code:

> +++ b/src/backend/access/nbtree/nbtsearch.c
> @@ -1625,8 +1633,9 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
>          * set flag to true if all required keys are satisfied and false
>          * otherwise.
>          */
> -       (void) _bt_checkkeys(scan, itup, indnatts, dir,
> -                            &requiredMatchedByPrecheck, false);
> +       _bt_checkkeys(scan, &pstate, itup, false, false);
> +       requiredMatchedByPrecheck = pstate.continuescan;
> +       pstate.continuescan = true; /* reset */

The comment above the updated section needs to be updated.

> @@ -1625,8 +1633,9 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
>          * set flag to true if all required keys are satisfied and false
>          * otherwise.
>          */
> -       (void) _bt_checkkeys(scan, itup, indnatts, dir,
> -                            &requiredMatchedByPrecheck, false);
> +       _bt_checkkeys(scan, &pstate, itup, false, false);

This 'false' finaltup argument is surely wrong for the rightmost page's rightmost tuple, no?
> +++ b/src/backend/access/nbtree/nbtutils.c
> @@ -357,6 +431,46 @@ _bt_preprocess_array_keys(IndexScanDesc scan)
> +       /* We could pfree(elem_values) after, but not worth the cycles */
> +       num_elems = _bt_merge_arrays(scan, cur,
> +                                    (indoption[cur->sk_attno - 1] & INDOPTION_DESC) != 0,
> +                                    prev->elem_values, prev->num_elems,
> +                                    elem_values, num_elems);

This code can get hit several times when there are multiple = ANY clauses, which may result in repeated leakage of these arrays during this scan. I think cleaning up may well be worth the cycles when the total size of the arrays is large enough.

> @@ -496,6 +627,48 @@ _bt_sort_array_elements(IndexScanDesc scan, ScanKey skey,
>                        _bt_compare_array_elements, &cxt);
> +_bt_merge_arrays(IndexScanDesc scan, ScanKey skey, bool reverse,
> +                Datum *elems_orig, int nelems_orig,
> +                Datum *elems_next, int nelems_next)
> [...]
> +       /*
> +        * Incrementally copy the original array into a temp buffer, skipping over
> +        * any items that are missing from the "next" array
> +        */

Given that we only keep the members that both arrays have in common, the result array will be a strict subset of the original array. So I don't quite see why we need the temporary buffer here: we can reuse the entries of the elems_orig array that we've already compared against the elems_next array.

We may want to optimize this further by iterating over only the smaller array: with the current code, [1, 2] + [1..1000] is faster to merge than [1..1000] + [1000, 1001], because 2 * log(1000) is much smaller than 1000 * log(2). In practice this may matter very little, though. An even better optimized version would do a merge join on the two arrays, rather than a loop + binary search.

> @@ -515,6 +688,161 @@ _bt_compare_array_elements(const void *a, const void *b, void *arg)
> [...]
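To illustrate what I mean, here is a minimal standalone sketch of that merge-join-style intersection, writing the result in place into the original array. Plain ints stand in for Datum values (in nbtutils.c the comparisons would go through the ORDER proc instead), and the function name is mine, not from the patch:

```c
#include <stddef.h>

/*
 * Merge-join style intersection of two sorted, de-duplicated arrays.
 * Because the result is always a subset of elems_orig, we can reuse
 * elems_orig's own slots for the output instead of a temp buffer.
 * Returns the new element count.
 */
static int
intersect_sorted(int *elems_orig, int nelems_orig,
				 const int *elems_next, int nelems_next)
{
	int			i = 0,
				j = 0,
				nout = 0;

	while (i < nelems_orig && j < nelems_next)
	{
		if (elems_orig[i] < elems_next[j])
			i++;				/* only in orig: drop it */
		else if (elems_orig[i] > elems_next[j])
			j++;				/* only in next: skip it */
		else
		{
			/* common member: keep it, reusing orig's slots */
			elems_orig[nout++] = elems_orig[i];
			i++;
			j++;
		}
	}
	return nout;
}
```

This runs in O(nelems_orig + nelems_next) regardless of which array is larger, which also sidesteps the asymmetry described above.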
> +_bt_binsrch_array_skey(FmgrInfo *orderproc,

Is there a reason for this complex initialization of high_elem/low_elem, rather than this easier-to-understand and more compact initialization?

+    low_elem = 0;
+    high_elem = array->num_elems - 1;
+    if (cur_elem_start)
+    {
+        if (ScanDirectionIsForward(dir))
+            low_elem = array->cur_elem;
+        else
+            high_elem = array->cur_elem;
+    }

> @@ -661,20 +1008,691 @@ _bt_restore_array_keys(IndexScanDesc scan)
> [...]
> +_bt_array_keys_remain(IndexScanDesc scan, ScanDirection dir)
> [...]
> +       if (scan->parallel_scan != NULL)
> +               _bt_parallel_done(scan);
> +
> +       /*
> +        * No more primitive index scans. Terminate the top-level scan.
> +        */
> +       return false;

I think the conditional _bt_parallel_done(scan) call feels misplaced here, as the comment immediately below it indicates the scan is to be terminated after that comment. So either move this _bt_parallel_done call outside the function (which by name would imply it is read-only, without side effects like this), or move it below the "terminate the top-level scan" comment.

> +_bt_advance_array_keys(IndexScanDesc scan, BTReadPageState *pstate,
> [...]
> +        * Set up ORDER 3-way comparison function and array state
> [...]
> +        * Optimization: Skip over non-required scan keys when we know that

These two sections should probably be swapped, as the skip makes the setup useless. Also, the comment here is wrong: the scan keys that are skipped are 'required', not 'non-required'.

> +++ b/src/test/regress/expected/join.out
> @@ -8620,10 +8620,9 @@ where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1 and j2.id1 >= any (array[1,5]);
>           Merge Cond: (j1.id1 = j2.id1)
>           Join Filter: (j2.id2 = j1.id2)
>           ->  Index Scan using j1_id1_idx on j1
> -         ->  Index Only Scan using j2_pkey on j2
> +         ->  Index Scan using j2_id1_idx on j2
>                Index Cond: (id1 >= ANY ('{1,5}'::integer[]))
> -              Filter: ((id1 % 1000) = 1)
> -(7 rows)
> +(6 rows)

I'm a bit surprised that we don't have the `id1 % 1000 = 1` filter anymore.
The output otherwise matches (quite possibly because the other join conditions don't match) and I don't have time to investigate the intricacies of IOS vs. normal IS, but this feels off.

----

As for the planner changes, I don't think I'm familiar enough with the planner to make any authoritative comments on this. However, it does look like you've changed the meaning of 'amsearcharray', and I'm not sure it's OK to assume that all indexes that support amsearcharray will also support this new assumption of ordered retrieval of SAOPs. For one, the pgroonga extension [0] does mark amcanorder+amsearcharray.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://github.com/pgroonga/pgroonga/blob/115414723c7eb8ce9eb667da98e008bd10fbae0a/src/pgroonga.c#L8782-L8788