Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan - Mailing list pgsql-hackers
From: Matthias van de Meent
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Msg-id: CAEze2Wj7tSTfTy5W0EKRQ-GReYRFaBTbPPnDxXJ8TZttoiZ1kA@mail.gmail.com
In response to: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Peter Geoghegan <pg@bowt.ie>)
Responses: 2 follow-ups, both "Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan"
List: pgsql-hackers
On Fri, 19 Jan 2024 at 23:42, Peter Geoghegan <pg@bowt.ie> wrote:
> Thank you for your replies so far.
>
> On Thu, Jan 18, 2024 at 11:39 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I would agree with you if this was about new APIs and features, but
> > here existing APIs are being repurposed without changing them. A
> > maintainer of index AMs would not look at the commit title 'Enhance
> > nbtree ScalarArrayOp execution' and think "oh, now I have to make sure
> > my existing amsearcharray+amcanorder handling actually supports
> > non-prefix array keys and returns data in index order".
>
> This is getting ridiculous.
>
> It is quite likely that there are exactly zero affected out-of-core
> index AMs. I don't count pgroonga as a counterexample (I don't think
> that it actually fulfills the definition of a ). Basically,
> "amcanorder" index AMs more or less promise to be compatible with
> nbtree, down to having the same strategy numbers. So the idea that I'm
> going to upset amsearcharray+amcanorder index AM authors is a
> completely theoretical problem. The planner code evolved with nbtree,
> hand-in-glove.

And this is where I disagree with your (perceived) implicit argument
that this should be, and always stay, this way. I don't mind changes in
the planner for nbtree per se, but as I've mentioned before in other
places, I really don't like how we handle amcanorder as if it were
amisbtree. It's not called that, so we shouldn't expect that to remain
the case, and we definitely shouldn't keep building on the expectation
that it is always going to be nbtree when amcanorder is set (or
amsearcharray is set, or ..., or any combination of those currently
used by btree). By keeping that expectation alive, this becomes a
self-fulfilling prophecy, and I really don't like such restrictive
self-fulfilling prophecies.
It's nice that we have index AM feature flags, but if we effectively
allow only one implementation for this set of flags by ignoring
potential other users when changing behavior, then what is the API good
for (apart from abstraction layering, which is nice)? /rant

I'll see about the

> I'm more than happy to add a reminder to the commit message about
> adding a proforma listing to the compatibility section of the Postgres
> 17 release notes. Can we actually agree on which theoretical index AM
> types are affected first, though? Isn't it actually
> amsearcharray+amcanorder+amcanmulticol external index AMs only? Do I
> have that right?

I think that may be right, but I could very well have built a
btree-lite that doesn't have the historical baggage of having to deal
with pages from before v12 (version 4), plus some improvements that
haven't made it to core yet.

> BTW, how should we phrase this compatibility note, so that it can be
> understood? It's rather awkward.

Something like the following, maybe?

"""
Compatibility: The planner will now generate paths with array scan keys
in any column for ordered indexes, rather than only on a prefix of the
index columns. The planner still expects fully ordered data from those
indexes. Historically, the btree AM couldn't output correctly ordered
data for suffix array scan keys, which was "fixed" by prohibiting the
planner from generating array scan keys without an equality prefix scan
key up to that attribute. In this commit, the issue has been fixed, and
the planner restriction has thus been removed, as the only in-core
index AM that reports amcanorder now supports array scan keys on any
column regardless of what prefix scan keys it has.
"""

> > > As I said to Heikki, thinking about this some more is on my todo
> > > list. I mean the way that this worked had substantial revisions on
> > > HEAD right before I posted v9. v9 was only to fix the bit rot that
> > > that caused.
> > Then I'll be waiting for that TODO item to be resolved.
>
> My TODO item is to resolve the question of whether and to what extent
> these two optimizations should be combined. It's possible that I'll
> decide that it isn't worth it, for whatever reason.

That's fine; this decision (and any work related to it) is exactly what
I was referring to with that mention of the TODO.

> > > Even single-digit thousands of array elements should be considered
> > > huge. Plus this code path is only hit with poorly written queries.
> >
> > Would you accept suggestions?
>
> You mean that you want to have a go at it yourself? Sure, go ahead.
>
> I cannot promise that I'll accept your suggested revisions, but if
> they aren't too invasive/complicated compared to what I have now, then
> I will just accept them. I maintain that this isn't really necessary,
> but it might be the path of least resistance at this point.

Attached are 2 patches: v11.patch-a and v11.patch-b. Both are
incremental on top of your earlier set, and neither allocates
additional memory in the merge operation in non-assertion builds.

patch-a is a trivial and clean implementation of mergesort, which tends
to reduce the total number of compare operations when both arrays are
of similar size and value range. It writes data directly back into the
main array in non-assertion builds; with assertions enabled, it reuses
your binary-search join on scratch space for algorithm validation.

patch-b is an improved version of patch-a with reduced time complexity
in some cases: it applies exponential search to reduce the work done
where there are large runs of unmatched values in either array, and
does so while keeping the O(n+m) worst-case complexity, now with a
best-case complexity of O(log(n)).

> > > > I'm also no fan of the (tail) recursion. I would agree that this
> > > > is unlikely to consume a lot of stack, but it does consume stack
> > > > space nonetheless, and I'd prefer if it was not done this way.
> > >
> > > I disagree.
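[Editor's sketch] For readers who want a concrete picture of the patch-a/patch-b approach described above: the following is a standalone illustration, not the patch code. The real patches operate on Datum arrays with opclass comparators inside nbtutils.c; this uses plain ints and hypothetical names (`exp_search`, `merge_intersect`) to show an in-place intersection merge that uses exponential ("galloping") search to skip long runs of unmatched values, keeping the O(n+m) worst case while approaching O(log(n)) when runs are long.

```c
#include <assert.h>
#include <stddef.h>

/* First index in sorted a[lo..n) with a[i] >= key: gallop, then bisect. */
size_t
exp_search(const int *a, size_t lo, size_t n, int key)
{
    size_t step = 1;
    size_t hi = lo;

    while (hi < n && a[hi] < key)
    {
        lo = hi + 1;            /* everything at or before hi is < key */
        hi += step;
        step *= 2;              /* double the probe distance */
    }
    if (hi > n)
        hi = n;

    while (lo < hi)             /* plain binary search on [lo, hi) */
    {
        size_t mid = lo + (hi - lo) / 2;

        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Intersect sorted, duplicate-free a (n) and b (m); write into a, return count. */
size_t
merge_intersect(int *a, size_t n, const int *b, size_t m)
{
    size_t i = 0, j = 0, out = 0;

    while (i < n && j < m)
    {
        if (a[i] == b[j])
        {
            a[out++] = a[i];
            i++;
            j++;
        }
        else if (a[i] < b[j])
            i = exp_search(a, i + 1, n, b[j]);  /* skip a run in a */
        else
            j = exp_search(b, j + 1, m, a[i]);  /* skip a run in b */
    }
    return out;
}
```

Writing the result back into the first array mirrors the no-extra-allocation property mentioned above; in assertion builds the patches instead cross-check against the existing binary-search join.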
> > > The amount of stack space it consumes in the worst case is fixed.
> >
> > Is it fixed? Without digging very deep into it, it looks like it can
> > iterate on the order of n_scankeys deep? The comment above does hint
> > at 2 iterations, but is not very clear about the conditions and why.
>
> The recursive call to _bt_advance_array_keys is needed to deal with
> unsatisfied-and-required inequalities that were not detected by an
> initial call to _bt_check_compare, following a second retry call to
> _bt_check_compare. We know that this recursive call to
> _bt_advance_array_keys cannot recurse again because we've already
> identified which specific inequality scan key it is that's a
> still-unsatisfied inequality, following an initial round of array key
> advancement.

So, it's a case of this?

scankey: a > 1 AND a = ANY (1 (=current), 2, 3) AND b < 4 AND b = ANY (2 (=current), 3, 4)
tuple: a=2, b=4

We first match the 'a' inequality key, then find an array-key mismatch
and break out, then move the array keys forward to a=2 (of ANY (1, 2,
3)) and b=4 (of ANY (2, 3, 4)); and now we have to recheck the later
b < 4 inequality key, because that required inequality key wasn't
checked earlier: the array-key mismatch caused _bt_check_compare to
stop at the a=1 condition instead of going on to check the b < 4
condition?

If so, then this visual explanation helped me understand the point, and
why it can't repeat more than once, much better than all that text.
Maybe this can be integrated?

> This is a narrow edge-case -- it is an awkward one. Recursion does
> seem natural here, because we're essentially repeating the call to
> _bt_advance_array_keys from inside _bt_advance_array_keys, rather than
> leaving it up to the usual external caller to do later on. If you
> ripped this code out it would barely be noticeable at all.
> But it seems worth it so that we can make a uniform promise to always
> advance the array keys to the maximum extent possible, based on what
> the caller's tuple tells us about the progress of the scan.

Agreed on the awkwardness and the recursion.

> > > Because in one case we follow the convention for scan keys, whereas
> > > so->orderProcs is accessed via an attribute number subscript.
> >
> > Okay, but how about this in _bt_merge_arrays?
> >
> > + Datum *elem = elems_orig + i;
> >
> > I'm not familiar with the scan key convention, as most other places
> > use reference+subscripting.
>
> I meant the convention used in code like _bt_check_compare (which is
> what we call _bt_checkkeys on HEAD, basically).
>
> Note that the _bt_merge_arrays code that you've highlighted isn't
> iterating through so->keyData[] -- it is iterating through the
> function caller's elements array, which actually comes from
> so->arrayKeys[].

That is exactly why I started asking about the use of pointer
additions: _bt_merge_arrays is new code, and I can't think of any other
case of this style being used in new code, except maybe where the
surrounding code already has this style. The reason I first highlighted
_bt_update_keys_with_arraykeys is that it had a clear case of 2
different styles in the same function.

> Like every other Postgres contributor, I do my best to follow the
> conventions established by existing code. Sometimes that leads to
> pretty awkward results, where CamelCase and underscore styles are
> closely mixed together, because it works out to be the most consistent
> way of doing it overall.

I'm slightly surprised by that, as after this patch I can't find any
code in nbtutils that wasn't touched by this patch and that uses + for
pointer offsets. Either way, that's the style for indexing into a
ScanKey, apparently?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
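[Editor's note] For anyone skimming the style subthread above: the two spellings under debate are semantically identical in C (the standard defines E1[E2] as (*((E1) + (E2)))), so the disagreement is purely one of local convention. A trivial illustration with a hypothetical array, not the nbtree code:

```c
#include <assert.h>

/* Pointer-addition style, as in the highlighted _bt_merge_arrays line. */
int *
elem_by_addition(int *arr, int i)
{
    return arr + i;
}

/* Address-of-subscript style, as used by most surrounding code. */
int *
elem_by_subscript(int *arr, int i)
{
    return &arr[i];
}
```

Both functions return the same pointer for the same arguments; the choice only affects readability and consistency with neighboring code.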
Attachments: v11.patch-a, v11.patch-b