Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan - Mailing list pgsql-hackers
From | Peter Geoghegan
---|---
Subject | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date |
Msg-id | CAH2-WzkXJnHOuaonxm4xGPXU4n8D6VS2kzdmSy929nmiYYXBeA@mail.gmail.com
In response to | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan (Peter Geoghegan <pg@bowt.ie>)
Responses | Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
List | pgsql-hackers
On Mon, Nov 20, 2023 at 6:52 PM Peter Geoghegan <pg@bowt.ie> wrote:
> It should be noted that the patch isn't strictly guaranteed to always
> read fewer index pages than master, for a given query plan and index.
> This is by design. Though the patch comes close, it's not quite a
> certainty. There are known cases where the patch reads the occasional
> extra page (relative to what master would have done under the same
> circumstances). These are cases where the implementation just cannot
> know for sure whether the next/sibling leaf page has key space covered
> by any of the scan's array keys (at least not in a way that seems
> practical). The implementation has simple heuristics that infer (a
> polite word for "make an educated guess") about what will be found on
> the next page. Theoretically we could be more conservative in how we
> go about this, but that seems like a bad idea to me. It's really easy
> to find cases where the maximally conservative approach loses by a
> lot, and really hard to show cases where it wins at all.

Attached is v8, which pretty much rips all of this stuff out.

I definitely had a point when I said that it made sense to be optimistic about finding matches on the next page in respect of any truncated -inf attributes in high keys, though. And so we still do that much in v8. But there is no reason why we need to go any further than that -- no reason why we should *also* be optimistic about *untruncated* high key/finaltup attributes that *aren't* exact matches for any of the scan's required array keys turning out to have exact matches once we move on to the next sibling page.

I reached this conclusion when working on a fix for the low cardinality index regression that Tomas' tests brought to my attention [1]. I started out with the intention of just fixing that one case, in a very targeted way, but quickly realized that it made almost no sense to limit myself to the low cardinality cases. Even with Tomas' problematic low cardinality test cases, I saw untruncated high key attributes that were "close to matching tuples" -- they just weren't close enough (i.e. exactly on the next leaf page) for us to actually win (so v7 lost). Being almost correct again and again, but still losing again and again, is a good sign that certain basic assumptions were faulty (at least when the test case is realistic, which it was in this instance).

To be clear, and to repeat: even in v8 we'll still "make guesses" about -inf truncated attributes. But it's a much more limited form of guessing. If we didn't at least do the -inf thing, then backwards scans would weirdly work better than forward scans in some cases -- a particular concern with queries that have index quals for each of multiple columns (a sketch of the kind of query I mean appears below). I don't think that this remaining "speculative" behavior needs to be discussed at much length in code comments, though. That's why v8 is a great deal simpler than v7 was here. No more huge comment block at the end of the new _bt_checkkeys.

Notable stuff that *hasn't* changed from v7:

I'm posting this v8 having not yet worked through all of Heikki's feedback. In particular, v8 doesn't deal with the relatively hard question of what to do about the optimizations added by Alexander Korotkov's commit e0b1ee17 (should I keep them disabled, selectively re-enable one or both optimizations, or something else?). This is partly due to at least one of the optimizations having problems of its own that are still outstanding [2].
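To be concrete about what I mean by "index quals for each of multiple columns": something along the lines of the sketch below is the general shape of query involved. The table, index, and values here are made up purely for illustration (they aren't taken from the patch or its test suite).

```sql
-- Hypothetical table and composite index, chosen only to illustrate
-- the query shape discussed above (not from the patch's tests)
CREATE TABLE tab (a int, b int, c int);
CREATE INDEX tab_a_b_c_idx ON tab (a, b, c);

-- Ordered scan with a ScalarArrayOp qual on each of the index's
-- columns.  Each time the scan reaches the end of a leaf page, it has
-- to judge whether tuples matching some combination of the remaining
-- array keys could be found on the next/sibling page.
SELECT *
FROM tab
WHERE a = ANY ('{1, 5, 9}'::int[])
  AND b = ANY ('{100, 200}'::int[])
  AND c = ANY ('{7, 8}'::int[])
ORDER BY a, b, c;
```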
I also wanted to get a revision out before travelling to Prague for PGConf.EU, which will be followed by other Christmas travel. That's likely to keep me away from the patch for weeks (that, plus I'm moving to NYC in early January). So I just ran out of time to do absolutely everything.

Other notable changes:

* Bug fixes for cases where the array state machine gets confused by a change in the scan direction, plus similar cases involving mark and restore processing. (A sketch of the general shape of case I mean appears at the end of this email.)

I'm not entirely happy with my approach here (mostly referring to the new code in _bt_steppage). It feels like it needs to be a little better integrated with mark/restore processing. No doubt Heikki will have his own doubts about this.

I've included my test cases for the issues in this area. The problems are really hard to reproduce, and writing these tests took a surprisingly large amount of effort. The tests might not be suitable for commit, but you really need to see the test cases to be able to review the code efficiently. It's just fiddly.

* I've managed to make the array state machine just a little more streamlined compared to v7. Minor code polishing, not really worth describing in detail.

* Addressed concerns about incomplete opfamilies not working with the patch by updating the error message within _bt_preprocess_array_keys/_bt_sort_array_cmp_setup. It now exactly matches the similar one in _bt_first. I don't think that we need any new opr_sanity tests, since we already have one for this. Making the error message match the one in _bt_first ensures that anybody who runs into a problem here will see the same error message that they'd have seen on earlier versions, anyway. It's a more useful error compared to the one from v7 (in that it names the index and its attribute directly). Plus it's good to be consistent.

I don't see any potential for the underlying _bt_sort_array_cmp_setup behavior to be seen as a regression in terms of our ability to cope with incomplete opfamilies (compared to earlier Postgres versions). Opfamilies that lack a cross-type ORDER proc, mixed with queries that use the corresponding cross-type = operator, were always very dicey. That situation isn't meaningfully different with the patch.

(Actually, this isn't 100% true in the case of queries + indexes with non-required arrays specifically -- they'll need a 3-way comparator in _bt_preprocess_array_keys/_bt_sort_array_cmp_setup, and yet *won't* need one moments later, in _bt_first. This is because non-required scan keys don't end up in _bt_first's insertion scan key at all, in general. This distinction just seems pedantic, though. We're talking about a case where things accidentally failed to fail in previous versions, for some queries but not most queries. Now it'll fail in exactly the same way in slightly more cases. In reality, affected opfamilies are practically non-existent, so this is a hypothetical upon a hypothetical.)

[1] https://postgr.es/m/CAH2-WzmtV7XEWxf_rP1pw=vyDjGLi__zGOy6Me5MovR3e1kfdg@mail.gmail.com
[2] https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ+KOK-WZ+QtTQ@mail.gmail.com

--
Peter Geoghegan
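The scan direction/mark-restore bullet above refers to cases along the following lines: a scrollable cursor over an ordered ScalarArrayOp index scan that is fetched in both directions, so the array keys have to be kept in sync with the scan's position when the direction flips. This reuses the hypothetical tab example from earlier and is not one of the actual test cases; whether the executor really drives the index scan backwards (rather than materializing the result) also depends on the plan that gets chosen.

```sql
BEGIN;

-- Scrollable cursor over the same kind of ordered SAOP index scan as
-- in the earlier sketch.  Fetching in both directions changes the scan
-- direction mid-scan.
DECLARE cur SCROLL CURSOR FOR
    SELECT a, b
    FROM tab
    WHERE a = ANY ('{1, 5, 9}'::int[])
      AND b = ANY ('{100, 200}'::int[])
    ORDER BY a, b;

FETCH FORWARD 10 FROM cur;
FETCH BACKWARD 5 FROM cur;   -- scan direction changes here
FETCH FORWARD ALL FROM cur;

CLOSE cur;
COMMIT;
```

Mark/restore itself comes up with merge joins rather than cursors, but the concern for the array state machine is presumably similar: restoring a previously marked position also has to leave the array keys consistent with that position.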