Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CAH2-Wzkrg_nnh-xvnFZhNKEMJf0hZ0putQaqWhQrezEnb8b+XA@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes |
List | pgsql-hackers |
On Thu, Oct 26, 2023 at 12:59 PM Robert Haas <robertmhaas@gmail.com> wrote: > Alexander's example seems to show that it's not that simple. If I'm > reading his example correctly, with things like aid = 1, the > transformation usually wins even if the number of things in the OR > expression is large, but with things like aid + 1 * bid = 1, the > transformation seems to lose at least with larger numbers of items. So > it's not JUST the number of OR elements but also what they contain, > unless I'm misunderstanding his point. Alexander said "Generally, I don't see why ANY could be executed slower than the equivalent OR clause". I understood that this was his way of expressing the following idea: "In principle, there is no reason to expect execution of ANY() to be slower than execution of an equivalent OR clause (except for noise-level differences). While it might not actually look that way for every single type of plan you can imagine right now, that doesn't argue for making a cost-based decision. It actually argues for fixing the underlying issue, which can't possibly be due to some kind of fundamental advantage enjoyed by expression evaluation with ORs". This is also what I think of all this. Alexander's partial index example had this quality to it. Obviously, the planner *could* be taught to do the right thing with such a case, with a little more work. The fact that it doesn't right now is definitely a problem, and should probably be treated as a blocker for this patch. But that doesn't really argue against the general idea behind the patch -- it just argues for fixing that one problem. There may also be a separate problem that comes from the added planner cycles required to do the transformation -- particularly in extreme or adversarial cases. We should worry about that, too. But, again, it doesn't change the basic fact, which is that having a standard/normalized representation of OR lists/DNF transformation is extremely useful in general, and *shouldn't* result in any real slowdowns at execution time if done well. > To me, this sort of output suggests that perhaps the transformation is > being done in the wrong place. I expect that we have to decide whether > to convert from OR to = ANY(...) at a very early stage of the planner, > before we have any idea what the selected path will ultimately be. But > this output suggests that we want the answer to depend on which kind > of path is going to be faster, which would seem to argue for doing > this sort of transformation as part of path generation for only those > paths that will benefit from it, rather than during earlier phases of > expression processing/simplification. I don't think that that's the right direction. They're semantically equivalent things. But a SAOP-based plan can be fundamentally better, since SAOPs enable passing down useful context to index AMs (at least nbtree). And because we can use a hash table for SAOP expression evaluation. It's a higher level, standardized, well optimized way of expressing exactly the same concept. I can come up with a case that'll be orders of magnitude more efficient with this patch, despite the transformation process only affecting a small OR list of 3 or 5 elements -- a 100x reduction in heap page accesses is quite possible. This is particularly likely to come up if you assume that the nbtree patch that I'm currently working on is also available. In general, I think that we totally over-rely on bitmap index scans, especially BitmapOrs. -- Peter Geoghegan
pgsql-hackers by date: