Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CAH2-WznKiUwPtzLOVa7SQVSfurttZ+ULSRqhyseG_XbkYeyiag@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes |
List | pgsql-hackers |
On Mon, Nov 27, 2023 at 1:04 PM Robert Haas <robertmhaas@gmail.com> wrote: > The use of op_mergejoinable() seems pretty random to me. Why should we > care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can > transform that to a<any(array[1,2,3,4]) if you want. It might not be a > good idea, but I think it's a legal transformation. That kind of transformation is likely to be a very good idea, because nbtree's _bt_preprocess_array_keys() function knows how to perform preprocessing that makes the final index qual "a < 1". Obviously that could be far more efficient. Further suppose you have a machine generated query "a<1 or a<2 or a<3 or a<4 AND a = 2" -- same as before, except that I added "AND a = 2" to the end. Now _bt_preprocess_array_keys() will be able to do the aforementioned inequality preprocessing, just as before. But this time _bt_preprocess_keys() (a different function with a similar name) can see that the quals are contradictory. That makes the entire index scan end, before it ever really began. Obviously, this is an example of a more general principle: a great deal of the benefit of these transformations is indirect, in the sense that they come from enabling further transformations/optimizations, that apply in the context of some particular query. So you have to think holistically. It's perhaps a little unfortunate that all of this nbtree preprocessing stuff is totally opaque to the planner. Tom has expressed concerns about that in the past, FWIW: https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us (see the final paragraph for the reference) There might be some bigger picture to doing all of these transformations, in a way that maximizes opportunities to apply further transformations/optimizations. You know much more about the optimizer than I do, so maybe this is already very obvious. Just pointing it out. > Honestly, it seems very hard to avoid the conclusion that this > transformation is being done at too early a stage. Parse analysis is > not the time to try to do query optimization. I can't really believe > that there's a way to produce a committable patch along these lines. > Ideally, a transformation like this should be done after we know what > plan shape we're using (or considering using), so that we can make > cost-based decisions about whether to transform or not. But at the > very least it should happen somewhere in the planner. There's really > no justification for parse analysis rewriting the SQL that the user > entered. I am sure that there is a great deal of truth to this. The general conclusion about parse analysis being the wrong place for this seems very hard to argue with. But I'm much less sure that there needs to be a conventional cost model. The planner's cost model is supposed to have some basis in physical runtime costs, which is not the case for any of these transformations. Not in any general sense; they're just transformations that enable finding a cheaper way to execute the query. While they have to pay for themselves, in some sense, I think that that's purely a matter of managing the added planner cycles. In principle they shouldn't have any direct impact on the physical costs incurred by physical operators. No? As I keep pointing out, there is a sound theoretical basis to the idea of normalizing to conjunctive normal form as its own standard step in query processing. To some extent we do this already, but it's all rather ad-hoc. Even if (say) the nbtree preprocessing transformations that I described were something that the planner already knew about directly, they still wouldn't really need to be costed. They're pretty much strictly better at runtime (at most you only have to worry about the fixed cost of determining if they apply at all). -- Peter Geoghegan
pgsql-hackers by date: