Re: Possible improvement - Mailing list pgsql-general
From | Tom Lane |
---|---|
Subject | Re: Possible improvement |
Date | |
Msg-id | 1470860.1591367497@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Possible improvement (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Possible improvement
|
List | pgsql-general |
David Rowley <dgrowleyml@gmail.com> writes: > On Fri, 5 Jun 2020 at 14:41, Paul van der Linden > <paul.doskabouter@gmail.com> wrote: >> If I have a query like: >> >> SELECT * FROM ( >> SELECT >> CASE >> WHEN field='value1' THEN 1 >> WHEN field='value2' THEN 2 >> END AS category >> FROM table1 >> ) AS foo >> WHERE category=1 >> >> doesn't use the index on field, while technically it could do that. >> Is it hard to implement drilling down the constant in the WHERE to within the CASE? > It doesn't look impossible to improve that particular case. See > eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However, > this would need to take constant folding further than we take it > today. Today we just have the ability to simplify expressions which > are, by themselves, an expression which will always evaluate to a > constant value. This case is more complex as it requires something > outside the CASE expr to allow the simplification to take place. In > this case, we'd need to look at the other side of the OpExpr to see > the const there before any transformation could simplify it. I'd tend to see this as a transformation rule that acts on equality- with-a-CASE-input, thereby avoiding the "action at a distance" problem. > It's > also not entirely clear that the simplification would always be a good > idea. What, for example if there was an index on the case statement > but none on "field". The query may perform worse! FWIW, I'm not too fussed about that objection. If we rejected new optimizations on the basis that somebody's optimized-for-the-old-way query might perform worse, almost no planner changes would ever get in. I think most people would feel that an optimization like this is an improvement. (I recall coming across a similar case in an information_schema query just a few days ago.) The hard questions I would ask are 1. Is the transformation actually correct? 2. Does it improve queries often enough to be worth the planning cycles expended to look for the optimization? As far as #1 goes, note that this CASE produces NULL if "field" is neither 'value1' nor 'value2', whereupon the equality operator would also produce NULL, so that simplifying to "field='value1'" is not formally correct: that would produce FALSE not NULL for other values of "field". We can get away with the replacement anyway at the top level of WHERE, but not in other contexts. Hence, it'd be wrong to try to make this transformation in eval_const_expressions(), which is applied to all expressions. Possibly prepqual.c's canonicalize_qual() would be a better place. The real problem here is going to be objection #2. The rules under which any optimization could be applied are nontrivial, so that we'd spend quite a bit of time trying to figure out whether the optimization applies ... and I'm afraid that most of the time it would not. regards, tom lane
pgsql-general by date: