Thread: Possible improvement
Hi,
Don't know if this already came up earlier but I have an idea for improvement.
If I have a query like:
SELECT * FROM (
SELECT
CASE
WHEN field='value1' THEN 1
WHEN field='value2' THEN 2
END AS category
FROM table1
) AS foo
WHERE category=1
doesn't use the index on field, while technically it could do that.
Is it hard to implement drilling down the constant in the WHERE to within the CASE?
This is especially convenient with views (inner SELECT) where the category is some complex list of possibilities and you want to filter (outer SELECT) on specific categories
I know a different solution could be creating an index on that CASE but (especially if you're experimenting a bit) can be quite cumbersome to synchronize that with the actual query.
Is this something that could be put on some wishlist? If so where are the most looked at ones?
Paul
P.S. In replies please use reply to all...
On Fri, 5 Jun 2020 at 14:41, Paul van der Linden <paul.doskabouter@gmail.com> wrote: > If I have a query like: > > SELECT * FROM ( > SELECT > CASE > WHEN field='value1' THEN 1 > WHEN field='value2' THEN 2 > END AS category > FROM table1 > ) AS foo > WHERE category=1 > > doesn't use the index on field, while technically it could do that. > Is it hard to implement drilling down the constant in the WHERE to within the CASE? It doesn't look impossible to improve that particular case. See eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However, this would need to take constant folding further than we take it today. Today we just have the ability to simplify expressions which are, by themselves, an expression which will always evaluate to a constant value. This case is more complex as it requires something outside the CASE expr to allow the simplification to take place. In this case, we'd need to look at the other side of the OpExpr to see the const there before any transformation could simplify it. It's also not entirely clear that the simplification would always be a good idea. What, for example if there was an index on the case statement but none on "field". The query may perform worse! The unfortunate part about this is that, generally, when we perform constant folding, we don't yet have an idea about which indexes exist. I imagine the only sane way to do it would be to allow expressions to have some sort of "alternative" expression that could be matched up to the index column instead. It wouldn't be a trivial piece of work to do that. For the more simple cases, you can see from looking at: postgres=# explain select * from pg_class where oid = (case when 'test' = 'test' then 1 else 0 end); QUERY PLAN ------------------------------------------------------------------------------------- Index Scan using pg_class_oid_index on pg_class (cost=0.27..8.29 rows=1 width=260) Index Cond: (oid = '1'::oid) (2 rows) that we do simplify case statements which are by themselves constant. > Is this something that could be put on some wishlist? If so where are the most looked at ones? There is a todo list of sorts in [1]. However, I'm really unsure if anyone ever looks at it for something to do. Mostly, people have their own ideas and problems to solve and spend their free cycles hacking away at those. You might have equal luck waiting until December and writing it on a piece of paper and setting it on fire. Likely there would be more chance if it was something simple as a novice who's looking into getting into working on Postgres might skim that list for something to work on. More experienced people, I imagine, would never look there. FWIW, many people who are now working on PostgreSQL once came along with a question or idea like yours. Many have been unable to escape ever since :) David [1] https://wiki.postgresql.org/wiki/Todo
David Rowley <dgrowleyml@gmail.com> writes: > On Fri, 5 Jun 2020 at 14:41, Paul van der Linden > <paul.doskabouter@gmail.com> wrote: >> If I have a query like: >> >> SELECT * FROM ( >> SELECT >> CASE >> WHEN field='value1' THEN 1 >> WHEN field='value2' THEN 2 >> END AS category >> FROM table1 >> ) AS foo >> WHERE category=1 >> >> doesn't use the index on field, while technically it could do that. >> Is it hard to implement drilling down the constant in the WHERE to within the CASE? > It doesn't look impossible to improve that particular case. See > eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However, > this would need to take constant folding further than we take it > today. Today we just have the ability to simplify expressions which > are, by themselves, an expression which will always evaluate to a > constant value. This case is more complex as it requires something > outside the CASE expr to allow the simplification to take place. In > this case, we'd need to look at the other side of the OpExpr to see > the const there before any transformation could simplify it. I'd tend to see this as a transformation rule that acts on equality- with-a-CASE-input, thereby avoiding the "action at a distance" problem. > It's > also not entirely clear that the simplification would always be a good > idea. What, for example if there was an index on the case statement > but none on "field". The query may perform worse! FWIW, I'm not too fussed about that objection. If we rejected new optimizations on the basis that somebody's optimized-for-the-old-way query might perform worse, almost no planner changes would ever get in. I think most people would feel that an optimization like this is an improvement. (I recall coming across a similar case in an information_schema query just a few days ago.) The hard questions I would ask are 1. Is the transformation actually correct? 2. Does it improve queries often enough to be worth the planning cycles expended to look for the optimization? As far as #1 goes, note that this CASE produces NULL if "field" is neither 'value1' nor 'value2', whereupon the equality operator would also produce NULL, so that simplifying to "field='value1'" is not formally correct: that would produce FALSE not NULL for other values of "field". We can get away with the replacement anyway at the top level of WHERE, but not in other contexts. Hence, it'd be wrong to try to make this transformation in eval_const_expressions(), which is applied to all expressions. Possibly prepqual.c's canonicalize_qual() would be a better place. The real problem here is going to be objection #2. The rules under which any optimization could be applied are nontrivial, so that we'd spend quite a bit of time trying to figure out whether the optimization applies ... and I'm afraid that most of the time it would not. regards, tom lane
Thanks for your thoughts.
For the case where it isn't known if the case expression itself is indexed, technically that should be added as a decision-node in the query planner. After all there are 2 possibilities to handle that so it should be up to the planner to choose the cheapest.
Having said that, if the time spent planning the query is *that* critical I agree that it probably isn't worth it.
Just that in my line of work the execution time of a query is a lot of orders of magnitude larger than the planning time (my recordholder is a query that runs for just over 3 days...)
On Fri, Jun 5, 2020 at 4:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
> <paul.doskabouter@gmail.com> wrote:
>> If I have a query like:
>>
>> SELECT * FROM (
>> SELECT
>> CASE
>> WHEN field='value1' THEN 1
>> WHEN field='value2' THEN 2
>> END AS category
>> FROM table1
>> ) AS foo
>> WHERE category=1
>>
>> doesn't use the index on field, while technically it could do that.
>> Is it hard to implement drilling down the constant in the WHERE to within the CASE?
> It doesn't look impossible to improve that particular case. See
> eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
> this would need to take constant folding further than we take it
> today. Today we just have the ability to simplify expressions which
> are, by themselves, an expression which will always evaluate to a
> constant value. This case is more complex as it requires something
> outside the CASE expr to allow the simplification to take place. In
> this case, we'd need to look at the other side of the OpExpr to see
> the const there before any transformation could simplify it.
I'd tend to see this as a transformation rule that acts on equality-
with-a-CASE-input, thereby avoiding the "action at a distance" problem.
> It's
> also not entirely clear that the simplification would always be a good
> idea. What, for example if there was an index on the case statement
> but none on "field". The query may perform worse!
FWIW, I'm not too fussed about that objection. If we rejected new
optimizations on the basis that somebody's optimized-for-the-old-way
query might perform worse, almost no planner changes would ever get in.
I think most people would feel that an optimization like this is an
improvement. (I recall coming across a similar case in an
information_schema query just a few days ago.) The hard questions
I would ask are
1. Is the transformation actually correct?
2. Does it improve queries often enough to be worth the planning cycles
expended to look for the optimization?
As far as #1 goes, note that this CASE produces NULL if "field" is
neither 'value1' nor 'value2', whereupon the equality operator would
also produce NULL, so that simplifying to "field='value1'" is not
formally correct: that would produce FALSE not NULL for other values
of "field". We can get away with the replacement anyway at the top
level of WHERE, but not in other contexts. Hence, it'd be wrong to
try to make this transformation in eval_const_expressions(), which is
applied to all expressions. Possibly prepqual.c's canonicalize_qual()
would be a better place.
The real problem here is going to be objection #2. The rules under
which any optimization could be applied are nontrivial, so that we'd
spend quite a bit of time trying to figure out whether the optimization
applies ... and I'm afraid that most of the time it would not.
regards, tom lane
Paul van der Linden <paul.doskabouter@gmail.com> writes: > For the case where it isn't known if the case expression itself is indexed, > technically that should be added as a decision-node in the query planner. That'd be fairly hard to do, if we're regarding this as an expression simplification step, since expression simplification is run long before any consideration is given to indexes. (Even if we were willing to contemplate reversing that ordering, it'd be hard to do, because we need the simplified expressions to compare to index expressions --- else we'd get fooled by irrelevant discrepancies that simplification is supposed to remove.) The alternative is to try to wire this into index path generation instead of treating it as a general-purpose expression simplification ... but that likewise seems pretty undesirable. If you've got a case like this, you'd like it to be simplified whether it ends up as an indexqual or not. So, as I said, I'm inclined to dismiss David's complaint as an impracticable requirement. The other issues I raised are far more significant. BTW, speaking of correctness, this seems like a pretty dire counterexample: SELECT ... FROM (SELECT CASE WHEN x = 0 THEN 'zero' WHEN 1/x > 100 THEN 'tiny' ELSE 'whatever' END AS class, ... ) ss WHERE ss.class = 'tiny'; Naive application of this transformation would convert the WHERE to WHERE 1/x > 100 creating divide-by-zero failures where there should be none. I'm not sure how we get around that; in general the planner has little clue which operations can throw what errors. regards, tom lane
Ok, as always there's a lot more to take into account then when just superficially looking at it.
And indeed your counterexample shows that you'd have to include all the previous when-conditions too as false
WHERE x=0 IS DISTINCT FROM true AND 1/x > 100, which could become quite messy (especially with nested cases....)
On Fri, Jun 5, 2020 at 9:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Paul van der Linden <paul.doskabouter@gmail.com> writes:
> For the case where it isn't known if the case expression itself is indexed,
> technically that should be added as a decision-node in the query planner.
That'd be fairly hard to do, if we're regarding this as an expression
simplification step, since expression simplification is run long before
any consideration is given to indexes. (Even if we were willing to
contemplate reversing that ordering, it'd be hard to do, because we
need the simplified expressions to compare to index expressions ---
else we'd get fooled by irrelevant discrepancies that simplification
is supposed to remove.)
The alternative is to try to wire this into index path generation instead
of treating it as a general-purpose expression simplification ... but that
likewise seems pretty undesirable. If you've got a case like this, you'd
like it to be simplified whether it ends up as an indexqual or not.
So, as I said, I'm inclined to dismiss David's complaint as an
impracticable requirement. The other issues I raised are far more
significant.
BTW, speaking of correctness, this seems like a pretty dire
counterexample:
SELECT ... FROM
(SELECT CASE WHEN x = 0 THEN 'zero'
WHEN 1/x > 100 THEN 'tiny'
ELSE 'whatever' END AS class,
...
) ss
WHERE ss.class = 'tiny';
Naive application of this transformation would convert the WHERE to
WHERE 1/x > 100
creating divide-by-zero failures where there should be none.
I'm not sure how we get around that; in general the planner
has little clue which operations can throw what errors.
regards, tom lane