Re: Possible improvement - Mailing list pgsql-general

From Tom Lane
Subject Re: Possible improvement
Date
Msg-id 1470860.1591367497@sss.pgh.pa.us
Whole thread Raw
In response to Re: Possible improvement  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Possible improvement
List pgsql-general
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
> <paul.doskabouter@gmail.com> wrote:
>> If I have a query like:
>>
>> SELECT * FROM (
>> SELECT
>> CASE
>> WHEN field='value1' THEN 1
>> WHEN field='value2' THEN 2
>> END AS category
>> FROM table1
>> ) AS foo
>> WHERE category=1
>>
>> doesn't use the index on field, while technically it could do that.
>> Is it hard to implement drilling down the constant in the WHERE to within the CASE?

> It doesn't look impossible to improve that particular case.  See
> eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
> this would need to take constant folding further than we take it
> today. Today we just have the ability to simplify expressions which
> are, by themselves, an expression which will always evaluate to a
> constant value. This case is more complex as it requires something
> outside the CASE expr to allow the simplification to take place. In
> this case, we'd need to look at the other side of the OpExpr to see
> the const there before any transformation could simplify it.

I'd tend to see this as a transformation rule that acts on equality-
with-a-CASE-input, thereby avoiding the "action at a distance" problem.

> It's
> also not entirely clear that the simplification would always be a good
> idea.  What, for example if there was an index on the case statement
> but none on "field". The query may perform worse!

FWIW, I'm not too fussed about that objection.  If we rejected new
optimizations on the basis that somebody's optimized-for-the-old-way
query might perform worse, almost no planner changes would ever get in.
I think most people would feel that an optimization like this is an
improvement.  (I recall coming across a similar case in an
information_schema query just a few days ago.)  The hard questions
I would ask are
1. Is the transformation actually correct?
2. Does it improve queries often enough to be worth the planning cycles
expended to look for the optimization?

As far as #1 goes, note that this CASE produces NULL if "field" is
neither 'value1' nor 'value2', whereupon the equality operator would
also produce NULL, so that simplifying to "field='value1'" is not
formally correct: that would produce FALSE not NULL for other values
of "field".  We can get away with the replacement anyway at the top
level of WHERE, but not in other contexts.  Hence, it'd be wrong to
try to make this transformation in eval_const_expressions(), which is
applied to all expressions.  Possibly prepqual.c's canonicalize_qual()
would be a better place.

The real problem here is going to be objection #2.  The rules under
which any optimization could be applied are nontrivial, so that we'd
spend quite a bit of time trying to figure out whether the optimization
applies ... and I'm afraid that most of the time it would not.

            regards, tom lane



pgsql-general by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Oracle vs. PostgreSQL - a comment
Next
From: Achilleas Mantzios
Date:
Subject: Re: Oracle vs. PostgreSQL - a comment