Re: Possible improvement - Mailing list pgsql-general

From Paul van der Linden
Subject Re: Possible improvement
Date
Msg-id CAEC-EqDuDsS-vvifDDsvco_CGT9W75dMx03ideV=BnwWaAaHmA@mail.gmail.com
Whole thread Raw
In response to Re: Possible improvement  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

Thanks for your thoughts.

For the case where it isn't known if the case expression itself is indexed, technically that should be added as a decision-node in the query planner. After all there are 2 possibilities to handle that so it should be up to the planner to choose the cheapest.

Having said that, if the time spent planning the query is *that* critical I agree that it probably isn't worth it.
Just that in my line of work the execution time of a query is a lot of orders of magnitude larger than the planning time (my recordholder is a query that runs for just over 3 days...)

On Fri, Jun 5, 2020 at 4:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
> <paul.doskabouter@gmail.com> wrote:
>> If I have a query like:
>>
>> SELECT * FROM (
>> SELECT
>> CASE
>> WHEN field='value1' THEN 1
>> WHEN field='value2' THEN 2
>> END AS category
>> FROM table1
>> ) AS foo
>> WHERE category=1
>>
>> doesn't use the index on field, while technically it could do that.
>> Is it hard to implement drilling down the constant in the WHERE to within the CASE?

> It doesn't look impossible to improve that particular case.  See
> eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
> this would need to take constant folding further than we take it
> today. Today we just have the ability to simplify expressions which
> are, by themselves, an expression which will always evaluate to a
> constant value. This case is more complex as it requires something
> outside the CASE expr to allow the simplification to take place. In
> this case, we'd need to look at the other side of the OpExpr to see
> the const there before any transformation could simplify it.

I'd tend to see this as a transformation rule that acts on equality-
with-a-CASE-input, thereby avoiding the "action at a distance" problem.

> It's
> also not entirely clear that the simplification would always be a good
> idea.  What, for example if there was an index on the case statement
> but none on "field". The query may perform worse!

FWIW, I'm not too fussed about that objection.  If we rejected new
optimizations on the basis that somebody's optimized-for-the-old-way
query might perform worse, almost no planner changes would ever get in.
I think most people would feel that an optimization like this is an
improvement.  (I recall coming across a similar case in an
information_schema query just a few days ago.)  The hard questions
I would ask are
1. Is the transformation actually correct?
2. Does it improve queries often enough to be worth the planning cycles
expended to look for the optimization?

As far as #1 goes, note that this CASE produces NULL if "field" is
neither 'value1' nor 'value2', whereupon the equality operator would
also produce NULL, so that simplifying to "field='value1'" is not
formally correct: that would produce FALSE not NULL for other values
of "field".  We can get away with the replacement anyway at the top
level of WHERE, but not in other contexts.  Hence, it'd be wrong to
try to make this transformation in eval_const_expressions(), which is
applied to all expressions.  Possibly prepqual.c's canonicalize_qual()
would be a better place.

The real problem here is going to be objection #2.  The rules under
which any optimization could be applied are nontrivial, so that we'd
spend quite a bit of time trying to figure out whether the optimization
applies ... and I'm afraid that most of the time it would not.

                        regards, tom lane

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Postgres12 - Confusion with pg_restore
Next
From: Tom Lane
Date:
Subject: Re: Possible improvement