Home > mailing lists

Thread: Possible improvement

Possible improvement

From

Paul van der Linden

Date:

04 June 2020, 17:04:54

Hi,

Don't know if this already came up earlier but I have an idea for improvement.

If I have a query like:

SELECT * FROM (

SELECT

CASE

WHEN field='value1' THEN 1

WHEN field='value2' THEN 2

END AS category

FROM table1

) AS foo

WHERE category=1

doesn't use the index on field, while technically it could do that.

Is it hard to implement drilling down the constant in the WHERE to within the CASE?

This is especially convenient with views (inner SELECT) where the category is some complex list of possibilities and you want to filter (outer SELECT) on specific categories

I know a different solution could be creating an index on that CASE but (especially if you're experimenting a bit) can be quite cumbersome to synchronize that with the actual query.

Is this something that could be put on some wishlist? If so where are the most looked at ones?

Paul

P.S. In replies please use reply to all...

Re: Possible improvement

From

David Rowley

Date:

05 June 2020, 08:13:18

On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
<paul.doskabouter@gmail.com> wrote:
> If I have a query like:
>
> SELECT * FROM (
> SELECT
>   CASE
>       WHEN field='value1' THEN 1
>       WHEN field='value2' THEN 2
>   END AS category
> FROM table1
> ) AS foo
> WHERE category=1
>
> doesn't use the index on field, while technically it could do that.
> Is it hard to implement drilling down the constant in the WHERE to within the CASE?

It doesn't look impossible to improve that particular case.  See
eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
this would need to take constant folding further than we take it
today. Today we just have the ability to simplify expressions which
are, by themselves, an expression which will always evaluate to a
constant value. This case is more complex as it requires something
outside the CASE expr to allow the simplification to take place. In
this case, we'd need to look at the other side of the OpExpr to see
the const there before any transformation could simplify it.  It's
also not entirely clear that the simplification would always be a good
idea.  What, for example if there was an index on the case statement
but none on "field". The query may perform worse!  The unfortunate
part about this is that, generally, when we perform constant folding,
we don't yet have an idea about which indexes exist.  I imagine the
only sane way to do it would be to allow expressions to have some sort
of "alternative" expression that could be matched up to the index
column instead. It wouldn't be a trivial piece of work to do that.

For the more simple cases, you can see from looking at:

postgres=# explain select * from pg_class where oid = (case when
'test' = 'test' then 1 else 0 end);
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Scan using pg_class_oid_index on pg_class  (cost=0.27..8.29
rows=1 width=260)
   Index Cond: (oid = '1'::oid)
(2 rows)

that we do simplify case statements which are by themselves constant.

> Is this something that could be put on some wishlist? If so where are the most looked at ones?

There is a todo list of sorts in [1]. However, I'm really unsure if
anyone ever looks at it for something to do. Mostly, people have their
own ideas and problems to solve and spend their free cycles hacking
away at those. You might have equal luck waiting until December and
writing it on a piece of paper and setting it on fire. Likely there
would be more chance if it was something simple as a novice who's
looking into getting into working on Postgres might skim that list for
something to work on.  More experienced people, I imagine, would never
look there.  FWIW, many people who are now working on PostgreSQL once
came along with a question or idea like yours. Many have been unable
to escape ever since :)

David

[1] https://wiki.postgresql.org/wiki/Todo

Re: Possible improvement

From

Tom Lane

Date:

05 June 2020, 14:31:37

David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
> <paul.doskabouter@gmail.com> wrote:
>> If I have a query like:
>>
>> SELECT * FROM (
>> SELECT
>> CASE
>> WHEN field='value1' THEN 1
>> WHEN field='value2' THEN 2
>> END AS category
>> FROM table1
>> ) AS foo
>> WHERE category=1
>>
>> doesn't use the index on field, while technically it could do that.
>> Is it hard to implement drilling down the constant in the WHERE to within the CASE?

> It doesn't look impossible to improve that particular case.  See
> eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
> this would need to take constant folding further than we take it
> today. Today we just have the ability to simplify expressions which
> are, by themselves, an expression which will always evaluate to a
> constant value. This case is more complex as it requires something
> outside the CASE expr to allow the simplification to take place. In
> this case, we'd need to look at the other side of the OpExpr to see
> the const there before any transformation could simplify it.

I'd tend to see this as a transformation rule that acts on equality-
with-a-CASE-input, thereby avoiding the "action at a distance" problem.

> It's
> also not entirely clear that the simplification would always be a good
> idea.  What, for example if there was an index on the case statement
> but none on "field". The query may perform worse!

FWIW, I'm not too fussed about that objection.  If we rejected new
optimizations on the basis that somebody's optimized-for-the-old-way
query might perform worse, almost no planner changes would ever get in.
I think most people would feel that an optimization like this is an
improvement.  (I recall coming across a similar case in an
information_schema query just a few days ago.)  The hard questions
I would ask are
1. Is the transformation actually correct?
2. Does it improve queries often enough to be worth the planning cycles
expended to look for the optimization?

As far as #1 goes, note that this CASE produces NULL if "field" is
neither 'value1' nor 'value2', whereupon the equality operator would
also produce NULL, so that simplifying to "field='value1'" is not
formally correct: that would produce FALSE not NULL for other values
of "field".  We can get away with the replacement anyway at the top
level of WHERE, but not in other contexts.  Hence, it'd be wrong to
try to make this transformation in eval_const_expressions(), which is
applied to all expressions.  Possibly prepqual.c's canonicalize_qual()
would be a better place.

The real problem here is going to be objection #2.  The rules under
which any optimization could be applied are nontrivial, so that we'd
spend quite a bit of time trying to figure out whether the optimization
applies ... and I'm afraid that most of the time it would not.

            regards, tom lane

Re: Possible improvement

From

Paul van der Linden

Date:

05 June 2020, 18:39:17

Thanks for your thoughts.

For the case where it isn't known if the case expression itself is indexed, technically that should be added as a decision-node in the query planner. After all there are 2 possibilities to handle that so it should be up to the planner to choose the cheapest.

Having said that, if the time spent planning the query is *that* critical I agree that it probably isn't worth it.

Just that in my line of work the execution time of a query is a lot of orders of magnitude larger than the planning time (my recordholder is a query that runs for just over 3 days...)

On Fri, Jun 5, 2020 at 4:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 5 Jun 2020 at 14:41, Paul van der Linden
> <paul.doskabouter@gmail.com> wrote:
>> If I have a query like:
>>
>> SELECT * FROM (
>> SELECT
>> CASE
>> WHEN field='value1' THEN 1
>> WHEN field='value2' THEN 2
>> END AS category
>> FROM table1
>> ) AS foo
>> WHERE category=1
>>
>> doesn't use the index on field, while technically it could do that.
>> Is it hard to implement drilling down the constant in the WHERE to within the CASE?

> It doesn't look impossible to improve that particular case. See
> eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However,
> this would need to take constant folding further than we take it
> today. Today we just have the ability to simplify expressions which
> are, by themselves, an expression which will always evaluate to a
> constant value. This case is more complex as it requires something
> outside the CASE expr to allow the simplification to take place. In
> this case, we'd need to look at the other side of the OpExpr to see
> the const there before any transformation could simplify it.

I'd tend to see this as a transformation rule that acts on equality-
with-a-CASE-input, thereby avoiding the "action at a distance" problem.

> It's
> also not entirely clear that the simplification would always be a good
> idea. What, for example if there was an index on the case statement
> but none on "field". The query may perform worse!

FWIW, I'm not too fussed about that objection. If we rejected new
optimizations on the basis that somebody's optimized-for-the-old-way
query might perform worse, almost no planner changes would ever get in.
I think most people would feel that an optimization like this is an
improvement. (I recall coming across a similar case in an
information_schema query just a few days ago.) The hard questions
I would ask are
1. Is the transformation actually correct?
2. Does it improve queries often enough to be worth the planning cycles
expended to look for the optimization?

As far as #1 goes, note that this CASE produces NULL if "field" is
neither 'value1' nor 'value2', whereupon the equality operator would
also produce NULL, so that simplifying to "field='value1'" is not
formally correct: that would produce FALSE not NULL for other values
of "field". We can get away with the replacement anyway at the top
level of WHERE, but not in other contexts. Hence, it'd be wrong to
try to make this transformation in eval_const_expressions(), which is
applied to all expressions. Possibly prepqual.c's canonicalize_qual()
would be a better place.

The real problem here is going to be objection #2. The rules under
which any optimization could be applied are nontrivial, so that we'd
spend quite a bit of time trying to figure out whether the optimization
applies ... and I'm afraid that most of the time it would not.

regards, tom lane

Re: Possible improvement

From

Tom Lane

Date:

05 June 2020, 19:02:34

Paul van der Linden <paul.doskabouter@gmail.com> writes:
> For the case where it isn't known if the case expression itself is indexed,
> technically that should be added as a decision-node in the query planner.

That'd be fairly hard to do, if we're regarding this as an expression
simplification step, since expression simplification is run long before
any consideration is given to indexes.  (Even if we were willing to
contemplate reversing that ordering, it'd be hard to do, because we
need the simplified expressions to compare to index expressions ---
else we'd get fooled by irrelevant discrepancies that simplification
is supposed to remove.)

The alternative is to try to wire this into index path generation instead
of treating it as a general-purpose expression simplification ... but that
likewise seems pretty undesirable.  If you've got a case like this, you'd
like it to be simplified whether it ends up as an indexqual or not.

So, as I said, I'm inclined to dismiss David's complaint as an
impracticable requirement.  The other issues I raised are far more
significant.

BTW, speaking of correctness, this seems like a pretty dire
counterexample:

SELECT ... FROM
   (SELECT CASE WHEN x = 0 THEN 'zero'
                WHEN 1/x > 100 THEN 'tiny'
        ELSE 'whatever' END AS class,
       ...
   ) ss
WHERE ss.class = 'tiny';

Naive application of this transformation would convert the WHERE to

WHERE 1/x > 100

creating divide-by-zero failures where there should be none.
I'm not sure how we get around that; in general the planner
has little clue which operations can throw what errors.

            regards, tom lane

Re: Possible improvement

From

Paul van der Linden

Date:

05 June 2020, 19:46:42

Ok, as always there's a lot more to take into account then when just superficially looking at it.

And indeed your counterexample shows that you'd have to include all the previous when-conditions too as false

WHERE x=0 IS DISTINCT FROM true AND 1/x > 100, which could become quite messy (especially with nested cases....)

On Fri, Jun 5, 2020 at 9:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul van der Linden <paul.doskabouter@gmail.com> writes:
> For the case where it isn't known if the case expression itself is indexed,
> technically that should be added as a decision-node in the query planner.

That'd be fairly hard to do, if we're regarding this as an expression
simplification step, since expression simplification is run long before
any consideration is given to indexes. (Even if we were willing to
contemplate reversing that ordering, it'd be hard to do, because we
need the simplified expressions to compare to index expressions ---
else we'd get fooled by irrelevant discrepancies that simplification
is supposed to remove.)

The alternative is to try to wire this into index path generation instead
of treating it as a general-purpose expression simplification ... but that
likewise seems pretty undesirable. If you've got a case like this, you'd
like it to be simplified whether it ends up as an indexqual or not.

So, as I said, I'm inclined to dismiss David's complaint as an
impracticable requirement. The other issues I raised are far more
significant.

BTW, speaking of correctness, this seems like a pretty dire
counterexample:

SELECT ... FROM
(SELECT CASE WHEN x = 0 THEN 'zero'
WHEN 1/x > 100 THEN 'tiny'
ELSE 'whatever' END AS class,
...
) ss
WHERE ss.class = 'tiny';

Naive application of this transformation would convert the WHERE to

WHERE 1/x > 100

creating divide-by-zero failures where there should be none.
I'm not sure how we get around that; in general the planner
has little clue which operations can throw what errors.

regards, tom lane