Should the optimiser convert a CASE into a WHERE if it can? - Mailing list pgsql-performance

From Richard Neill
Subject Should the optimiser convert a CASE into a WHERE if it can?
Date
Msg-id 4B5F2202.3050107@cam.ac.uk
Whole thread Raw
Responses Re: Should the optimiser convert a CASE into a WHERE if it can?
Re: Should the optimiser convert a CASE into a WHERE if it can?
List pgsql-performance
Dear All,

Just wondering whether there is a missing scope for the query planner
(on 8.4.2) to be cleverer than it currently is.

Specifically, I wonder whether the optimiser should know that by
converting a CASE condition into a WHERE condition, it can use an index.

Have I found a possible enhancement, or is this simply too hard to do?

Best wishes,

Richard



Example:
--------

In this example, tbl_tracker has 255751 rows, with a primary key "id",
whose values lie uniformly in the range 1...1255750.

If one is trying to count multiple conditions, the following query seems
to be the most obvious way to do it:

SELECT
   SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
   SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
   SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
   SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
   SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
FROM tbl_tracker;


   c1  |  c2  |  c3  |  c4  |  c5
------+------+------+------+------
  2009 | 2018 | 2099 | 2051 | 2030

Time: 361.666 ms



This can be manually optimised into a far uglier (but much much faster)
query:

SELECT * FROM
  (SELECT COUNT (1) AS c1 FROM tbl_tracker
     WHERE id > 1200000 and id < 1210000) AS s1,
  (SELECT COUNT (1) AS c2 FROM tbl_tracker
     WHERE id > 1210000 and id < 1220000) AS s2,
  (SELECT COUNT (1) AS c3 FROM tbl_tracker
     WHERE id > 1220000 and id < 1230000) AS s3,
  (SELECT COUNT (1) AS c4 FROM tbl_tracker
     WHERE id > 1230000 and id < 1240000) AS s4,
  (SELECT COUNT (1) AS c5 FROM tbl_tracker
     WHERE id > 1240000 and id < 1250000) AS s5

   c1  |  c2  |  c3  |  c4  |  c5
------+------+------+------+------
  2009 | 2018 | 2099 | 2051 | 2030
(1 row)

Time: 21.091 ms





Debugging
---------

The simple queries are:

SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end)
from tbl_tracker;

Time: 174.804 ms

Explain shows that this does a sequential scan.



SELECT COUNT(1) from tbl_tracker WHERE id > 1200000 and id < 1210000;

Time: 4.153 ms

Explain shows that this uses the index, as expected.



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Poor query plan across OR operator
Next
From: Tom Lane
Date:
Subject: Re: Should the optimiser convert a CASE into a WHERE if it can?