Thread: BUG #1470: Boolean expression index not used when it could be

BUG #1470: Boolean expression index not used when it could be

From
"Sergey Koshcheyev"
Date:
The following bug has been logged online:

Bug reference:      1470
Logged by:          Sergey Koshcheyev
Email address:      sergey.p.k@hotmail.com
PostgreSQL version: 7.4.6
Operating system:   Linux (Debian)
Description:        Boolean expression index not used when it could be
Details:

I'm trying to optimize "is null" queries, since PgSQL doesn't index null
values. I have found that creating an expression index on (column is null)
could work, but it doesn't get used unless the index expression is part of a
comparison. Could this be improved, so that (a boolean expression) is taken
as equivalent to (a boolean expression = true)?

Here's an example:

office=> create table tbl1 (abc int);
CREATE TABLE
office=> create index tbl1_abc on tbl1 ((abc is null));
CREATE INDEX
office=> explain select * from tbl1 where (abc is null) = true;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using tbl1_abc on tbl1  (cost=0.00..17.07 rows=6 width=4)
   Index Cond: ((abc IS NULL) = true)
(2 rows)

office=> explain select * from tbl1 where (abc is null);
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on tbl1  (cost=0.00..20.00 rows=6 width=4)
   Filter: (abc IS NULL)
(2 rows)

I would like the second select to pick up the index too.

Re: BUG #1470: Boolean expression index not used when it could be

From
Tom Lane
Date:
"Sergey Koshcheyev" <sergey.p.k@hotmail.com> writes:
> I'm trying to optimize "is null" queries, since PgSQL doesn't index null
> values. I have found that creating an expression index on (column is null)
> could work, but it doesn't get used unless the index expression is part of a
> comparison. Could this be improved, so that (a boolean expression) is taken
> as equivalent to (a boolean expression = true)?

You would be better off to use a partial index:

    create index tbl1_abc on tbl1 (somecol) where abc is null;

The advantage of this is that not only do you get the is-null filter,
but you may be able to filter on some other column(s) at the same time.
For instance if you commonly query

    select ... from tbl1 where abc is null and def > 42

then making "somecol" be "def" would be a winner.

            regards, tom lane

Re: BUG #1470: Boolean expression index not used when it could be

From
Sergey Koshcheyev
Date:
Tom Lane wrote:
> "Sergey Koshcheyev" <sergey.p.k@hotmail.com> writes:
>
>>I'm trying to optimize "is null" queries, since PgSQL doesn't index null
>>values. I have found that creating an expression index on (column is null)
>>could work, but it doesn't get used unless the index expression is part of a
>>comparison. Could this be improved, so that (a boolean expression) is taken
>>as equivalent to (a boolean expression = true)?
>
>
> You would be better off to use a partial index:
>
>     create index tbl1_abc on tbl1 (somecol) where abc is null;
>
> The advantage of this is that not only do you get the is-null filter,
> but you may be able to filter on some other column(s) at the same time.
> For instance if you commonly query
>
>     select ... from tbl1 where abc is null and def > 42
>
> then making "somecol" be "def" would be a winner.
>
>             regards, tom lane

Well, I have tried playing with indexes, but it's hard to define the
right indexes for the query, even if NULLs are not involved. One more
example (based on real-world data, except I renamed the fields and the
table):

office=> explain analyze select * from table where (field1 = 0) AND
((field2 IS NULL) OR (field3 = 0));
                                                  QUERY PLAN
(a line of dashes stripped to allow wrapping)
  Seq Scan on table  (cost=0.00..2103.92 rows=3091 width=174) (actual
time=0.176..2413.078 rows=15 loops=1)
    Filter: ((field1 = 0) AND ((field2 IS NULL) OR (field3 = 0)))
  Total runtime: 2413.349 ms
(3 rows)

- Note that the estimate is way off, and raising statistics actually
makes it even worse. There is an index defined on (field1), but it
doesn't get picked up (probably because lots of rows have field1 equal
0). I have tried to index (field1) where ((field2 IS NULL) OR (field3 =
0)), but it doesn't get picked up either.

office=> explain analyze select * from table where (field1 = 0) AND
((field2 IS NULL) OR (field3 = 0)) = true;
                                                             QUERY PLAN
(a line of dashes stripped to allow wrapping)
  Index Scan using table_index on table  (cost=0.00..1041.36 rows=289
width=174) (actual time=89.689..100.895 rows=15 loops=1)
    Index Cond: (((field1 = 0) AND ((field2 IS NULL) OR (field3 = 0))))
= true)
  Total runtime: 101.580 ms
(3 rows)

- This query runs much faster, but requires the "= true".

Would the change I suggested (adding the "= true" automatically) be
difficult to implement? We have about 10-20 various queries defined
using such conditions, and they are run frequently, so it would help us
a lot. Or are there any other ways to make it run fast which would not
require me to change the database schema?

Thanks,
Sergey

P.S. I know that this stuff belongs more to psql-performance list, I'm
writing it all here just to justify my request.