Tom Lane wrote:
> "Sergey Koshcheyev" <sergey.p.k@hotmail.com> writes:
>
>>I'm trying to optimize "is null" queries, since PgSQL doesn't index null
>>values. I have found that creating an expression index on (column is null)
>>could work, but it doesn't get used unless the index expression is part of a
>>comparison. Could this be improved, so that (a boolean expression) is taken
>>as equivalent to (a boolean expression = true)?
>
>
> You would be better off to use a partial index:
>
> create index tbl1_abc on tbl1 (somecol) where abc is null;
>
> The advantage of this is that not only do you get the is-null filter,
> but you may be able to filter on some other column(s) at the same time.
> For instance if you commonly query
>
> select ... from tbl1 where abc is null and def > 42
>
> then making "somecol" be "def" would be a winner.
>
> regards, tom lane
Well, I have tried playing with indexes, but it's hard to define the
right indexes for the query, even if NULLs are not involved. One more
example (based on real-world data, except I renamed the fields and the
table):
office=> explain analyze select * from table where (field1 = 0) AND
((field2 IS NULL) OR (field3 = 0));
QUERY PLAN
(a line of dashes stripped to allow wrapping)
Seq Scan on table (cost=0.00..2103.92 rows=3091 width=174) (actual
time=0.176..2413.078 rows=15 loops=1)
Filter: ((field1 = 0) AND ((field2 IS NULL) OR (field3 = 0)))
Total runtime: 2413.349 ms
(3 rows)
- Note that the estimate is way off, and raising statistics actually
makes it even worse. There is an index defined on (field1), but it
doesn't get picked up (probably because lots of rows have field1 equal
0). I have tried to index (field1) where ((field2 IS NULL) OR (field3 =
0)), but it doesn't get picked up either.
office=> explain analyze select * from table where (field1 = 0) AND
((field2 IS NULL) OR (field3 = 0)) = true;
QUERY PLAN
(a line of dashes stripped to allow wrapping)
Index Scan using table_index on table (cost=0.00..1041.36 rows=289
width=174) (actual time=89.689..100.895 rows=15 loops=1)
Index Cond: (((field1 = 0) AND ((field2 IS NULL) OR (field3 = 0))))
= true)
Total runtime: 101.580 ms
(3 rows)
- This query runs much faster, but requires the "= true".
Would the change I suggested (adding the "= true" automatically) be
difficult to implement? We have about 10-20 various queries defined
using such conditions, and they are run frequently, so it would help us
a lot. Or are there any other ways to make it run fast which would not
require me to change the database schema?
Thanks,
Sergey
P.S. I know that this stuff belongs more to psql-performance list, I'm
writing it all here just to justify my request.