Thread: Index on a NULL-value

From:
Tobias Brox
Date:

I read in the manual today:

  Indexes are not used for IS NULL clauses by default. The best way to use
  indexes in such cases is to create a partial index using an IS NULL
  predicate.

This is from the documentation for PostgreSQL 8.  I did not find anything
equivalent in the 7.4.8-documentation.

I wasn't aware of this until it became an issue :-) Well, so I follow the
tip but in vain.  Reduced and reproduced like this in PostgreSQL 7.4.7:

test=# create table mock(a int, b int);
CREATE TABLE
test=# create index b_is_null on mock((b IS NULL));
CREATE INDEX
test=# insert into mock values (10,20);
INSERT 70385040 1
test=# insert into mock values (20,30);
INSERT 70385041 1
test=# insert into mock values (30, NULL);
INSERT 70385042 1
test=# set enable_seqscan=off;
SET
test=# explain select * from mock where b is NULL;
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8)
   Filter: (b IS NULL)
(2 rows)

vacuum analyze also didn't help to recognize the index ;-)

Any tips?  Rewrite the application to not use NULL-values?  Hide under
bedclothes and hope the problem goes away?  Install more memory in the
server? :-)

--
Tobias Brox, Beijing

From:
Bruno Wolff III
Date:

On Tue, May 31, 2005 at 11:02:07 +0800,
  Tobias Brox <> wrote:
> I read in the manual today:
>
>   Indexes are not used for IS NULL clauses by default. The best way to use
>   indexes in such cases is to create a partial index using an IS NULL
>   predicate.
>
> This is from the documentation for PostgreSQL 8.  I did not find anything
> equivalent in the 7.4.8-documentation.
>
> I wasn't aware of this until it became an issue :-) Well, so I follow the
> tip but in vain.  Reduced and reproduced like this in PostgreSQL 7.4.7:
>
> test=# create table mock(a int, b int);
> CREATE TABLE
> test=# create index b_is_null on mock((b IS NULL));
> CREATE INDEX
> test=# insert into mock values (10,20);
> INSERT 70385040 1
> test=# insert into mock values (20,30);
> INSERT 70385041 1
> test=# insert into mock values (30, NULL);
> INSERT 70385042 1
> test=# set enable_seqscan=off;
> SET
> test=# explain select * from mock where b is NULL;
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8)
>    Filter: (b IS NULL)
> (2 rows)
>
> vacuum analyze also didn't help to recognize the index ;-)

It isn't surprising that an index wasn't used since a sequential scan is
going to be faster in your test case.

If you want to test this out, you to want use realistically sized tables.

From:
Tobias Brox
Date:

[Tobias Brox - Tue at 11:02:07AM +0800]
> test=# explain select * from mock where b is NULL;
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8)
>    Filter: (b IS NULL)
> (2 rows)

(...)

> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

That tip helped me :-)

test=# explain select * from mock where (b IS NULL)=true;
                              QUERY PLAN

----------------------------------------------------------------------
 Index Scan using b_is_null on mock  (cost=0.00..4.68 rows=1 width=8)
   Index Cond: ((b IS NULL) = true)
(2 rows)

--
Tobias Brox, Beijing

From:
Tobias Brox
Date:

[Tobias Brox]
> test=# set enable_seqscan=off;

[Bruno Wolff III - Mon at 10:16:53PM -0500]
> It isn't surprising that an index wasn't used since a sequential scan is
> going to be faster in your test case.
>
> If you want to test this out, you to want use realistically sized tables.

Wrong.  In this case I was not wondering about the planners choise of not
using the index, but the fact that the planner could not find the index at
all.  Reproducing it on a simple table in a test environment was a valid
strategy to solve this specific problem.

--
Tobias Brox, Beijing

From:
Bruno Wolff III
Date:

On Tue, May 31, 2005 at 11:21:20 +0800,
  Tobias Brox <> wrote:
> [Tobias Brox - Tue at 11:02:07AM +0800]
> > test=# explain select * from mock where b is NULL;
> >                              QUERY PLAN
> > --------------------------------------------------------------------
> >  Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8)
> >    Filter: (b IS NULL)
> > (2 rows)
>
> (...)
>
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >       joining column's datatypes do not match
>
> That tip helped me :-)
>
> test=# explain select * from mock where (b IS NULL)=true;
>                               QUERY PLAN
>
> ----------------------------------------------------------------------
>  Index Scan using b_is_null on mock  (cost=0.00..4.68 rows=1 width=8)
>    Index Cond: ((b IS NULL) = true)
> (2 rows)

Looked back at your first example and saw that you didn't use a partial
index which is why you had to contort things to make it possible to
use an indexed search. (Though the planner really should have done this
since all of the rows should be in one disk block and doing an index
scan should require doing more disk reads than a sequential scan for
the test case you used.)

You want something like this:
CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

The advantage is that the index can be a lot smaller than an index over all
of the rows in the case where only a small fraction of rows have a null value
for b. (If this isn't the case you probably don't want the index.)

From:
Tobias Brox
Date:

[Bruno Wolff III - Mon at 10:36:33PM -0500]
> You want something like this:
> CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

Oh, cool.  I wasn't aware that this is possible.  This would probably help
us a lot of places. :-)

--
Tobias Brox, Beijing

From:
Bruno Wolff III
Date:

On Tue, May 31, 2005 at 11:31:58 +0800,
  Tobias Brox <> wrote:
> [Tobias Brox]
> > test=# set enable_seqscan=off;
>
> [Bruno Wolff III - Mon at 10:16:53PM -0500]
> > It isn't surprising that an index wasn't used since a sequential scan is
> > going to be faster in your test case.
> >
> > If you want to test this out, you to want use realistically sized tables.
>
> Wrong.  In this case I was not wondering about the planners choise of not
> using the index, but the fact that the planner could not find the index at
> all.  Reproducing it on a simple table in a test environment was a valid
> strategy to solve this specific problem.

I missed that you turned sequential scans off for your test.

From:
Tom Lane
Date:

Bruno Wolff III <> writes:
> Looked back at your first example and saw that you didn't use a partial
> index which is why you had to contort things to make it possible to
> use an indexed search.

FWIW, there is code in CVS tip that recognizes the connection between
an index on a boolean expression and a WHERE clause testing that
expression.  It's not quite perfect --- using Tobias' example I see

regression=#  explain select * from mock where b is NULL;
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using b_is_null on mock  (cost=0.00..51.67 rows=10 width=8)
   Index Cond: ((b IS NULL) = true)
   Filter: (b IS NULL)
(3 rows)

so there's a useless filter condition still being generated.  But it
gets the job done as far as using the index, anyway.

> You want something like this:
> CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

I think best practice for something like this is to make the partial
index's columns be something different from what the partial condition
tests.  Done as above, every actual index entry will be a null, so the
entry contents are just dead weight.  Instead do, say,

CREATE INDEX b_is_null ON mock(a) WHERE b IS NULL;

where a is chosen as a column that you frequently also test in
conjunction with "b IS NULL".  That is, the above index can efficiently
handle queries like

    ... WHERE a = 42 AND b IS NULL ...

            regards, tom lane

From:
Greg Stark
Date:

Tobias Brox <> writes:

> [Bruno Wolff III - Mon at 10:36:33PM -0500]
> > You want something like this:
> > CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;
>
> Oh, cool.  I wasn't aware that this is possible.  This would probably help
> us a lot of places. :-)

Yeah it's a cool feature.

I'm not 100% sure but I think it still won't consider this index unless the
column being indexed is used in some indexable operation. So for example if
you had

CREATE INDEX b_null on mock(other) WHERE b IS NULL;

and something like

 SELECT * FROM b WHERE b IS NULL ORDER BY other
or
 SELECT * FROM b where other > 0 AND b IS NULL

then it would be a candidate because the ORDER BY or the other > 0 make the
index look relevant. But I don't think (again I'm not 100% sure) that the
partial index WHERE clause is considered in picking which indexes to consider.

It *is* considered in evaluating which index is the best one to use and
whether it's better than a sequential scan. Just not in the initial choice of
which indexes to look at at all.

--
greg

From:
Tom Lane
Date:

Greg Stark <> writes:
> then it would be a candidate because the ORDER BY or the other > 0 make the
> index look relevant. But I don't think (again I'm not 100% sure) that the
> partial index WHERE clause is considered in picking which indexes to consider.

Nope, the partial index will be considered simply on the strength of its
predicate matching the WHERE clause.

Of course, if you can get some additional mileage by having the index
contents be useful, that's great --- but it's not necessary.

            regards, tom lane

From:
Tobias Brox
Date:

> CREATE INDEX b_is_null ON mock(a) WHERE b IS NULL;
>
> where a is chosen as a column that you frequently also test in
> conjunction with "b IS NULL".  That is, the above index can efficiently
> handle queries like
>
>     ... WHERE a = 42 AND b IS NULL ...

This is wonderful, it seems like most of our problems (probably also
regarding the "index on timestamp"-thread I started separately) can be
solved with partial indexing on expressions.  No need to hide under
bedclothes anymore ;-)

--
Tobias Brox, Beijing

From:
Tobias Brox
Date:

[Tobias Brox - Tue at 11:02:07AM +0800]
> I read in the manual today:
>
>   Indexes are not used for IS NULL clauses by default. The best way to use
>   indexes in such cases is to create a partial index using an IS NULL
>   predicate.

I have summarized this thread in a postgresql doc user comment, posted at
http://www.postgresql.org/docs/current/interactive/sql-createindex.html

I think it's a good thing to do, since it can be difficult to search the
mailing list archives :-)

--
Tobias Brox, Beijing