Thread: Index on a NULL-value
I read in the manual today: Indexes are not used for IS NULL clauses by default. The best way to use indexes in such cases is to create a partial index using an IS NULL predicate. This is from the documentation for PostgreSQL 8. I did not find anything equivalent in the 7.4.8-documentation. I wasn't aware of this until it became an issue :-) Well, so I follow the tip but in vain. Reduced and reproduced like this in PostgreSQL 7.4.7: test=# create table mock(a int, b int); CREATE TABLE test=# create index b_is_null on mock((b IS NULL)); CREATE INDEX test=# insert into mock values (10,20); INSERT 70385040 1 test=# insert into mock values (20,30); INSERT 70385041 1 test=# insert into mock values (30, NULL); INSERT 70385042 1 test=# set enable_seqscan=off; SET test=# explain select * from mock where b is NULL; QUERY PLAN -------------------------------------------------------------------- Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8) Filter: (b IS NULL) (2 rows) vacuum analyze also didn't help to recognize the index ;-) Any tips? Rewrite the application to not use NULL-values? Hide under bedclothes and hope the problem goes away? Install more memory in the server? :-) -- Tobias Brox, Beijing
On Tue, May 31, 2005 at 11:02:07 +0800, Tobias Brox <tobias@nordicbet.com> wrote: > I read in the manual today: > > Indexes are not used for IS NULL clauses by default. The best way to use > indexes in such cases is to create a partial index using an IS NULL > predicate. > > This is from the documentation for PostgreSQL 8. I did not find anything > equivalent in the 7.4.8-documentation. > > I wasn't aware of this until it became an issue :-) Well, so I follow the > tip but in vain. Reduced and reproduced like this in PostgreSQL 7.4.7: > > test=# create table mock(a int, b int); > CREATE TABLE > test=# create index b_is_null on mock((b IS NULL)); > CREATE INDEX > test=# insert into mock values (10,20); > INSERT 70385040 1 > test=# insert into mock values (20,30); > INSERT 70385041 1 > test=# insert into mock values (30, NULL); > INSERT 70385042 1 > test=# set enable_seqscan=off; > SET > test=# explain select * from mock where b is NULL; > QUERY PLAN > -------------------------------------------------------------------- > Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8) > Filter: (b IS NULL) > (2 rows) > > vacuum analyze also didn't help to recognize the index ;-) It isn't surprising that an index wasn't used since a sequential scan is going to be faster in your test case. If you want to test this out, you to want use realistically sized tables.
[Tobias Brox - Tue at 11:02:07AM +0800] > test=# explain select * from mock where b is NULL; > QUERY PLAN > -------------------------------------------------------------------- > Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8) > Filter: (b IS NULL) > (2 rows) (...) > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match That tip helped me :-) test=# explain select * from mock where (b IS NULL)=true; QUERY PLAN ---------------------------------------------------------------------- Index Scan using b_is_null on mock (cost=0.00..4.68 rows=1 width=8) Index Cond: ((b IS NULL) = true) (2 rows) -- Tobias Brox, Beijing
[Tobias Brox] > test=# set enable_seqscan=off; [Bruno Wolff III - Mon at 10:16:53PM -0500] > It isn't surprising that an index wasn't used since a sequential scan is > going to be faster in your test case. > > If you want to test this out, you to want use realistically sized tables. Wrong. In this case I was not wondering about the planners choise of not using the index, but the fact that the planner could not find the index at all. Reproducing it on a simple table in a test environment was a valid strategy to solve this specific problem. -- Tobias Brox, Beijing
On Tue, May 31, 2005 at 11:21:20 +0800, Tobias Brox <tobias@nordicbet.com> wrote: > [Tobias Brox - Tue at 11:02:07AM +0800] > > test=# explain select * from mock where b is NULL; > > QUERY PLAN > > -------------------------------------------------------------------- > > Seq Scan on mock (cost=100000000.00..100000020.00 rows=6 width=8) > > Filter: (b IS NULL) > > (2 rows) > > (...) > > > ---------------------------(end of broadcast)--------------------------- > > TIP 9: the planner will ignore your desire to choose an index scan if your > > joining column's datatypes do not match > > That tip helped me :-) > > test=# explain select * from mock where (b IS NULL)=true; > QUERY PLAN > > ---------------------------------------------------------------------- > Index Scan using b_is_null on mock (cost=0.00..4.68 rows=1 width=8) > Index Cond: ((b IS NULL) = true) > (2 rows) Looked back at your first example and saw that you didn't use a partial index which is why you had to contort things to make it possible to use an indexed search. (Though the planner really should have done this since all of the rows should be in one disk block and doing an index scan should require doing more disk reads than a sequential scan for the test case you used.) You want something like this: CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL; The advantage is that the index can be a lot smaller than an index over all of the rows in the case where only a small fraction of rows have a null value for b. (If this isn't the case you probably don't want the index.)
[Bruno Wolff III - Mon at 10:36:33PM -0500] > You want something like this: > CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL; Oh, cool. I wasn't aware that this is possible. This would probably help us a lot of places. :-) -- Tobias Brox, Beijing
On Tue, May 31, 2005 at 11:31:58 +0800, Tobias Brox <tobias@nordicbet.com> wrote: > [Tobias Brox] > > test=# set enable_seqscan=off; > > [Bruno Wolff III - Mon at 10:16:53PM -0500] > > It isn't surprising that an index wasn't used since a sequential scan is > > going to be faster in your test case. > > > > If you want to test this out, you to want use realistically sized tables. > > Wrong. In this case I was not wondering about the planners choise of not > using the index, but the fact that the planner could not find the index at > all. Reproducing it on a simple table in a test environment was a valid > strategy to solve this specific problem. I missed that you turned sequential scans off for your test.
Bruno Wolff III <bruno@wolff.to> writes: > Looked back at your first example and saw that you didn't use a partial > index which is why you had to contort things to make it possible to > use an indexed search. FWIW, there is code in CVS tip that recognizes the connection between an index on a boolean expression and a WHERE clause testing that expression. It's not quite perfect --- using Tobias' example I see regression=# explain select * from mock where b is NULL; QUERY PLAN ------------------------------------------------------------------------ Index Scan using b_is_null on mock (cost=0.00..51.67 rows=10 width=8) Index Cond: ((b IS NULL) = true) Filter: (b IS NULL) (3 rows) so there's a useless filter condition still being generated. But it gets the job done as far as using the index, anyway. > You want something like this: > CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL; I think best practice for something like this is to make the partial index's columns be something different from what the partial condition tests. Done as above, every actual index entry will be a null, so the entry contents are just dead weight. Instead do, say, CREATE INDEX b_is_null ON mock(a) WHERE b IS NULL; where a is chosen as a column that you frequently also test in conjunction with "b IS NULL". That is, the above index can efficiently handle queries like ... WHERE a = 42 AND b IS NULL ... regards, tom lane
Tobias Brox <tobias@nordicbet.com> writes: > [Bruno Wolff III - Mon at 10:36:33PM -0500] > > You want something like this: > > CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL; > > Oh, cool. I wasn't aware that this is possible. This would probably help > us a lot of places. :-) Yeah it's a cool feature. I'm not 100% sure but I think it still won't consider this index unless the column being indexed is used in some indexable operation. So for example if you had CREATE INDEX b_null on mock(other) WHERE b IS NULL; and something like SELECT * FROM b WHERE b IS NULL ORDER BY other or SELECT * FROM b where other > 0 AND b IS NULL then it would be a candidate because the ORDER BY or the other > 0 make the index look relevant. But I don't think (again I'm not 100% sure) that the partial index WHERE clause is considered in picking which indexes to consider. It *is* considered in evaluating which index is the best one to use and whether it's better than a sequential scan. Just not in the initial choice of which indexes to look at at all. -- greg
Greg Stark <gsstark@mit.edu> writes: > then it would be a candidate because the ORDER BY or the other > 0 make the > index look relevant. But I don't think (again I'm not 100% sure) that the > partial index WHERE clause is considered in picking which indexes to consider. Nope, the partial index will be considered simply on the strength of its predicate matching the WHERE clause. Of course, if you can get some additional mileage by having the index contents be useful, that's great --- but it's not necessary. regards, tom lane
> CREATE INDEX b_is_null ON mock(a) WHERE b IS NULL; > > where a is chosen as a column that you frequently also test in > conjunction with "b IS NULL". That is, the above index can efficiently > handle queries like > > ... WHERE a = 42 AND b IS NULL ... This is wonderful, it seems like most of our problems (probably also regarding the "index on timestamp"-thread I started separately) can be solved with partial indexing on expressions. No need to hide under bedclothes anymore ;-) -- Tobias Brox, Beijing
[Tobias Brox - Tue at 11:02:07AM +0800] > I read in the manual today: > > Indexes are not used for IS NULL clauses by default. The best way to use > indexes in such cases is to create a partial index using an IS NULL > predicate. I have summarized this thread in a postgresql doc user comment, posted at http://www.postgresql.org/docs/current/interactive/sql-createindex.html I think it's a good thing to do, since it can be difficult to search the mailing list archives :-) -- Tobias Brox, Beijing