Thread: Ignoring index on (A is null), (A is not null) conditions
Hi,

suppose, for simplicity, there is a table with an index like this:

  create table TABLE1 (
    A integer
  );
  create index TABLE1_A on TABLE1 (A);

My question is: why does PostgreSQL (7.3.3) not use the index when filtering by A IS NULL / A IS NOT NULL expressions?

In fact, I need to filter by the expression ((A is null) or (A > const)).

Is there a way to filter by this expression using an index?

A functional index cannot be used (except for the strange workaround of CASE-ing NULL values into some integer constant).

------------------------------------------------------------------------------
 Index Scan using table1_a on table1  (cost=0.00..437.14 rows=29164 width=4)
   Index Cond: (a > 1000)
------------------------------------------------------------------------------
 Seq Scan on table1  (cost=0.00..448.22 rows=1 width=4)
   Filter: (a IS NULL)
------------------------------------------------------------------------------
 Seq Scan on table1  (cost=0.00..448.22 rows=30222 width=4)
   Filter: (a IS NOT NULL)
------------------------------------------------------------------------------
 Seq Scan on table1  (cost=0.00..523.77 rows=29164 width=4)
   Filter: ((a IS NULL) OR (a > 1000))
------------------------------------------------------------------------------

CH
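For reference, plans like the ones above can be reproduced with something along these lines (a sketch only; the row counts are illustrative, and generate_series is just a convenient way to populate the table on newer releases):

  -- populate roughly 30000 non-null rows plus a single NULL, then refresh statistics
  INSERT INTO table1 SELECT g FROM generate_series(1, 30000) AS g;
  INSERT INTO table1 VALUES (NULL);
  ANALYZE table1;

  EXPLAIN SELECT a FROM table1 WHERE a > 1000;
  EXPLAIN SELECT a FROM table1 WHERE a IS NULL;
  EXPLAIN SELECT a FROM table1 WHERE a IS NOT NULL;
  EXPLAIN SELECT a FROM table1 WHERE a IS NULL OR a > 1000;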
Are you seeing this question as totally off-topic on this list, or is there really no one who knows something about indexing "is null" bits in Postgres?

Regards
CH

> My question is: why does PostgreSQL (7.3.3) not use the index when
> filtering by A IS NULL / A IS NOT NULL expressions?
> [...]
try this:
EXPLAIN [ANALYZE] SELECT a FROM table1 WHERE a IS NULL OR a>2;
SET enable_seqscan TO off;
EXPLAIN [ANALYZE] SELECT a FROM table1 WHERE a IS NULL OR a>2;
and compare the costs and times of both executions. This will tell you why postgresql is not using an index.
For example, if you have 1000 rows in your table, they will fit in only one page of the table, so postgresql will (correctly) think that fetching and processing this one page will be faster than fetching the index page, processing it, and then fetching and processing the table page.
Or perhaps there are so many rows that match your condition that postgresql realizes that, whether it uses an index or not, it will still have to visit almost every page in the table.
Many things can cause postgresql to think that a seqscan is better than an indexscan. If, after comparing the EXPLAINs, you see that postgresql is wrong, you should tweak your postgresql.conf (for example the cpu_index_tuple_cost value).
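As a concrete sketch of that last point (the values below are purely illustrative, not recommendations), the cost settings can also be tried per session before touching postgresql.conf:

  SET random_page_cost = 2;            -- illustrative: makes index scans look cheaper
  SET cpu_index_tuple_cost = 0.0005;   -- illustrative value only
  EXPLAIN ANALYZE SELECT a FROM table1 WHERE a IS NULL OR a > 2;
  RESET random_page_cost;
  RESET cpu_index_tuple_cost;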
hope it helps.
On Thu, 2003-10-30 at 08:34, Cestmir Hybl wrote:
> Are you seeing this question as totally off-topic on this list, or is
> there really no one who knows something about indexing "is null" bits
> in Postgres?
On Thu, Oct 30, 2003 at 12:34:15 +0100, Cestmir Hybl <cestmirl@freeside.sk> wrote:

> Are you seeing this question as totally off-topic on this list, or is
> there really no one who knows something about indexing "is null" bits
> in Postgres?

There was some talk a number of months ago about IS NULL not being able to use indexes (unless you specifically create a partial index using that condition). You could search through the archives if you are interested in what was said. My memory is that people thought it would be a good idea, but that it wasn't important enough to get done.

> > My question is: why does PostgreSQL (7.3.3) not use the index when
> > filtering by A IS NULL / A IS NOT NULL expressions?

That is a Postgres limitation. If there are only a few null values but you query for them a lot, it may be worth creating a partial index.

> > In fact, I need to filter by the expression ((A is null) or (A > const)).

This is a whole different matter. Using an indexed search on > is not necessarily a good idea. Unless you know that only a small fraction of the table (often 10% is quoted) is greater than the constant, a sequential scan is probably a better plan than an index scan. If you know that only a small fraction of the values are above the constant, and you know some large value greater than all of them, you can try using a BETWEEN comparison to coax the planner into doing an index scan.
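A minimal sketch of both suggestions, using the table from the original post (the index name and the upper bound 1000000 are made up for illustration):

  -- partial index that covers only the NULL rows, so "a IS NULL" queries can use it
  CREATE INDEX table1_a_null ON table1 (a) WHERE a IS NULL;

  -- rewrite "a > 1000" as a bounded range to coax the planner onto the index,
  -- assuming 1000000 is known to be larger than any value actually stored
  SELECT a FROM table1 WHERE a IS NULL OR a BETWEEN 1001 AND 1000000;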
"Cestmir Hybl" <cestmirl@freeside.sk> writes: >> In fact, I need to filter by expression ((A is null) or (A > const)). I wonder whether you shouldn't reconsider your data representation. Perhaps the condition you are using "null" for would be better represented by setting A to infinity. (The float and timestamp datatypes actually have a concept of infinity; for other types you can fake it with a large positive value.) regards, tom lane
First of all, thanks for all your suggestions. They fall into two classes:

1. Use a different data representation (a special constant from the column domain instead of NULL).

   This is possible, of course, but it makes the data model less portable and requires changes in the database abstraction layer of the application.

2. Use partial indexes.

   This is suitable for an index on a single null-allowed column. With an increasing number of null-allowed columns inside the index, the number of partial indexes required grows exponentially (see the sketch below).

All RDBMSs I have ever used (Sybase, MSSQL, even MySQL) were able to use an index to filter by expressions containing IS NULL conditions: (A is NULL), (A is not NULL), (A is NULL or A = const), (A is NULL or A > const). So it seems pretty strange to me that PostgreSQL does not. Is this at least a scheduled feature?

CH
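To illustrate the growth mentioned in point 2 above, a sketch with a hypothetical table t having two null-allowed columns a and b: covering every NULL/NOT NULL combination already takes 2^2 = 4 partial indexes, and each further nullable column doubles the count.

  -- hypothetical table t(a, b); four partial indexes for two nullable columns
  CREATE INDEX t_a_b       ON t (a, b) WHERE a IS NOT NULL AND b IS NOT NULL;
  CREATE INDEX t_anull_b   ON t (a, b) WHERE a IS NULL     AND b IS NOT NULL;
  CREATE INDEX t_a_bnull   ON t (a, b) WHERE a IS NOT NULL AND b IS NULL;
  CREATE INDEX t_anull_bnull ON t (a, b) WHERE a IS NULL   AND b IS NULL;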