Thread: Drawbacks of create index where is not null ?
Hi, I have pretty large tables, with columns that might never receive any data, or always receive data, based on the customer needs. The index on these columns are really big, even if the column is never used, so I tend to add a "where col is not null" clause on those indexes. What are the drawbacks of defining my index with a "where col is not null" clause ? Franck
Attachment
On 10/11/2012 01:06 AM, Franck Routier wrote: > Hi, > > I have pretty large tables, with columns that might never receive any > data, or always receive data, based on the customer needs. > The index on these columns are really big, even if the column is never > used, so I tend to add a "where col is not null" clause on those indexes. > > What are the drawbacks of defining my index with a "where col is not > null" clause ? * You can't CLUSTER on a partial index; and * The partial index will only be used for queries that use the condition "WHERE col IS NOT NULL" themselves. The planner isn't super-smart about how it matches index WHERE conditions to query WHERE conditions, so you'll want to use exactly the same condition text where possible. -- Craig Ringer
On Wed, Oct 10, 2012 at 11:26 PM, Craig Ringer <ringerc@ringerc.id.au> wrote: > On 10/11/2012 01:06 AM, Franck Routier wrote: >> >> Hi, >> >> I have pretty large tables, with columns that might never receive any >> data, or always receive data, based on the customer needs. >> The index on these columns are really big, even if the column is never >> used, so I tend to add a "where col is not null" clause on those indexes. >> >> What are the drawbacks of defining my index with a "where col is not null" >> clause ? > > > * You can't CLUSTER on a partial index; and > > * The partial index will only be used for queries that use the condition > "WHERE col IS NOT NULL" themselves. The planner isn't super-smart about how > it matches index WHERE conditions to query WHERE conditions, so you'll want > to use exactly the same condition text where possible. I think the query planner has gotten a little smarter of late: smarlowe=# select version(); version ---------------------------------------------------------------------------------------------------------------- PostgreSQL 9.1.6 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit (1 row) smarlowe=# drop table a; DROP TABLE smarlowe=# create table a (i int); CREATE TABLE smarlowe=# insert into a select null from generate_series(1,10000); INSERT 0 10000 smarlowe=# insert into a values (10); INSERT 0 1 smarlowe=# insert into a select null from generate_series(1,10000); INSERT 0 10000 smarlowe=# create index on a (i) where i is not null; CREATE INDEX smarlowe=# explain select * from a where i =10; QUERY PLAN ------------------------------------------------------------------------ Bitmap Heap Scan on a (cost=4.28..78.00 rows=100 width=4) Recheck Cond: (i = 10) -> Bitmap Index Scan on a_i_idx (cost=0.00..4.26 rows=100 width=0) Index Cond: (i = 10) (4 rows)
On Wed, Oct 10, 2012 at 11:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Wed, Oct 10, 2012 at 11:26 PM, Craig Ringer <ringerc@ringerc.id.au> wrote: >> On 10/11/2012 01:06 AM, Franck Routier wrote: >>> >>> Hi, >>> >>> I have pretty large tables, with columns that might never receive any >>> data, or always receive data, based on the customer needs. >>> The index on these columns are really big, even if the column is never >>> used, so I tend to add a "where col is not null" clause on those indexes. >>> >>> What are the drawbacks of defining my index with a "where col is not null" >>> clause ? >> >> >> * You can't CLUSTER on a partial index; and >> >> * The partial index will only be used for queries that use the condition >> "WHERE col IS NOT NULL" themselves. The planner isn't super-smart about how >> it matches index WHERE conditions to query WHERE conditions, so you'll want >> to use exactly the same condition text where possible. > > I think the query planner has gotten a little smarter of late: > > smarlowe=# select version(); > version > ---------------------------------------------------------------------------------------------------------------- > PostgreSQL 9.1.6 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real > (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit > (1 row) > > smarlowe=# drop table a; > DROP TABLE > smarlowe=# create table a (i int); > CREATE TABLE > smarlowe=# insert into a select null from generate_series(1,10000); > INSERT 0 10000 > smarlowe=# insert into a values (10); > INSERT 0 1 > smarlowe=# insert into a select null from generate_series(1,10000); > INSERT 0 10000 > smarlowe=# create index on a (i) where i is not null; > CREATE INDEX > smarlowe=# explain select * from a where i =10; > QUERY PLAN > ------------------------------------------------------------------------ > Bitmap Heap Scan on a (cost=4.28..78.00 rows=100 width=4) > Recheck Cond: (i = 10) > -> Bitmap Index Scan on a_i_idx (cost=0.00..4.26 rows=100 width=0) > Index Cond: (i = 10) > (4 rows) Actually after an analyze it just uses the plain index no bitmap scan. So I get the same explain output with or without the "and i is not null" clause added in.
Le 11/10/2012 07:26, Craig Ringer a écrit : > * The partial index will only be used for queries that use the > condition "WHERE col IS NOT NULL" themselves. The planner isn't > super-smart about how it matches index WHERE conditions to query WHERE > conditions, so you'll want to use exactly the same condition text > where possible. > From my experiments, the planner seems to be smart enougth to tell that "where col = 'myvalue' " will match with partial index "where col is not null". So it will use the index and not do a full tablescan. (this is on 8.4). This is also what Scott says in his reply. I'm not thinking of using more complex where predicat for my indexes, just "is not null". So I think I should not be hit by this... Thanks, Franck
Attachment
On Wed, Oct 10, 2012 at 10:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > I think the query planner has gotten a little smarter of late: > > smarlowe=# create index on a (i) where i is not null; > CREATE INDEX > smarlowe=# explain select * from a where i =10; > QUERY PLAN > ------------------------------------------------------------------------ > Bitmap Heap Scan on a (cost=4.28..78.00 rows=100 width=4) > Recheck Cond: (i = 10) > -> Bitmap Index Scan on a_i_idx (cost=0.00..4.26 rows=100 width=0) > Index Cond: (i = 10) > (4 rows) It is even smarter a little bit more: [local]:5432 grayhemp@grayhemp=# create index h_idx1 on h (n) where v is not null; CREATE INDEX [local]:5432 grayhemp@grayhemp=# explain analyze select * from h where v = '0.5'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on h (cost=1616.10..8494.68 rows=1 width=30) (actual time=111.735..111.735 rows=0 loops=1) Recheck Cond: (v IS NOT NULL) Filter: (v = '0.5'::text) -> Bitmap Index Scan on h_idx1 (cost=0.00..1616.10 rows=102367 width=0) (actual time=19.027..19.027 rows=100271 loops=1) (5 rows) -- Sergey Konoplev a database and software architect http://www.linkedin.com/in/grayhemp Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +14158679984