Thread: Drawbacks of create index where is not null ?

Drawbacks of create index where is not null ?

From
Franck Routier
Date:
Hi,

I have pretty large tables, with columns that might never receive any
data, or always receive data, based on the customer needs.
The index on these columns are really big, even if the column is never
used, so I tend to add a "where col is not null" clause on those indexes.

What are the drawbacks of defining my index with a "where col is not
null" clause ?

Franck


Attachment

Re: Drawbacks of create index where is not null ?

From
Craig Ringer
Date:
On 10/11/2012 01:06 AM, Franck Routier wrote:
> Hi,
>
> I have pretty large tables, with columns that might never receive any
> data, or always receive data, based on the customer needs.
> The index on these columns are really big, even if the column is never
> used, so I tend to add a "where col is not null" clause on those indexes.
>
> What are the drawbacks of defining my index with a "where col is not
> null" clause ?

* You can't CLUSTER on a partial index; and

* The partial index will only be used for queries that use the condition
"WHERE col IS NOT NULL" themselves. The planner isn't super-smart about
how it matches index WHERE conditions to query WHERE conditions, so
you'll want to use exactly the same condition text where possible.

--
Craig Ringer


Re: Drawbacks of create index where is not null ?

From
Scott Marlowe
Date:
On Wed, Oct 10, 2012 at 11:26 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 10/11/2012 01:06 AM, Franck Routier wrote:
>>
>> Hi,
>>
>> I have pretty large tables, with columns that might never receive any
>> data, or always receive data, based on the customer needs.
>> The index on these columns are really big, even if the column is never
>> used, so I tend to add a "where col is not null" clause on those indexes.
>>
>> What are the drawbacks of defining my index with a "where col is not null"
>> clause ?
>
>
> * You can't CLUSTER on a partial index; and
>
> * The partial index will only be used for queries that use the condition
> "WHERE col IS NOT NULL" themselves. The planner isn't super-smart about how
> it matches index WHERE conditions to query WHERE conditions, so you'll want
> to use exactly the same condition text where possible.

I think the query planner has gotten a little smarter of late:

smarlowe=# select version();
                                                    version
----------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.6 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real
(Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
(1 row)

smarlowe=# drop table a;
DROP TABLE
smarlowe=# create table a (i int);
CREATE TABLE
smarlowe=# insert into a select null from generate_series(1,10000);
INSERT 0 10000
smarlowe=# insert into a values (10);
INSERT 0 1
smarlowe=# insert into a select null from generate_series(1,10000);
INSERT 0 10000
smarlowe=# create index on a (i) where i is not null;
CREATE INDEX
smarlowe=# explain select * from a where i =10;
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.28..78.00 rows=100 width=4)
   Recheck Cond: (i = 10)
   ->  Bitmap Index Scan on a_i_idx  (cost=0.00..4.26 rows=100 width=0)
         Index Cond: (i = 10)
(4 rows)


Re: Drawbacks of create index where is not null ?

From
Scott Marlowe
Date:
On Wed, Oct 10, 2012 at 11:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, Oct 10, 2012 at 11:26 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> On 10/11/2012 01:06 AM, Franck Routier wrote:
>>>
>>> Hi,
>>>
>>> I have pretty large tables, with columns that might never receive any
>>> data, or always receive data, based on the customer needs.
>>> The index on these columns are really big, even if the column is never
>>> used, so I tend to add a "where col is not null" clause on those indexes.
>>>
>>> What are the drawbacks of defining my index with a "where col is not null"
>>> clause ?
>>
>>
>> * You can't CLUSTER on a partial index; and
>>
>> * The partial index will only be used for queries that use the condition
>> "WHERE col IS NOT NULL" themselves. The planner isn't super-smart about how
>> it matches index WHERE conditions to query WHERE conditions, so you'll want
>> to use exactly the same condition text where possible.
>
> I think the query planner has gotten a little smarter of late:
>
> smarlowe=# select version();
>                                                     version
> ----------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.1.6 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real
> (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
> (1 row)
>
> smarlowe=# drop table a;
> DROP TABLE
> smarlowe=# create table a (i int);
> CREATE TABLE
> smarlowe=# insert into a select null from generate_series(1,10000);
> INSERT 0 10000
> smarlowe=# insert into a values (10);
> INSERT 0 1
> smarlowe=# insert into a select null from generate_series(1,10000);
> INSERT 0 10000
> smarlowe=# create index on a (i) where i is not null;
> CREATE INDEX
> smarlowe=# explain select * from a where i =10;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Bitmap Heap Scan on a  (cost=4.28..78.00 rows=100 width=4)
>    Recheck Cond: (i = 10)
>    ->  Bitmap Index Scan on a_i_idx  (cost=0.00..4.26 rows=100 width=0)
>          Index Cond: (i = 10)
> (4 rows)


Actually after an analyze it just uses the plain index no bitmap scan.
 So I get the same explain output with or without the "and i is not
null" clause added in.


Re: Drawbacks of create index where is not null ?

From
Franck Routier
Date:
Le 11/10/2012 07:26, Craig Ringer a écrit :
> * The partial index will only be used for queries that use the
> condition "WHERE col IS NOT NULL" themselves. The planner isn't
> super-smart about how it matches index WHERE conditions to query WHERE
> conditions, so you'll want to use exactly the same condition text
> where possible.
>

 From my experiments, the planner seems to be smart enougth to tell that
"where col = 'myvalue' " will match with partial index "where col is not
null".
So it will use the index and not do a full tablescan. (this is on 8.4).
This is also what Scott says in his reply.
I'm not thinking of using more complex where predicat for my indexes,
just "is not null". So I think I should not be hit by this...

Thanks,
Franck


Attachment

Re: Drawbacks of create index where is not null ?

From
Sergey Konoplev
Date:
On Wed, Oct 10, 2012 at 10:42 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> I think the query planner has gotten a little smarter of late:
>
> smarlowe=# create index on a (i) where i is not null;
> CREATE INDEX
> smarlowe=# explain select * from a where i =10;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Bitmap Heap Scan on a  (cost=4.28..78.00 rows=100 width=4)
>    Recheck Cond: (i = 10)
>    ->  Bitmap Index Scan on a_i_idx  (cost=0.00..4.26 rows=100 width=0)
>          Index Cond: (i = 10)
> (4 rows)

It is even smarter a little bit more:

[local]:5432 grayhemp@grayhemp=# create index h_idx1 on h (n) where v
is not null;
CREATE INDEX

[local]:5432 grayhemp@grayhemp=# explain analyze select * from h where
v = '0.5';
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on h  (cost=1616.10..8494.68 rows=1 width=30)
(actual time=111.735..111.735 rows=0 loops=1)
   Recheck Cond: (v IS NOT NULL)
   Filter: (v = '0.5'::text)
   ->  Bitmap Index Scan on h_idx1  (cost=0.00..1616.10 rows=102367
width=0) (actual time=19.027..19.027 rows=100271 loops=1)
(5 rows)


--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +14158679984