Home > mailing lists

Indexes with condition using immutable functions applied to column not used - Mailing list pgsql-performance

From	Sylvain Rabot
Subject	Indexes with condition using immutable functions applied to column not used
Date	February 7, 2011 20:15:09
Msg-id	1297124098.2549.65.camel@kheops Whole thread
Responses	Re: Indexes with condition using immutable functions applied to column not used Re: Indexes with condition using immutable functions applied to column not used Re: Indexes with condition using immutable functions applied to column not used
List	pgsql-performance

Tree view

Hi,

I am trying to understand how indexes works to get the most of them.

First I would like to know if there is more advantage than overhead to
split an index in several ones using conditions e.g. doing :

CREATE INDEX directory_id_user_0_btree_idx ON mike.directory USING btree (id_user) WHERE id_user < 250000;
CREATE INDEX directory_id_user_250000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 250000 AND
id_user< 500000; 
CREATE INDEX directory_id_user_500000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 500000 AND
id_user< 750000; 
CREATE INDEX directory_id_user_750000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 750000 AND
id_user< 1000000; 

instead of having only one index for all the id_user. the forecasts for
the table directory are +500 millions records and something like 1
million distinct id_user.

If there is my idea was to do a repartition in the indexes using a
consistent hash algorithm in order to fill the indexes in parallel
instead of successively :

CREATE OR REPLACE FUNCTION mike.__mod_cons_hash(
    IN  in_dividend     bigint,
    IN  in_divisor      integer,
    OUT remainder       integer
) AS $__$

BEGIN
    SELECT in_dividend % in_divisor INTO remainder;
END;

$__$ LANGUAGE plpgsql IMMUTABLE COST 10;

CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 0; 
CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 1; 
CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 2; 
CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 3; 

But the thing is the indexes are not used :

mike=# SELECT version();
                                                      version
-------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit
(1 row)

mike=# REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx;
LOG:  duration: 14644.160 ms  statement: REINDEX INDEX
directory_id_user_mod_cons_hash_0_btree_idx;
REINDEX
mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on directory  (cost=0.00..38140.66 rows=67 width=148) (actual time=0.077..348.211 rows=10303 loops=1)
   Filter: (id_user = 4)
 Total runtime: 351.114 ms
(3 rows)

So I also did this test :

mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 3 and id_user
<5; 
CREATE INDEX
mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
                                                                   QUERY PLAN
                         

-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using directory_id_user_4_btree_idx on directory  (cost=0.00..10.58 rows=67 width=148) (actual
time=0.169..7.753rows=10303 loops=1) 
   Index Cond: (id_user = 4)
 Total runtime: 10.973 ms
(3 rows)

mike=# DROP INDEX directory_id_user_4_btree_idx;
DROP INDEX
mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user - 1 > 2 and
id_user+ 1 < 6; 
CREATE INDEX
mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on directory  (cost=0.00..38140.66 rows=67 width=148) (actual time=0.153..360.020 rows=10303 loops=1)
   Filter: (id_user = 4)
 Total runtime: 363.106 ms
(3 rows)

mike=# DROP INDEX directory_id_user_4_btree_idx;
DROP INDEX
mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 2 + 1 and
id_user< 6 - 1; 
CREATE INDEX
mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
                                                                   QUERY PLAN
                         

-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using directory_id_user_4_btree_idx on directory  (cost=0.00..10.58 rows=67 width=148) (actual
time=0.245..8.262rows=10303 loops=1) 
   Index Cond: (id_user = 4)
 Total runtime: 11.110 ms
(3 rows)

As you see the index condition although, differently written, is the
same but the second index is not used apparently because the immutable
function is applied on the column.

So do you know the reason why the planner is not able to use indexes
which have immutable functions applied to the column in their
condition ?

Regards.

--
Sylvain Rabot <sylvain@abstraction.fr>

Attachment

signature.asc

pgsql-performance by date:

From: Craig Ringer
Date: 07 February 2011, 19:56:15
Subject: Re: Really really slow select count(*)

From: Josh Berkus
Date: 07 February 2011, 21:40:00
Subject: Re: Write-heavy pg_stats_collector on mostly idle server

Indexes with condition using immutable functions applied to column not used - Mailing list pgsql-performance

Attachment

Previous

Next