Re: Indexes with condition using immutable functions applied to column not used - Mailing list pgsql-performance

From Sylvain Rabot
Subject Re: Indexes with condition using immutable functions applied to column not used
Date
Msg-id 1297195734.2497.93.camel@kheops
Whole thread Raw
In response to Indexes with condition using immutable functions applied to column not used  (Sylvain Rabot <sylvain@abstraction.fr>)
Responses Re: Re: Indexes with condition using immutable functions applied to column not used
List pgsql-performance
I also tried to do table partitioning using the same immutable function,
it works well except for constraint exclusion.

CREATE TABLE mike.directory_part_0 () INHERITS (mike.directory) WITH (fillfactor = 90);
CREATE RULE directory_part_0_insert AS ON INSERT TO mike.directory WHERE (__mod_cons_hash(new.id_user::bigint, 2) = 0)
DO INSTEAD INSERT INTO mike.directory_part_0 VALUES (new.*);

CREATE TABLE mike.directory_part_1 () INHERITS (mike.directory) WITH (fillfactor = 90);
CREATE RULE directory_part_1_insert AS ON INSERT TO mike.directory WHERE (__mod_cons_hash(new.id_user::bigint, 2) = 1)
DO INSTEAD INSERT INTO mike.directory_part_1 VALUES (new.*);

mike_part=# explain analyze select * from directory where id_user = 3;
                                                                                     QUERY PLAN
                                                             

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..310.21 rows=5226 width=141) (actual time=0.080..7.583 rows=2653 loops=1)
   ->  Append  (cost=0.00..310.21 rows=5226 width=141) (actual time=0.077..3.654 rows=2653 loops=1)
         ->  Index Scan using directory_id_user_btree_idx on directory  (cost=0.00..8.27 rows=1 width=141) (actual
time=0.007..0.007rows=0 loops=1) 
               Index Cond: (id_user = 3)
         ->  Index Scan using directory_part_0_id_user_btree_idx on directory_part_0 directory  (cost=0.00..8.27 rows=1
width=150)(actual time=0.035..0.035 rows=0 loops=1) 
               Index Cond: (id_user = 3)
         ->  Index Scan using directory_part_1_id_user_btree_idx on directory_part_1 directory  (cost=0.00..293.67
rows=5224width=141) (actual time=0.035..2.037 rows=2653 loops=1) 
               Index Cond: (id_user = 3)
 Total runtime: 8.807 ms
(9 rows)


On Tue, 2011-02-08 at 01:14 +0100, Sylvain Rabot wrote:
> Hi,
>
> I am trying to understand how indexes works to get the most of them.
>
> First I would like to know if there is more advantage than overhead to
> split an index in several ones using conditions e.g. doing :
>
> CREATE INDEX directory_id_user_0_btree_idx ON mike.directory USING btree (id_user) WHERE id_user < 250000;
> CREATE INDEX directory_id_user_250000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 250000 AND
id_user< 500000; 
> CREATE INDEX directory_id_user_500000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 500000 AND
id_user< 750000; 
> CREATE INDEX directory_id_user_750000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 750000 AND
id_user< 1000000; 
>
> instead of having only one index for all the id_user. the forecasts for
> the table directory are +500 millions records and something like 1
> million distinct id_user.
>
> If there is my idea was to do a repartition in the indexes using a
> consistent hash algorithm in order to fill the indexes in parallel
> instead of successively :
>
> CREATE OR REPLACE FUNCTION mike.__mod_cons_hash(
>     IN  in_dividend     bigint,
>     IN  in_divisor      integer,
>     OUT remainder       integer
> ) AS $__$
>
> BEGIN
>     SELECT in_dividend % in_divisor INTO remainder;
> END;
>
> $__$ LANGUAGE plpgsql IMMUTABLE COST 10;
>
> CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 0; 
> CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 1; 
> CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 2; 
> CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON mike.directory USING btree (id_user) WHERE
__mod_cons_hash(id_user,4) = 3; 
>
> But the thing is the indexes are not used :
>
> mike=# SELECT version();
>                                                       version
> -------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit
> (1 row)
>
> mike=# REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx;
> LOG:  duration: 14644.160 ms  statement: REINDEX INDEX
> directory_id_user_mod_cons_hash_0_btree_idx;
> REINDEX
> mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4;
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Seq Scan on directory  (cost=0.00..38140.66 rows=67 width=148) (actual time=0.077..348.211 rows=10303 loops=1)
>    Filter: (id_user = 4)
>  Total runtime: 351.114 ms
> (3 rows)
>
> So I also did this test :
>
> mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 3 and
id_user< 5; 
> CREATE INDEX
> mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
>                                                                    QUERY PLAN
                           
>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using directory_id_user_4_btree_idx on directory  (cost=0.00..10.58 rows=67 width=148) (actual
time=0.169..7.753rows=10303 loops=1) 
>    Index Cond: (id_user = 4)
>  Total runtime: 10.973 ms
> (3 rows)
>
> mike=# DROP INDEX directory_id_user_4_btree_idx;
> DROP INDEX
> mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user - 1 > 2 and
id_user+ 1 < 6; 
> CREATE INDEX
> mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Seq Scan on directory  (cost=0.00..38140.66 rows=67 width=148) (actual time=0.153..360.020 rows=10303 loops=1)
>    Filter: (id_user = 4)
>  Total runtime: 363.106 ms
> (3 rows)
>
> mike=# DROP INDEX directory_id_user_4_btree_idx;
> DROP INDEX
> mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 2 + 1 and
id_user< 6 - 1; 
> CREATE INDEX
> mike=# EXPLAIN ANALYZE select * from directory where id_user = 4;
>                                                                    QUERY PLAN
                           
>
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using directory_id_user_4_btree_idx on directory  (cost=0.00..10.58 rows=67 width=148) (actual
time=0.245..8.262rows=10303 loops=1) 
>    Index Cond: (id_user = 4)
>  Total runtime: 11.110 ms
> (3 rows)
>
> As you see the index condition although, differently written, is the
> same but the second index is not used apparently because the immutable
> function is applied on the column.
>
> So do you know the reason why the planner is not able to use indexes
> which have immutable functions applied to the column in their
> condition ?
>
> Regards.
>

--
Sylvain Rabot <sylvain@abstraction.fr>

Attachment

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Really really slow select count(*)
Next
From: Greg Smith
Date:
Subject: Re: Really really slow select count(*)