Indexes and statistics - Mailing list pgsql-sql
From | David Witham |
---|---|
Subject | Indexes and statistics |
Date | |
Msg-id | CFA248776934FD43847E740E43C346D199DC13@ozimelb03.ozicom.com Whole thread Raw |
Responses |
Re: Indexes and statistics
|
List | pgsql-sql |
Hi all, This is a further post from last week. I've got a table of phone call detail records. buns=# select count(*) from cdr; count ---------2800653 (1 row) One of the customers is quite large (8.3% of the records): buns=# select count(*) from cdr where cust_id = 99201110; count --------231889 (1 row) I have indexes on cust_id (integer) and bill_id (integer). If I try to do a query on that customer it doesn't use the index: buns=# explain analyse select count(*) from cdr where cust_id = 99201110 and bill_id is null; QUERY PLAN -----------------------------------------------------------------------------------------------------------------Aggregate (cost=87082.81..87082.81 rows=1 width=0) (actual time=82279.63..82279.63 rows=1 loops=1) -> Seq Scan on cdr (cost=0.00..87037.71rows=18041 width=0) (actual time=82279.61..82279.61 rows=0 loops=1) Filter: ((cust_id = 99201110)AND (bill_id IS NULL))Total runtime: 82280.19 msec (4 rows) I tried this: alter table cdr alter column cust_id set statistics 1000; alter table cdr alter column bill_id set statistics 1000; analyze verbose; The I ran the query again but I still got the same result. Then I tried disabling sequential scans in postgresql.conf, restarted the postmaster and did the query again: buns=# explain analyse select count(*) from cdr where cust_id = 99201110 and bill_id is null; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=913498.60..913498.60 rows=1 width=0) (actual time=48387.91..48387.91 rows=1 loops=1) -> Index Scan using cdr_ix3on cdr (cost=0.00..913453.49 rows=18041 width=0) (actual time=48387.89..48387.89 rows=0 loops=1) Index Cond:(cust_id = 99201110) Filter: (bill_id IS NULL)Total runtime: 48388.47 msec (5 rows) The computed cost of using the index was a factor of 10 higher which I presume is why the query planner wasn't using theindex, but it ran in half the time. So I guess I need to know how to alter the statistics collection so that the indexwill get used. I gather that index columns that occur in more than "a few" percent of the table cause the query plannerto not use the index. Does that mean I won't be able to get the query planner to ever use the cust_id index for thatcustomer or can I tune some parameters to alter that? Any suggestions appreciated. Thanks, David David Witham Telephony Platforms Architect Unidial, Australia