Re: Forcing query to use an index - Mailing list pgsql-sql
From: Michael Nachbaur
Subject: Re: Forcing query to use an index
Msg-id: 06B62341-4DCA-11D7-90E0-000A27935D5A@nachbaur.com
In response to: Re: Forcing query to use an index (Greg Stark <gsstark@mit.edu>)
Responses: Re: Forcing query to use an index
List: pgsql-sql
On Monday, Mar 3, 2003, at 14:21 US/Pacific, Greg Stark wrote:

> You should send the query as well, and \d customer_month_summary so we
> can see how you defined your indexes.

         Table "customer_month_summary"
    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 customerid    | integer                  |
 timestart     | timestamp with time zone |
 timeend       | timestamp with time zone |
 accountnum    | character varying(255)   |
 firstname     | character varying(255)   |
 lastname      | character varying(255)   |
 organization  | character varying(255)   |
 package       | character varying(255)   |
 up            | bigint                   |
 down          | bigint                   |
 maxup         | bigint                   |
 maxdown       | bigint                   |
 violatedsize  | bigint                   |
 maxtotal      | bigint                   |
 total         | bigint                   |
 violatedup    | bigint                   |
 violateddown  | bigint                   |
 violatedtotal | bigint                   |
 monthstart    | date                     |
Indexes: customer_month_summary_cid_idx,
         customer_month_summary_cm_idx,
         customer_month_summary_time_idx

> There doesn't seem to be a filter on the scan so it looks like postgres
> thinks you're actually reading in the entire table, which is normally
> faster with a sequential scan than an index scan. In fact I'm surprised
> it's doing an index scan on the other table and not a sequential scan.
>
> Some things to try:
>
> set enable_seqscan = off
>
> Then try your query again, see if postgres is right and it really is
> faster to do the sequential scan.

Sort  (cost=100014872.07..100014872.07 rows=268 width=265) (actual time=382.51..382.51 rows=6 loops=1)
  ->  Aggregate  (cost=100014693.83..100014861.27 rows=268 width=265) (actual time=381.93..382.33 rows=6 loops=1)
        ->  Group  (cost=100014693.83..100014854.57 rows=2679 width=265) (actual time=381.79..381.90 rows=6 loops=1)
              ->  Sort  (cost=100014693.83..100014693.83 rows=2679 width=265) (actual time=381.75..381.75 rows=6 loops=1)
                    ->  Merge Join  (cost=100000005.18..100014541.30 rows=2679 width=265) (actual time=38.21..381.13 rows=6 loops=1)
                          ->  Nested Loop  (cost=0.00..14525.77 rows=2679 width=247) (actual time=0.14..149.21 rows=2679 loops=1)
                                ->  Index Scan using customer_id_key on customer c  (cost=0.00..129.73 rows=2679 width=156) (actual time=0.06..36.92 rows=2679 loops=1)
                                ->  Index Scan using customer_month_summary_cm_idx on customer_month_summary cms  (cost=0.00..5.36 rows=1 width=91) (actual time=0.02..0.03 rows=1 loops=2679)
                          ->  Sort  (cost=100000005.18..100000005.18 rows=77 width=18) (actual time=0.44..0.56 rows=77 loops=1)
                                ->  Seq Scan on emailaddress ea  (cost=100000000.00..100000002.77 rows=77 width=18) (actual time=0.01..0.24 rows=77 loops=1)
Total runtime: 383.25 msec

So from the looks of things the index is way faster (querying 1 row
rather than over 8000).
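In case anyone wants to replay this, the test amounts to something like
the following psql session; the SELECT is only a stand-in, since my
actual query isn't shown in this message:

-- enable_seqscan only affects the current session, so experimenting
-- from psql doesn't disturb other connections.
SET enable_seqscan = off;

-- Stand-in query against the tables above, for illustration only.
EXPLAIN ANALYZE
SELECT c.customerid, sum(cms.total) AS total
  FROM customer c
  JOIN customer_month_summary cms ON cms.customerid = c.customerid
 GROUP BY c.customerid;

SET enable_seqscan = on;  -- back to the default when done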
> set random_page_cost = 2
>
> Or even lower values.

Sort  (cost=6655.12..6655.12 rows=268 width=265) (actual time=902.75..902.76 rows=6 loops=1)
  ->  Aggregate  (cost=6476.88..6644.32 rows=268 width=265) (actual time=902.18..902.58 rows=6 loops=1)
        ->  Group  (cost=6476.88..6637.62 rows=2679 width=265) (actual time=902.01..902.13 rows=6 loops=1)
              ->  Sort  (cost=6476.88..6476.88 rows=2679 width=265) (actual time=901.97..901.98 rows=6 loops=1)
                    ->  Merge Join  (cost=6106.42..6324.34 rows=2679 width=265) (actual time=585.73..901.35 rows=6 loops=1)
                          ->  Merge Join  (cost=6101.24..6308.82 rows=2679 width=247) (actual time=549.53..667.35 rows=2679 loops=1)
                                ->  Index Scan using customer_id_key on customer c  (cost=0.00..118.77 rows=2679 width=156) (actual time=0.25..42.08 rows=2679 loops=1)
                                ->  Sort  (cost=6101.24..6101.24 rows=8117 width=91) (actual time=549.21..555.19 rows=8117 loops=1)
                                      ->  Seq Scan on customer_month_summary cms  (cost=0.00..5574.17 rows=8117 width=91) (actual time=252.90..472.29 rows=8117 loops=1)
                          ->  Sort  (cost=5.18..5.18 rows=77 width=18) (actual time=0.70..0.81 rows=77 loops=1)
                                ->  Seq Scan on emailaddress ea  (cost=0.00..2.77 rows=77 width=18) (actual time=0.08..0.35 rows=77 loops=1)
Total runtime: 905.47 msec

This is with enable_seqscan turned off, so it doesn't look like it's
doing much good.

> I've also had some success with raising cpu_tuple_cost, though I'm
> unclear on whether that's actually a good approach or not.
>
> Also, don't forget to do a vacuum full on these tables before doing
> testing for optimizations at this level. You can get some confusing
> results if your tables have lots of empty holes in them.

IIRC this locks the tables, so I'll have to run it tonight so the users
of this system don't race into my office to tar and feather me.

Does using "set enable_seqscan = off" impact anything else on the
system, or is it connection-specific? Is this the recommended way of
doing this, or is there another way of coercing Postgres into using
indexes? I'm making these queries from Perl code running in an XSP
page, so I'm not sure how flexible this option would be from a
developer point of view.

Thanks for your help.
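P.S. One approach I may try from the Perl side: scope the override to a
single transaction with SET LOCAL, assuming our server is new enough to
support it (7.3 or later). The SELECT is again a stand-in for the real
query:

BEGIN;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK, so other
-- statements on the same connection keep the default planner settings.
SET LOCAL enable_seqscan = off;
SELECT c.customerid, sum(cms.total) AS total
  FROM customer c
  JOIN customer_month_summary cms ON cms.customerid = c.customerid
 GROUP BY c.customerid;
COMMIT;

If I understand correctly, a plain SET lasts only for the current
session (one connection) rather than the whole server, but confirmation
would be welcome.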