Re: Forcing query to use an index - Mailing list pgsql-sql

From Michael Nachbaur
Subject Re: Forcing query to use an index
Date
Msg-id 06B62341-4DCA-11D7-90E0-000A27935D5A@nachbaur.com
In response to Re: Forcing query to use an index  (Greg Stark <gsstark@mit.edu>)
Responses Re: Forcing query to use an index
List pgsql-sql
On Monday, Mar 3, 2003, at 14:21 US/Pacific, Greg Stark wrote:
> You should send the query as well, and \d customer_month_summary so we
> can see how you defined your indexes.
            Table "customer_month_summary"    Column     |           Type           | Modifiers
---------------+--------------------------+----------- customerid    | integer                  | timestart     |
timestampwith time zone | timeend       | timestamp with time zone | accountnum    | character varying(255)   |
firstname    | character varying(255)   | lastname      | character varying(255)   | organization  | character
varying(255)  | package       | character varying(255)   | up            | bigint                   | down          |
bigint                  | maxup         | bigint                   | maxdown       | bigint                   |
violatedsize | bigint                   | maxtotal      | bigint                   | total         | bigint
     | violatedup    | bigint                   | violateddown  | bigint                   | violatedtotal | bigint
             | monthstart    | date                     |
 
Indexes: customer_month_summary_cid_idx,         customer_month_summary_cm_idx,
customer_month_summary_time_idx
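As an aside, \d only lists the index names here; to see which columns each
index actually covers, the pg_indexes system view can be queried (a quick
sketch, using the table name from above):

    SELECT indexname, indexdef
      FROM pg_indexes
     WHERE tablename = 'customer_month_summary';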

> There doesn't seem to be a filter on the scan so it looks like postgres
> thinks you're actually reading in the entire table, which is normally
> faster with a sequential scan than an index scan. In fact I'm surprised
> it's doing an index scan on the other table and not a sequential scan.
>
> Some things to try:
>
> set enable_seqscan = off
>
> Then try your query again, see if postgres is right and it really is
> faster to do the sequential scan.

Sort  (cost=100014872.07..100014872.07 rows=268 width=265) (actual time=382.51..382.51 rows=6 loops=1)
  ->  Aggregate  (cost=100014693.83..100014861.27 rows=268 width=265) (actual time=381.93..382.33 rows=6 loops=1)
        ->  Group  (cost=100014693.83..100014854.57 rows=2679 width=265) (actual time=381.79..381.90 rows=6 loops=1)
              ->  Sort  (cost=100014693.83..100014693.83 rows=2679 width=265) (actual time=381.75..381.75 rows=6 loops=1)
                    ->  Merge Join  (cost=100000005.18..100014541.30 rows=2679 width=265) (actual time=38.21..381.13 rows=6 loops=1)
                          ->  Nested Loop  (cost=0.00..14525.77 rows=2679 width=247) (actual time=0.14..149.21 rows=2679 loops=1)
                                ->  Index Scan using customer_id_key on customer c  (cost=0.00..129.73 rows=2679 width=156) (actual time=0.06..36.92 rows=2679 loops=1)
                                ->  Index Scan using customer_month_summary_cm_idx on customer_month_summary cms  (cost=0.00..5.36 rows=1 width=91) (actual time=0.02..0.03 rows=1 loops=2679)
                          ->  Sort  (cost=100000005.18..100000005.18 rows=77 width=18) (actual time=0.44..0.56 rows=77 loops=1)
                                ->  Seq Scan on emailaddress ea  (cost=100000000.00..100000002.77 rows=77 width=18) (actual time=0.01..0.24 rows=77 loops=1)
Total runtime: 383.25 msec

So from the looks of things the index is way faster: the inner index scan
fetches 1 row per lookup, rather than the 8000-plus rows the seq scan has
to read in one pass.

> set random_page_cost = 2
>
> Or even lower values.

Sort  (cost=6655.12..6655.12 rows=268 width=265) (actual time=902.75..902.76 rows=6 loops=1)
  ->  Aggregate  (cost=6476.88..6644.32 rows=268 width=265) (actual time=902.18..902.58 rows=6 loops=1)
        ->  Group  (cost=6476.88..6637.62 rows=2679 width=265) (actual time=902.01..902.13 rows=6 loops=1)
              ->  Sort  (cost=6476.88..6476.88 rows=2679 width=265) (actual time=901.97..901.98 rows=6 loops=1)
                    ->  Merge Join  (cost=6106.42..6324.34 rows=2679 width=265) (actual time=585.73..901.35 rows=6 loops=1)
                          ->  Merge Join  (cost=6101.24..6308.82 rows=2679 width=247) (actual time=549.53..667.35 rows=2679 loops=1)
                                ->  Index Scan using customer_id_key on customer c  (cost=0.00..118.77 rows=2679 width=156) (actual time=0.25..42.08 rows=2679 loops=1)
                                ->  Sort  (cost=6101.24..6101.24 rows=8117 width=91) (actual time=549.21..555.19 rows=8117 loops=1)
                                      ->  Seq Scan on customer_month_summary cms  (cost=0.00..5574.17 rows=8117 width=91) (actual time=252.90..472.29 rows=8117 loops=1)
                          ->  Sort  (cost=5.18..5.18 rows=77 width=18) (actual time=0.70..0.81 rows=77 loops=1)
                                ->  Seq Scan on emailaddress ea  (cost=0.00..2.77 rows=77 width=18) (actual time=0.08..0.35 rows=77 loops=1)
Total runtime: 905.47 msec

This second run is with enable_seqscan back on (the costs are back to
normal) and only random_page_cost lowered to 2, and the planner goes
straight back to the sequential scan, so it doesn't look like it's doing
much good.
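For reference, the two plans above came from session settings along these
lines (just a sketch; the actual SELECT is the customer/summary/email
join and is elided here):

    SET enable_seqscan = off;    -- first plan: the 100000000.00 cost figures
                                 -- are the planner's seq-scan penalty
    EXPLAIN ANALYZE SELECT ...;  -- 383.25 msec

    RESET enable_seqscan;
    SET random_page_cost = 2;    -- second plan; the default is 4
    EXPLAIN ANALYZE SELECT ...;  -- 905.47 msec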

> I've also had some success with raising cpu_tuple_cost, though I'm
> unclear on whether that's actually a good approach or not.
>
> Also, don't forget to do a vacuum full on these tables before doing
> testing for optimizations at this level. You can get some confusing
> results if your tables have lots of empty holes in them.

IIRC this locks the tables, so I'll have to run this tonight so the 
users of this system don't race into my office to tar and feather me.
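In the meantime, a plain VACUUM ANALYZE doesn't take the exclusive lock
that VACUUM FULL does (at least on 7.2 and later), so something like this
should be safe during the day, saving the full version for tonight (table
names taken from the query above):

    VACUUM ANALYZE customer;
    VACUUM ANALYZE customer_month_summary;   -- lazy vacuum: no exclusive lock
    VACUUM ANALYZE emailaddress;
    -- tonight, to actually reclaim the empty space:
    VACUUM FULL ANALYZE customer_month_summary;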

Does using "set enable_seqscan = off" impact anything else on the 
system, or is it connection-specific?  Is this the recommended way of 
doing this, or is there another way of coercing Postgres into using 
indexes?  I'm making these queries from Perl code running in an XSP 
page, so I'm not sure how flexible this option would be from a 
developer point of view.
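If it really is just per-connection, one low-tech option would be to
bracket the single statement myself, something like this (a sketch; the
real SELECT is elided, and newer 7.x versions also have SET LOCAL to
confine the change to one transaction):

    SET enable_seqscan = off;
    SELECT ...;               -- the summary query goes here
    SET enable_seqscan = on;  -- restore the default for the rest of the connection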

Thanks for your help.


