A question on the query planner - Mailing list pgsql-performance
From: Jared Carr
Subject: A question on the query planner
Date:
Msg-id: 3FCBB639.10409@89glass.com
List: pgsql-performance
I am currently working on optimizing some fairly time-consuming queries on a decently large dataset. The following is the query in question:

SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
FROM quotes AS q, zips AS z, cars AS c
WHERE z.zip = q.zip
  AND c.car_id = q.car_id
  AND z.state != 'AA'
  AND z.state != 'AE'
  AND z.state != 'AP'
  AND z.state = 'WA'
ORDER BY date_time;

The tables are as follows:

                                Table "public.cars"
    Column     |         Type          |               Modifiers
---------------+-----------------------+----------------------------------------
 car_id        | character varying(10) | not null default ''::character varying
 nags_glass_id | character varying(7)  | not null default ''::character varying
 make          | character varying(30) | not null default ''::character varying
 model         | character varying(30) | not null default ''::character varying
 year          | character varying(4)  | not null default ''::character varying
 style         | character varying(30) | not null default ''::character varying
 price         | double precision      | not null default (0)::double precision
Indexes:
    "cars_pkey" primary key, btree (car_id)
    "cars_car_id_btree_index" btree (car_id)
    "make_cars_index" btree (make)
    "model_cars_index" btree (model)
    "year_cars_index" btree ("year")

                                         Table "public.quotes"
      Column       |            Type             |                              Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 quote_id          | bigint                      | not null default nextval('quotes_quote_id_seq'::text)
 visitor_id        | bigint                      | not null default (0)::bigint
 date_time         | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
 car_id            | character varying(10)       | not null default ''::character varying
 email             | text                        | not null default ''::text
 zip               | character varying(5)        | not null default ''::character varying
 current_referrer  | text                        | not null default ''::text
 original_referrer | text                        | not null default ''::text
Indexes:
    "quotes_pkey" primary key, btree (quote_id)
    "car_id_quotes_index" btree (car_id)
    "visitor_id_quotes_index" btree (visitor_id)
    "zip_quotes_index" btree (zip)

                            Table "public.zips"
 Column |         Type          |                     Modifiers
--------+-----------------------+---------------------------------------------------
 zip_id | bigint                | not null default nextval('zips_zip_id_seq'::text)
 zip    | character varying(5)  | not null default ''::character varying
 city   | character varying(28) | not null default ''::character varying
 state  | character varying(2)  | not null default ''::character varying
 lat    | character varying(10) | not null default ''::character varying
 lon    | character varying(10) | not null default ''::character varying
Indexes:
    "zips_pkey" primary key, btree (zip_id)
    "zip_zips_index" btree (zip)
    "zips_state_btree_index" btree (state)

With the default setting of 10 for default_statistics_target, the query runs as follows (from EXPLAIN ANALYZE):

                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=58064.16..58074.20 rows=4015 width=80) (actual time=2415.060..2421.421 rows=4539 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=57728.02..57823.84 rows=4015 width=80) (actual time=2254.056..2345.013 rows=4539 loops=1)
         Merge Cond: ("outer"."?column7?" = "inner"."?column5?")
         ->  Sort  (cost=56880.61..56890.65 rows=4015 width=62) (actual time=2054.353..2062.189 rows=4693 loops=1)
               Sort Key: (q.car_id)::text
               ->  Hash Join  (cost=1403.91..56640.29 rows=4015 width=62) (actual time=8.479..1757.126 rows=10151 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10657.42 rows=336142 width=27) (actual time=0.062..657.015 rows=336166 loops=1)
                     ->  Hash  (cost=1402.63..1402.63 rows=511 width=52) (actual time=8.273..8.273 rows=0 loops=1)
                           ->  Index Scan using zips_state_btree_index on zips z  (cost=0.00..1402.63 rows=511 width=52) (actual time=0.215..6.877 rows=718 loops=1)
                                 Index Cond: ((state)::text = 'WA'::text)
                                 Filter: (((state)::text <> 'AA'::text) AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
         ->  Sort  (cost=847.41..870.91 rows=9401 width=37) (actual time=199.172..216.354 rows=11922 loops=1)
               Sort Key: (c.car_id)::text
               ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401 width=37) (actual time=0.104..43.523 rows=9401 loops=1)
 Total runtime: 2427.937 ms

If I set enable_seqscan = off, I get the following:

                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=122108.52..122118.62 rows=4039 width=80) (actual time=701.002..707.442 rows=4541 loops=1)
   Sort Key: q.date_time
   ->  Nested Loop  (cost=0.00..121866.59 rows=4039 width=80) (actual time=0.648..624.134 rows=4541 loops=1)
         ->  Nested Loop  (cost=0.00..102256.36 rows=4039 width=62) (actual time=0.374..381.440 rows=10153 loops=1)
               ->  Index Scan using zips_state_btree_index on zips z  (cost=0.00..1413.31 rows=514 width=52) (actual time=0.042..9.043 rows=718 loops=1)
                     Index Cond: ((state)::text = 'WA'::text)
                     Filter: (((state)::text <> 'AA'::text) AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
               ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.59 rows=48 width=27) (actual time=0.039..0.426 rows=14 loops=718)
                     Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..4.84 rows=1 width=37) (actual time=0.015..0.017 rows=0 loops=10153)
               Index Cond: ((c.car_id)::text = ("outer".car_id)::text)
 Total runtime: 711.375 ms

I can also get a similar plan if I disable both hash joins and merge joins. Furthermore, I can get some additional speedup without turning off sequential scans: if I set default_statistics_target = 1000, the runtime will be around 1200 ms; on the other hand, if I set default_statistics_target = 100, the runtime will be around 12000 ms.

So, my question is: is there any way to get the query planner to recognize the potential performance increase available from the indexes that are already set up, without my having to turn off sequential scans before running this query every time?

Thanks for the help.

Jared
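For anyone experimenting along the same lines, the settings discussed above can also be applied per session or per column rather than server-wide. The following is a sketch using standard PostgreSQL commands; the target values are illustrative, not tuned recommendations:

```sql
-- Session-level: raise the statistics target, then re-gather statistics
-- so the planner sees the finer histograms.
SET default_statistics_target = 1000;
ANALYZE quotes;
ANALYZE zips;

-- Per-column alternative: raise the target only on the join/filter
-- columns, so ANALYZE stays cheap for the rest of the table.
ALTER TABLE quotes ALTER COLUMN zip SET STATISTICS 1000;
ALTER TABLE zips ALTER COLUMN state SET STATISTICS 1000;
ANALYZE quotes;
ANALYZE zips;

-- Lowering random_page_cost (default 4) makes index scans look
-- relatively cheaper to the planner; it is a gentler lever than
-- disabling sequential scans outright with enable_seqscan = off.
SET random_page_cost = 2;
```

A per-column SET STATISTICS persists in the catalog and is picked up by every subsequent ANALYZE, whereas the SET commands shown here affect only the current session unless the same values are placed in postgresql.conf.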