A question on the query planner - Mailing list pgsql-performance

From Jared Carr
Subject A question on the query planner
Date
Msg-id 3FCBB639.10409@89glass.com
Whole thread Raw
Responses Re: A question on the query planner
Re: A question on the query planner
List pgsql-performance
I am currently working on optimizing some fairly time consuming queries
on a decently large
dataset.

The Following is the query in question.

SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
    FROM quotes AS q, zips AS z, cars AS c
        WHERE
            z.zip = q.zip AND
            c.car_id = q.car_id AND
            z.state != 'AA' AND
            z.state != 'AE' AND
            z.state != 'AP' AND
            z.state = 'WA'
     ORDER BY date_time;

The tables are as follows.

                              Table "public.cars"
    Column     |         Type          |               Modifiers
---------------+-----------------------+----------------------------------------
 car_id        | character varying(10) | not null default ''::character
varying
 nags_glass_id | character varying(7)  | not null default ''::character
varying
 make          | character varying(30) | not null default ''::character
varying
 model         | character varying(30) | not null default ''::character
varying
 year          | character varying(4)  | not null default ''::character
varying
 style         | character varying(30) | not null default ''::character
varying
 price         | double precision      | not null default (0)::double
precision
Indexes:
    "cars_pkey" primary key, btree (car_id)
    "cars_car_id_btree_index" btree (car_id)
    "make_cars_index" btree (make)
    "model_cars_index" btree (model)
    "year_cars_index" btree ("year")

                                                 Table "public.quotes"
      Column       |            Type
|                              Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 quote_id          | bigint                      | not null default
nextval('quotes_quote_id_seq'::text)
 visitor_id        | bigint                      | not null default
(0)::bigint
 date_time         | timestamp without time zone | not null default
'0001-01-01 00:00:00'::timestamp without time zone
 car_id            | character varying(10)       | not null default
''::character varying
 email             | text                        | not null default ''::text
 zip               | character varying(5)        | not null default
''::character varying
 current_referrer  | text                        | not null default ''::text
 original_referrer | text                        | not null default ''::text
Indexes:
    "quotes_pkey" primary key, btree (quote_id)
    "car_id_quotes_index" btree (car_id)
    "visitor_id_quotes_index" btree (visitor_id)
    "zip_quotes_index" btree (zip)

                                Table "public.zips"
 Column |         Type          |                     Modifiers
--------+-----------------------+---------------------------------------------------
 zip_id | bigint                | not null default
nextval('zips_zip_id_seq'::text)
 zip    | character varying(5)  | not null default ''::character varying
 city   | character varying(28) | not null default ''::character varying
 state  | character varying(2)  | not null default ''::character varying
 lat    | character varying(10) | not null default ''::character varying
 lon    | character varying(10) | not null default ''::character varying
Indexes:
    "zips_pkey" primary key, btree (zip_id)
    "zip_zips_index" btree (zip)
    "zips_state_btree_index" btree (state)

The above query with the default setting of 10 for
default_statistics_target runs as follows

(From Explain Analyze)


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=58064.16..58074.20 rows=4015 width=80) (actual
time=2415.060..2421.421 rows=4539 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=57728.02..57823.84 rows=4015 width=80) (actual
time=2254.056..2345.013 rows=4539 loops=1)
         Merge Cond: ("outer"."?column7?" = "inner"."?column5?")
         ->  Sort  (cost=56880.61..56890.65 rows=4015 width=62) (actual
time=2054.353..2062.189 rows=4693 loops=1)
               Sort Key: (q.car_id)::text
               ->  Hash Join  (cost=1403.91..56640.29 rows=4015
width=62) (actual time=8.479..1757.126 rows=10151 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10657.42
rows=336142 width=27) (actual time=0.062..657.015 rows=336166 loops=1)
                     ->  Hash  (cost=1402.63..1402.63 rows=511 width=52)
(actual time=8.273..8.273 rows=0 loops=1)
                           ->  Index Scan using zips_state_btree_index
on zips z  (cost=0.00..1402.63 rows=511 width=52) (actual
time=0.215..6.877 rows=718 loops=1)
                                 Index Cond: ((state)::text = 'WA'::text)
                                 Filter: (((state)::text <> 'AA'::text)
AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
         ->  Sort  (cost=847.41..870.91 rows=9401 width=37) (actual
time=199.172..216.354 rows=11922 loops=1)
               Sort Key: (c.car_id)::text
               ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401
width=37) (actual time=0.104..43.523 rows=9401 loops=1)
 Total runtime: 2427.937 ms

If I set enable_seqscan=off I get the following

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=122108.52..122118.62 rows=4039 width=80) (actual
time=701.002..707.442 rows=4541 loops=1)
   Sort Key: q.date_time
   ->  Nested Loop  (cost=0.00..121866.59 rows=4039 width=80) (actual
time=0.648..624.134 rows=4541 loops=1)
         ->  Nested Loop  (cost=0.00..102256.36 rows=4039 width=62)
(actual time=0.374..381.440 rows=10153 loops=1)
               ->  Index Scan using zips_state_btree_index on zips z
(cost=0.00..1413.31 rows=514 width=52) (actual time=0.042..9.043
rows=718 loops=1)
                     Index Cond: ((state)::text = 'WA'::text)
                     Filter: (((state)::text <> 'AA'::text) AND
((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
               ->  Index Scan using zip_quotes_index on quotes q
(cost=0.00..195.59 rows=48 width=27) (actual time=0.039..0.426 rows=14
loops=718)
                     Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c
(cost=0.00..4.84 rows=1 width=37) (actual time=0.015..0.017 rows=0
loops=10153)
               Index Cond: ((c.car_id)::text = ("outer".car_id)::text)
 Total runtime: 711.375 ms

I can also get a similar plan if I disable both Hash Joins and Merge Joins.

Furthermore I can get some additional speedup without turning off
sequence scans if I
set the value of  default_statistics_target = 1000 then the runtime will
be around 1200
otoh if I set default_statistics_target = 100 then the runtime will be
around 12000.

So, my question is is there any way to get the query planner to
recognize the potential
performance increase available by using the indexes that are set up
without specifically
turning off sequential scans before I run this query every time?

Thanks for the help.

Jared









pgsql-performance by date:

Previous
From: Roger Ging
Date:
Subject: Re: Followup - expression (functional) index use in joins
Next
From: LIANHE SHAO
Date:
Subject: Is clustering possible to enhance the performance?