Query Optimization - Hash Join estimate off? - Mailing list pgsql-performance

From mark.lubratt@indeq.com
Subject Query Optimization - Hash Join estimate off?
Date
Msg-id 20050305183835.17457.qmail@webmail02.mesa1.secureserver.net
Whole thread Raw
List pgsql-performance
<div>Hello!</div><div> </div><div>First off, I'm a real newbie at trying to read the output of explain
analyze.</div><div> </div><div>Ihave several similar queries in my application that I've got incorporated into views. 
Whenthey run sub 300ms, the users don't seem to mind.  However, one of them (query is below along with some relevant
tableinformation) is running about 800ms and my users are starting to grumble.</div><div> </div><div>I ran explain
analyzeon it (explain analyze results are below).  I noticed that the biggest chunk of time is being taken by a Hash
Joinnear the top of the output (I'm still not sure what the indentation means and what the order means).  If I look at
theestimate, it is comparable to several other hash join estimates in the query; however, the actual cost in time is
significantlyhigher than those other hash joins.  Is this significant?</div><div> </div><div>I tried optimizing
accordingto "SQL Tuning" by Tow, but this actually seemed to slow things down.  It also seemed that the query optimizer
inPostgreSQL reordered things on its own according to its own plan anyway.  Is this correct?</div><div> </div><div>I'd
appreciateany help I can get to try to get this query below
300ms.</div><div> </div><div>Thanks!</div><div>Mark</div><div> </div><div>Theplatform is a dual 2.2GHz Xeon 1.2GB RAM
withmirrored drives (raid 1) running Win2000 Pro.  I run "vacuum analyze" every night.  The postgresql.conf is
basicallystandard except that I've opened it up to listen to the external network.  Other
changes:</div><div> </div><div>max_connections= 100</div><div>shared_buffers = 10000</div><div> </div><div>query (the
person_id= 1 in the where clause is changed on a case by case basis - depending upon who's running the
query):</div><div> </div><div>explainanalyze<br /> SELECT DISTINCT c.job_id, g.person_id, c.job_no, b.deadline, c.name,
bid_date(c.job_id)AS bid_date, c.miscq, c.city, c.st, j.name AS eng, c.s_team AS salesteam, <br />        CASE<br
/>           WHEN c.file_loc = 0 THEN 'No Bid'::character varying<br />            WHEN c.file_loc = -1 THEN 'Bid
Board'::charactervarying<br />            WHEN c.file_loc = -2 THEN 'Lost Job'::character varying<br />            WHEN
c.file_loc= -3 THEN 'See Job Notes'::character varying<br />            WHEN c.file_loc < -3 OR c.file_loc IS NULL
THEN''::character varying<br />            WHEN h.initials IS NOT NULL THEN h.initials<br />            ELSE 'Unknown
person'::charactervarying<br />        END AS file_loc, COALESCE(c.city::text || COALESCE(', '::text || c.st::text,
''::text),COALESCE(c.st, ''::character varying)::text) AS "location", c.file_loc AS file_loc_id<br />   FROM status
a<br/>   LEFT JOIN status_list b ON a.status_id = b.status_id AND b.active<br />   LEFT JOIN job c ON c.job_id =
b.job_id<br/>   LEFT JOIN builder_list d ON c.job_id = d.job_id AND (d.won_heat OR d.won_vent OR d.won_tc OR c.heat AND
d.bid_heatAND d.won_heat IS NULL OR c.vent AND d.bid_vent AND d.won_vent IS NULL OR c.tc AND d.bid_tc AND d.won_tc IS
NULL)AND d.role = 'C'::bpchar<br />   LEFT JOIN company e ON d.company_id = e.company_id<br />   LEFT JOIN call_list f
ONe.company_id = f.company_id<br />   LEFT JOIN person g ON f.person_id = g.person_id OR "position"(c.s_team::text,
g.initials::text)> 0<br />   LEFT JOIN person h ON c.file_loc = h.person_id<br />   LEFT JOIN builder_list i ON
c.job_id= i.job_id AND i.role = 'E'::bpchar<br />   LEFT JOIN company j ON i.company_id = j.company_id<br />  WHERE
a.name::text= 'Awaiting Award'::character varying::text and g.person_id = 1<br />  ORDER BY c.job_id, g.person_id,
c.job_no,b.deadline, c.name, bid_date(c.job_id), c.miscq, c.city, COALESCE(c.city::text || COALESCE(', '::text ||
c.st::text,''::text), COALESCE(c.st, ''::character varying)::text), c.st, <br />CASE<br />    WHEN c.file_loc = 0 THEN
'NoBid'::character varying<br />    WHEN c.file_loc = -1 THEN 'Bid Board'::character varying<br />    WHEN c.file_loc =
-2THEN 'Lost Job'::character varying<br />    WHEN c.file_loc = -3 THEN 'See Job Notes'::character varying<br />   
WHENc.file_loc < -3 OR c.file_loc IS NULL THEN ''::character varying<br />    WHEN h.initials IS NOT NULL THEN
h.initials<br/>    ELSE 'Unknown person'::character varying<br />END, j.name, c.s_team, c.file_loc;<br
/></div><div>Tables:</div><div>status- 14 rows</div><div>status_list - 6566 rows</div><div>job - 2210
rows</div><div>builder_list -9670 rows</div><div>company - 1249 rows</div><div>call_list - 4731 rows</div><div>person -
27rows</div><div> </div><div>Primary keys:</div><div>any field with a "_id" suffix is a primary key; and thus is
implicitlyindexed.</div><div> </div><div>Other indexes:</div><div>status_list(job_id)
btree</div><div>status_list(status_id)btree</div><div>job(file_loc) btree</div><div>builder_list(company_id)
btree</div><div>call_list(company_id)btree</div><div>call_list(person_id) btree</div><div>call_list(company_id)
btree</div><div>person(company_id)btree</div><div> </div><div>explain analyze:</div><div>Unique  (cost=1798.47..1809.38
rows=291width=114) (actual time=766.000..781.000 rows=566 loops=1)<br />  ->  Sort  (cost=1798.47..1799.19 rows=291
width=114)(actual time=766.000..766.000 rows=1473 loops=1)<br />        Sort Key: c.job_id, g.person_id, c.job_no,
b.deadline,c.name, bid_date(c.job_id), c.miscq, c.city, COALESCE(((c.city)::text || COALESCE((', '::text ||
(c.st)::text),''::text)), (COALESCE(c.st, ''::character varying))::text), c.st, CASE WHEN (c.fi (..)<br />       
-> Hash Left Join  (cost=1750.81..1786.56 rows=291 width=114) (actual time=453.000..750.000 rows=1473 loops=1)<br
/>             Hash Cond: ("outer".company_id = "inner".company_id)<br />              ->  Merge Left Join 
(cost=1707.20..1722.53rows=291 width=95) (actual time=437.000..484.000 rows=1473 loops=1)<br />                   
MergeCond: ("outer".job_id = "inner".job_id)<br />                    ->  Sort  (cost=1382.44..1383.17 rows=291
width=91)(actual time=406.000..406.000 rows=1473 loops=1)<br />                          Sort Key: c.job_id<br
/>                         ->  Hash Left Join  (cost=1137.28..1370.53 rows=291 width=91) (actual
time=234.000..390.000rows=1473 loops=1)<br />                                Hash Cond: ("outer".file_loc =
"inner".person_id)<br/>                                ->  Nested Loop  (cost=1135.94..1365.27 rows=291 width=84)
(actualtime=234.000..390.000 rows=1473 loops=1)<br />                                      Join Filter:
(("inner".person_id= "outer".person_id) OR ("position"(("inner".s_team)::text, ("outer".initials)::text) > 0))<br
/>                                     ->  Seq Scan on person g  (cost=0.00..1.34 rows=1 width=11) (actual
time=0.000..0.000rows=1 loops=1)<br />                                            Filter: (person_id = 1)<br
/>                                     ->  Merge Right Join  (cost=1135.94..1349.74 rows=811 width=84) (actual
time=234.000..297.000rows=7490 loops=1)<br />                                            Merge Cond:
("outer".company_id= "inner".company_id)<br />                                            ->  Index Scan using
idx_company_id_call_liston call_list f  (cost=0.00..189.80 rows=4731 width=8) (actual time=0.000..15.000 rows=4731
loops=1)<br/>                                            ->  Sort  (cost=1135.94..1136.48 rows=214 width=84) (actual
time=234.000..234.000rows=7490 loops=1)<br />                                                  Sort Key:
e.company_id<br/>                                                  ->  Merge Right Join  (cost=1004.19..1127.66
rows=214width=84) (actual time=203.000..219.000 rows=1569 loops=1)<br
/>                                                       Merge Cond: ("outer".company_id = "inner".company_id)<br
/>                                                       ->  Index Scan using company_pkey on company e 
(cost=0.00..117.13rows=1249 width=4) (actual time=0.000..0.000 rows=1249 loops=1)<br
/>                                                       ->  Sort  (cost=1004.19..1004.73 rows=214 width=84) (actual
time=203.000..203.000rows=1569 loops=1)<br />                                                              Sort Key:
d.company_id<br/>                                                              ->  Hash Left Join 
(cost=633.74..995.91rows=214 width=84) (actual time=156.000..187.000 rows=1569 loops=1)<br
/>                                                                   Hash Cond: ("outer".job_id = "inner".job_id)<br
/>                                                                   Join Filter: ("inner".won_heat OR "inner".won_vent
OR"inner".won_tc OR ("outer".heat AND "inner".bid_heat AND ("inner".won_heat IS NULL)) OR ("outer".vent AND
"inner".bid_ventAND ("inner (..)<br />                                                                    ->  Merge
LeftJoin  (cost=368.17..381.60 rows=159 width=83) (actual time=78.000..93.000 rows=695 loops=1)<br
/>                                                                         Merge Cond: ("outer".job_id =
"inner".job_id)<br/>                                                                          ->  Sort 
(cost=168.31..168.71rows=159 width=8) (actual time=31.000..31.000 rows=695 loops=1)<br
/>                                                                               Sort Key: b.job_id<br
/>                                                                               ->  Nested Loop Left Join 
(cost=0.00..162.50rows=159 width=8) (actual time=0.000..31.000 rows=695 loops=1)<br
/>                                                                                     Join Filter: ("outer".status_id
="inner".status_id)<br />                                                                                      -> 
SeqScan on status a  (cost=0.00..1.18 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)<br
/>                                                                                           Filter: ((name)::text =
'AwaitingAward'::text)<br />                                                                                     
-> Seq Scan on status_list b  (cost=0.00..133.66 rows=2213 width=12) (actual time=0.000..15.000 rows=2210
loops=1)<br/>                                                                                            Filter:
active<br/>                                                                          ->  Sort  (cost=199.86..205.39
rows=2210width=79) (actual time=47.000..47.000 rows=2194 loops=1)<br
/>                                                                               Sort Key: c.job_id<br
/>                                                                               ->  Seq Scan on job c 
(cost=0.00..77.10rows=2210 width=79) (actual time=0.000..31.000 rows=2210 loops=1)<br
/>                                                                   ->  Hash  (cost=202.88..202.88 rows=7475
width=14)(actual time=78.000..78.000 rows=0 loops=1)<br
/>                                                                         ->  Seq Scan on builder_list d 
(cost=0.00..202.88rows=7475 width=14) (actual time=0.000..15.000 rows=7517 loops=1)<br
/>                                                                               Filter: (role = 'C'::bpchar)<br
/>                               ->  Hash  (cost=1.27..1.27 rows=27 width=11) (actual time=0.000..0.000 rows=0
loops=1)<br/>                                      ->  Seq Scan on person h  (cost=0.00..1.27 rows=27 width=11)
(actualtime=0.000..0.000 rows=27 loops=1)<br />                    ->  Sort  (cost=324.76..330.25 rows=2196 width=8)
(actualtime=31.000..31.000 rows=3044 loops=1)<br />                          Sort Key: i.job_id<br
/>                         ->  Seq Scan on builder_list i  (cost=0.00..202.88 rows=2196 width=8) (actual
time=0.000..31.000rows=2153 loops=1)<br />                                Filter: (role = 'E'::bpchar)<br
/>             ->  Hash  (cost=40.49..40.49 rows=1249 width=27) (actual time=16.000..16.000 rows=0
loops=1)</div><div>                   ->  Seq Scan on company j  (cost=0.00..40.49 rows=1249 width=27) (actual
time=0.000..0.000rows=1249 loops=1)<br />Total runtime: 781.000 ms<br /></div> 

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: MAIN vs. PLAIN
Next
From: "xsk"
Date:
Subject: How to Partition?