Join method influences scan method? - Mailing list pgsql-performance

From mark.lubratt@indeq.com
Subject Join method influences scan method?
Date
Msg-id 20050317061624.12920.qmail@gem-wbe01.mesa1.secureserver.net
Whole thread Raw
Responses Re: Join method influences scan method?
List pgsql-performance
This apparently didn't go through the first time; so, I'm reposting...
-------------------------------------------------------------------------------------
Hello!

First off, I'm running 8.0.1 on Win2000 Server.  Vacuum analyze is done
every night.  Query cost parameters are standard, I've only bumped the
estimated_cache_size up to 50000; shared_buffers=15000.

I've been working on optimizing a query.  In the process, I've been
playing around with Tow's method from "SQL Tuning."  He seems pretty
enamored with nested loops over hash joins or merge joins.  So, I
thought
I'd give that a try.  Here's the explain analyze prior to my little
adventure:

QUERY PLAN
Subquery Scan view_get_all_user_award2  (cost=1091.20..1116.09 rows=524
width=493) (actual time=499.000..499.000 rows=368 loops=1)
  - Unique  (cost=1091.20..1110.85 rows=524 width=114) (actual
time=499.000..499.000 rows=368 loops=1)
        - Sort  (cost=1091.20..1092.51 rows=524 width=114) (actual
time=499.000..499.000 rows=1103 loops=1)
              Sort Key: c.job_id, g.person_id, c.job_no, b.deadline,
c.name, bid_date(c.job_id), c.miscq, c.city, COALESCE(((c.city)::text ||
COALESCE((', '::text || (c.st)::text), ''::text)), (COALESCE(c.st,
''::character varying))::text), c.st, CASE WHEN (c.file_loc = 0) THEN
'No
Bid'::character varying WHEN (c.file_loc = -1) THEN 'Bid
Board'::character
varying WHEN (c.file_loc = -2) THEN 'Lost Job'::character varying WHEN
(c.file_loc = -3) THEN 'See Job Notes'::character varying WHEN
((c.file_loc < -3) OR (c.file_loc IS NULL)) THEN ''::character varying
WHEN (h.initials IS NOT NULL) THEN h.initials ELSE 'Unknown
person'::character varying END, j.name, c.s_team, c.file_loc
              - Hash Join  (cost=848.87..1067.53 rows=524 width=114)
(actual time=171.000..484.000 rows=1103 loops=1)
                    Hash Cond: ("outer".company_id = "inner".company_id)
                    - Nested Loop  (cost=805.21..1005.53 rows=524
width=122) (actual time=156.000..314.000 rows=1103 loops=1)
                          Join Filter: (("inner".person_id =
"outer".person_id) OR ("position"(("inner".s_team)::text,
("outer".initials)::text) 0))
                          - Seq Scan on person g  (cost=0.00..1.34
rows=1 width=11) (actual time=0.000..0.000 rows=1 loops=1)
                                Filter: (person_id = 1)
                          - Hash Join  (cost=805.21..988.81 rows=879
width=122) (actual time=156.000..251.000 rows=5484 loops=1)
                                Hash Cond: ("outer".company_id =
"inner".company_id)
                                - Hash Join  (cost=761.55..931.96
rows=879 width=103) (actual time=156.000..188.000 rows=5484 loops=1)
                                      Hash Cond: ("outer".company_id =
"inner".company_id)
                                      - Seq Scan on call_list f
(cost=0.00..78.46 rows=4746 width=8) (actual time=0.000..16.000
rows=4752
loops=1)
                                      - Hash  (cost=761.11..761.11
rows=176 width=95) (actual time=156.000..156.000 rows=0 loops=1)
                                            - Hash Join
(cost=505.90..761.11 rows=176 width=95) (actual time=94.000..140.000
rows=1079 loops=1)
                                                  Hash Cond:
("outer".job_id = "inner".job_id)
                                                  Join Filter:
("outer".won_heat OR "outer".won_vent OR "outer".won_tc OR ("inner".heat
AND "outer".bid_heat AND ("outer".won_heat IS NULL)) OR ("inner".vent
AND
"outer".bid_vent AND ("outer".won_vent IS NULL)) OR ("inner".tc AND
"outer".bid_tc AND ("outer".won_tc IS NULL)))
                                                  - Seq Scan on
builder_list d  (cost=0.00..212.26 rows=7687 width=14) (actual
time=0.000..16.000 rows=7758 loops=1)
                                                        Filter: (role =
'C'::bpchar)
                                                  - Hash
(cost=505.55..505.55 rows=138 width=102) (actual time=94.000..94.000
rows=0 loops=1)
                                                        - Hash Join
(cost=268.41..505.55 rows=138 width=102) (actual time=47.000..94.000
rows=443 loops=1)
                                                              Hash Cond:
("outer".job_id = "inner".job_id)
                                                              - Seq
Scan on builder_list i  (cost=0.00..212.26 rows=2335 width=8) (actual
time=0.000..47.000 rows=2245 loops=1)

Filter: (role = 'E'::bpchar)
                                                              - Hash
(cost=268.06..268.06 rows=139 width=94) (actual time=47.000..47.000
rows=0 loops=1)
                                                                    -
Hash Join  (cost=156.15..268.06 rows=139 width=94) (actual
time=31.000..47.000 rows=451 loops=1)


Hash Cond: ("outer".file_loc = "inner".person_id)


- Hash Join  (cost=154.81..264.51 rows=166 width=87) (actual
time=31.000..47.000 rows=694 loops=1)


      Hash Cond: ("outer".job_id = "inner".job_id)


      - Seq Scan on job c  (cost=0.00..78.57 rows=2357 width=79) (actual
time=0.000..0.000 rows=2302 loops=1)


      - Hash  (cost=154.40..154.40 rows=166 width=8) (actual
time=31.000..31.000 rows=0 loops=1)


            - Hash Join  (cost=1.18..154.40 rows=166 width=8) (actual
time=0.000..31.000 rows=694 loops=1)


                  Hash Cond: ("outer".status_id = "inner".status_id)


                  - Seq Scan on status_list b  (cost=0.00..139.96
rows=2320 width=12) (actual time=0.000..15.000 rows=2302 loops=1)


                        Filter: active


                  - Hash  (cost=1.18..1.18 rows=1 width=4) (actual
time=0.000..0.000 rows=0 loops=1)


                        - Seq Scan on status a  (cost=0.00..1.18 rows=1
width=4) (actual time=0.000..0.000 rows=1 loops=1)


                              Filter: ((name)::text = 'Awaiting
Award'::text)


- Hash  (cost=1.27..1.27 rows=27 width=11) (actual time=0.000..0.000
rows=0 loops=1)


      - Seq Scan on person h  (cost=0.00..1.27 rows=27 width=11) (actual
time=0.000..0.000 rows=27 loops=1)
                                - Hash  (cost=40.53..40.53 rows=1253
width=27) (actual time=0.000..0.000 rows=0 loops=1)
                                      - Seq Scan on company j
(cost=0.00..40.53 rows=1253 width=27) (actual time=0.000..0.000
rows=1254
loops=1)
                    - Hash  (cost=40.53..40.53 rows=1253 width=4)
(actual time=15.000..15.000 rows=0 loops=1)
                          - Seq Scan on company e  (cost=0.00..40.53
rows=1253 width=4) (actual time=0.000..0.000 rows=1254 loops=1)
Total runtime: 499.000 ms


As you can see, it's almost all hash joins and sequential scans.  I
tried
explain analyze again after setting enable_hashjoin=false and the
planner
started using merge joins.  So, I set enable_mergejoin=false and ran it
again.  Here is the resulting explain analyze:

QUERY PLAN
Subquery Scan view_get_all_user_award2  (cost=9525.65..9550.54 rows=524
width=493) (actual time=531.000..547.000 rows=368 loops=1)
  - Unique  (cost=9525.65..9545.30 rows=524 width=114) (actual
time=531.000..547.000 rows=368 loops=1)
        - Sort  (cost=9525.65..9526.96 rows=524 width=114) (actual
time=531.000..531.000 rows=1103 loops=1)
              Sort Key: c.job_id, g.person_id, c.job_no, b.deadline,
c.name, bid_date(c.job_id), c.miscq, c.city, COALESCE(((c.city)::text ||
COALESCE((', '::text || (c.st)::text), ''::text)), (COALESCE(c.st,
''::character varying))::text), c.st, CASE WHEN (c.file_loc = 0) THEN
'No
Bid'::character varying WHEN (c.file_loc = -1) THEN 'Bid
Board'::character
varying WHEN (c.file_loc = -2) THEN 'Lost Job'::character varying WHEN
(c.file_loc = -3) THEN 'See Job Notes'::character varying WHEN
((c.file_loc < -3) OR (c.file_loc IS NULL)) THEN ''::character varying
WHEN (h.initials IS NOT NULL) THEN h.initials ELSE 'Unknown
person'::character varying END, j.name, c.s_team, c.file_loc
              - Nested Loop  (cost=1.30..9501.98 rows=524 width=114)
(actual time=0.000..500.000 rows=1103 loops=1)
                    Join Filter: (("inner".person_id =
"outer".person_id)
OR ("position"(("inner".s_team)::text, ("outer".initials)::text) 0))
                    - Seq Scan on person g  (cost=0.00..1.34 rows=1
width=11) (actual time=0.000..0.000 rows=1 loops=1)
                          Filter: (person_id = 1)
                    - Nested Loop  (cost=1.30..9464.56 rows=1463
width=114) (actual time=0.000..360.000 rows=5484 loops=1)
                          - Nested Loop  (cost=1.30..5350.20 rows=174
width=118) (actual time=0.000..268.000 rows=1079 loops=1)
                                - Nested Loop  (cost=1.30..4399.46
rows=174 width=114) (actual time=0.000..237.000 rows=1079 loops=1)
                                      Join Filter: ("inner".won_heat OR
"inner".won_vent OR "inner".won_tc OR ("outer".heat AND "inner".bid_heat
AND ("inner".won_heat IS NULL)) OR ("outer".vent AND "inner".bid_vent
AND
("inner".won_vent IS NULL)) OR ("outer".tc AND "inner".bid_tc AND
("inner".won_tc IS NULL)))
                                      - Nested Loop
(cost=1.30..3207.87 rows=138 width=121) (actual time=0.000..221.000
rows=443 loops=1)
                                            - Nested Loop
(cost=1.30..2453.84 rows=138 width=102) (actual time=0.000..221.000
rows=443 loops=1)
                                                  - Nested Loop
(cost=1.30..1258.83 rows=139 width=94) (actual time=0.000..189.000
rows=451 loops=1)
                                                        Join Filter:
("outer".file_loc = "inner".person_id)
                                                        - Nested Loop
(cost=0.00..1156.69 rows=166 width=87) (actual time=0.000..31.000
rows=694 loops=1)
                                                              - Nested
Loop  (cost=0.00..170.14 rows=166 width=8) (actual time=0.000..0.000
rows=694 loops=1)
                                                                    Join
Filter: ("outer".status_id = "inner".status_id)
                                                                    -
Seq Scan on status a  (cost=0.00..1.18 rows=1 width=4) (actual
time=0.000..0.000 rows=1 loops=1)


Filter: ((name)::text = 'Awaiting Award'::text)
                                                                    -
Seq Scan on status_list b  (cost=0.00..139.96 rows=2320 width=12)
(actual
time=0.000..0.000 rows=2302 loops=1)


Filter: active
                                                              - Index
Scan using job_pkey on job c  (cost=0.00..5.93 rows=1 width=79) (actual
time=0.000..0.023 rows=1 loops=694)

Index
Cond: (c.job_id = "outer".job_id)
                                                        - Materialize
(cost=1.30..1.57 rows=27 width=11) (actual time=0.000..0.069 rows=27
loops=694)
                                                              - Seq
Scan on person h  (cost=0.00..1.27 rows=27 width=11) (actual
time=0.000..0.000 rows=27 loops=1)
                                                  - Index Scan using
idx_builder_list_job_id on builder_list i  (cost=0.00..8.57 rows=2
width=8) (actual time=0.000..0.000 rows=1 loops=451)
                                                        Index Cond:
(i.job_id = "outer".job_id)
                                                        Filter: (role =
'E'::bpchar)
                                            - Index Scan using
company_pkey on company j  (cost=0.00..5.45 rows=1 width=27) (actual
time=0.000..0.000 rows=1 loops=443)
                                                  Index Cond:
("outer".company_id = j.company_id)
                                      - Index Scan using
idx_builder_list_job_id on builder_list d  (cost=0.00..8.57 rows=5
width=14) (actual time=0.000..0.036 rows=3 loops=443)
                                            Index Cond: (d.job_id =
"outer".job_id)
                                            Filter: (role = 'C'::bpchar)
                                - Index Scan using company_pkey on
company e  (cost=0.00..5.45 rows=1 width=4) (actual time=0.029..0.029
rows=1 loops=1079)
                                      Index Cond: ("outer".company_id =
e.company_id)
                          - Index Scan using idx_company_id_call_list
on call_list f  (cost=0.00..23.57 rows=6 width=8) (actual
time=0.014..0.057 rows=5 loops=1079)
                                Index Cond: (f.company_id =
"outer".company_id)
Total runtime: 547.000 ms


The total run times turn out to be anecdotally insignificant (due to
variations from one run to the next).  I haven't had a chance to
quantify
the variations, I plan on doing that soon.  However, now the planner is
choosing index scans for almost all the tables.  Granted that the
planner
seems to have chosen a different plan, but if I compare the estimated
costs and actual time for the index scans vs. the sequential scans, it
looks like the planner should be choosing index scans, but it isn't.

So, it would seem like my optimal plan should have hash joins with index
scans.  How do I get the planner to the same conclusion?  Should the
join
method influence the scan method?  These seem like they should be
unrelated to me.

Of course, all of this might be moot if my lack of knowledge is
showing...

Any thoughts??
Mark



pgsql-performance by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Help to find out problem with joined tables
Next
From: Tom Lane
Date:
Subject: Re: Join method influences scan method?