Join method influences scan method? - Mailing list pgsql-performance
From | mark.lubratt@indeq.com |
---|---|
Subject | Join method influences scan method? |
Date | |
Msg-id | 20050317061624.12920.qmail@gem-wbe01.mesa1.secureserver.net Whole thread Raw |
Responses |
Re: Join method influences scan method?
|
List | pgsql-performance |
This apparently didn't go through the first time; so, I'm reposting... ------------------------------------------------------------------------------------- Hello! First off, I'm running 8.0.1 on Win2000 Server. Vacuum analyze is done every night. Query cost parameters are standard, I've only bumped the estimated_cache_size up to 50000; shared_buffers=15000. I've been working on optimizing a query. In the process, I've been playing around with Tow's method from "SQL Tuning." He seems pretty enamored with nested loops over hash joins or merge joins. So, I thought I'd give that a try. Here's the explain analyze prior to my little adventure: QUERY PLAN Subquery Scan view_get_all_user_award2 (cost=1091.20..1116.09 rows=524 width=493) (actual time=499.000..499.000 rows=368 loops=1) - Unique (cost=1091.20..1110.85 rows=524 width=114) (actual time=499.000..499.000 rows=368 loops=1) - Sort (cost=1091.20..1092.51 rows=524 width=114) (actual time=499.000..499.000 rows=1103 loops=1) Sort Key: c.job_id, g.person_id, c.job_no, b.deadline, c.name, bid_date(c.job_id), c.miscq, c.city, COALESCE(((c.city)::text || COALESCE((', '::text || (c.st)::text), ''::text)), (COALESCE(c.st, ''::character varying))::text), c.st, CASE WHEN (c.file_loc = 0) THEN 'No Bid'::character varying WHEN (c.file_loc = -1) THEN 'Bid Board'::character varying WHEN (c.file_loc = -2) THEN 'Lost Job'::character varying WHEN (c.file_loc = -3) THEN 'See Job Notes'::character varying WHEN ((c.file_loc < -3) OR (c.file_loc IS NULL)) THEN ''::character varying WHEN (h.initials IS NOT NULL) THEN h.initials ELSE 'Unknown person'::character varying END, j.name, c.s_team, c.file_loc - Hash Join (cost=848.87..1067.53 rows=524 width=114) (actual time=171.000..484.000 rows=1103 loops=1) Hash Cond: ("outer".company_id = "inner".company_id) - Nested Loop (cost=805.21..1005.53 rows=524 width=122) (actual time=156.000..314.000 rows=1103 loops=1) Join Filter: (("inner".person_id = "outer".person_id) OR ("position"(("inner".s_team)::text, ("outer".initials)::text) 0)) - Seq Scan on person g (cost=0.00..1.34 rows=1 width=11) (actual time=0.000..0.000 rows=1 loops=1) Filter: (person_id = 1) - Hash Join (cost=805.21..988.81 rows=879 width=122) (actual time=156.000..251.000 rows=5484 loops=1) Hash Cond: ("outer".company_id = "inner".company_id) - Hash Join (cost=761.55..931.96 rows=879 width=103) (actual time=156.000..188.000 rows=5484 loops=1) Hash Cond: ("outer".company_id = "inner".company_id) - Seq Scan on call_list f (cost=0.00..78.46 rows=4746 width=8) (actual time=0.000..16.000 rows=4752 loops=1) - Hash (cost=761.11..761.11 rows=176 width=95) (actual time=156.000..156.000 rows=0 loops=1) - Hash Join (cost=505.90..761.11 rows=176 width=95) (actual time=94.000..140.000 rows=1079 loops=1) Hash Cond: ("outer".job_id = "inner".job_id) Join Filter: ("outer".won_heat OR "outer".won_vent OR "outer".won_tc OR ("inner".heat AND "outer".bid_heat AND ("outer".won_heat IS NULL)) OR ("inner".vent AND "outer".bid_vent AND ("outer".won_vent IS NULL)) OR ("inner".tc AND "outer".bid_tc AND ("outer".won_tc IS NULL))) - Seq Scan on builder_list d (cost=0.00..212.26 rows=7687 width=14) (actual time=0.000..16.000 rows=7758 loops=1) Filter: (role = 'C'::bpchar) - Hash (cost=505.55..505.55 rows=138 width=102) (actual time=94.000..94.000 rows=0 loops=1) - Hash Join (cost=268.41..505.55 rows=138 width=102) (actual time=47.000..94.000 rows=443 loops=1) Hash Cond: ("outer".job_id = "inner".job_id) - Seq Scan on builder_list i (cost=0.00..212.26 rows=2335 width=8) (actual time=0.000..47.000 rows=2245 loops=1) Filter: (role = 'E'::bpchar) - Hash (cost=268.06..268.06 rows=139 width=94) (actual time=47.000..47.000 rows=0 loops=1) - Hash Join (cost=156.15..268.06 rows=139 width=94) (actual time=31.000..47.000 rows=451 loops=1) Hash Cond: ("outer".file_loc = "inner".person_id) - Hash Join (cost=154.81..264.51 rows=166 width=87) (actual time=31.000..47.000 rows=694 loops=1) Hash Cond: ("outer".job_id = "inner".job_id) - Seq Scan on job c (cost=0.00..78.57 rows=2357 width=79) (actual time=0.000..0.000 rows=2302 loops=1) - Hash (cost=154.40..154.40 rows=166 width=8) (actual time=31.000..31.000 rows=0 loops=1) - Hash Join (cost=1.18..154.40 rows=166 width=8) (actual time=0.000..31.000 rows=694 loops=1) Hash Cond: ("outer".status_id = "inner".status_id) - Seq Scan on status_list b (cost=0.00..139.96 rows=2320 width=12) (actual time=0.000..15.000 rows=2302 loops=1) Filter: active - Hash (cost=1.18..1.18 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1) - Seq Scan on status a (cost=0.00..1.18 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Filter: ((name)::text = 'Awaiting Award'::text) - Hash (cost=1.27..1.27 rows=27 width=11) (actual time=0.000..0.000 rows=0 loops=1) - Seq Scan on person h (cost=0.00..1.27 rows=27 width=11) (actual time=0.000..0.000 rows=27 loops=1) - Hash (cost=40.53..40.53 rows=1253 width=27) (actual time=0.000..0.000 rows=0 loops=1) - Seq Scan on company j (cost=0.00..40.53 rows=1253 width=27) (actual time=0.000..0.000 rows=1254 loops=1) - Hash (cost=40.53..40.53 rows=1253 width=4) (actual time=15.000..15.000 rows=0 loops=1) - Seq Scan on company e (cost=0.00..40.53 rows=1253 width=4) (actual time=0.000..0.000 rows=1254 loops=1) Total runtime: 499.000 ms As you can see, it's almost all hash joins and sequential scans. I tried explain analyze again after setting enable_hashjoin=false and the planner started using merge joins. So, I set enable_mergejoin=false and ran it again. Here is the resulting explain analyze: QUERY PLAN Subquery Scan view_get_all_user_award2 (cost=9525.65..9550.54 rows=524 width=493) (actual time=531.000..547.000 rows=368 loops=1) - Unique (cost=9525.65..9545.30 rows=524 width=114) (actual time=531.000..547.000 rows=368 loops=1) - Sort (cost=9525.65..9526.96 rows=524 width=114) (actual time=531.000..531.000 rows=1103 loops=1) Sort Key: c.job_id, g.person_id, c.job_no, b.deadline, c.name, bid_date(c.job_id), c.miscq, c.city, COALESCE(((c.city)::text || COALESCE((', '::text || (c.st)::text), ''::text)), (COALESCE(c.st, ''::character varying))::text), c.st, CASE WHEN (c.file_loc = 0) THEN 'No Bid'::character varying WHEN (c.file_loc = -1) THEN 'Bid Board'::character varying WHEN (c.file_loc = -2) THEN 'Lost Job'::character varying WHEN (c.file_loc = -3) THEN 'See Job Notes'::character varying WHEN ((c.file_loc < -3) OR (c.file_loc IS NULL)) THEN ''::character varying WHEN (h.initials IS NOT NULL) THEN h.initials ELSE 'Unknown person'::character varying END, j.name, c.s_team, c.file_loc - Nested Loop (cost=1.30..9501.98 rows=524 width=114) (actual time=0.000..500.000 rows=1103 loops=1) Join Filter: (("inner".person_id = "outer".person_id) OR ("position"(("inner".s_team)::text, ("outer".initials)::text) 0)) - Seq Scan on person g (cost=0.00..1.34 rows=1 width=11) (actual time=0.000..0.000 rows=1 loops=1) Filter: (person_id = 1) - Nested Loop (cost=1.30..9464.56 rows=1463 width=114) (actual time=0.000..360.000 rows=5484 loops=1) - Nested Loop (cost=1.30..5350.20 rows=174 width=118) (actual time=0.000..268.000 rows=1079 loops=1) - Nested Loop (cost=1.30..4399.46 rows=174 width=114) (actual time=0.000..237.000 rows=1079 loops=1) Join Filter: ("inner".won_heat OR "inner".won_vent OR "inner".won_tc OR ("outer".heat AND "inner".bid_heat AND ("inner".won_heat IS NULL)) OR ("outer".vent AND "inner".bid_vent AND ("inner".won_vent IS NULL)) OR ("outer".tc AND "inner".bid_tc AND ("inner".won_tc IS NULL))) - Nested Loop (cost=1.30..3207.87 rows=138 width=121) (actual time=0.000..221.000 rows=443 loops=1) - Nested Loop (cost=1.30..2453.84 rows=138 width=102) (actual time=0.000..221.000 rows=443 loops=1) - Nested Loop (cost=1.30..1258.83 rows=139 width=94) (actual time=0.000..189.000 rows=451 loops=1) Join Filter: ("outer".file_loc = "inner".person_id) - Nested Loop (cost=0.00..1156.69 rows=166 width=87) (actual time=0.000..31.000 rows=694 loops=1) - Nested Loop (cost=0.00..170.14 rows=166 width=8) (actual time=0.000..0.000 rows=694 loops=1) Join Filter: ("outer".status_id = "inner".status_id) - Seq Scan on status a (cost=0.00..1.18 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1) Filter: ((name)::text = 'Awaiting Award'::text) - Seq Scan on status_list b (cost=0.00..139.96 rows=2320 width=12) (actual time=0.000..0.000 rows=2302 loops=1) Filter: active - Index Scan using job_pkey on job c (cost=0.00..5.93 rows=1 width=79) (actual time=0.000..0.023 rows=1 loops=694) Index Cond: (c.job_id = "outer".job_id) - Materialize (cost=1.30..1.57 rows=27 width=11) (actual time=0.000..0.069 rows=27 loops=694) - Seq Scan on person h (cost=0.00..1.27 rows=27 width=11) (actual time=0.000..0.000 rows=27 loops=1) - Index Scan using idx_builder_list_job_id on builder_list i (cost=0.00..8.57 rows=2 width=8) (actual time=0.000..0.000 rows=1 loops=451) Index Cond: (i.job_id = "outer".job_id) Filter: (role = 'E'::bpchar) - Index Scan using company_pkey on company j (cost=0.00..5.45 rows=1 width=27) (actual time=0.000..0.000 rows=1 loops=443) Index Cond: ("outer".company_id = j.company_id) - Index Scan using idx_builder_list_job_id on builder_list d (cost=0.00..8.57 rows=5 width=14) (actual time=0.000..0.036 rows=3 loops=443) Index Cond: (d.job_id = "outer".job_id) Filter: (role = 'C'::bpchar) - Index Scan using company_pkey on company e (cost=0.00..5.45 rows=1 width=4) (actual time=0.029..0.029 rows=1 loops=1079) Index Cond: ("outer".company_id = e.company_id) - Index Scan using idx_company_id_call_list on call_list f (cost=0.00..23.57 rows=6 width=8) (actual time=0.014..0.057 rows=5 loops=1079) Index Cond: (f.company_id = "outer".company_id) Total runtime: 547.000 ms The total run times turn out to be anecdotally insignificant (due to variations from one run to the next). I haven't had a chance to quantify the variations, I plan on doing that soon. However, now the planner is choosing index scans for almost all the tables. Granted that the planner seems to have chosen a different plan, but if I compare the estimated costs and actual time for the index scans vs. the sequential scans, it looks like the planner should be choosing index scans, but it isn't. So, it would seem like my optimal plan should have hash joins with index scans. How do I get the planner to the same conclusion? Should the join method influence the scan method? These seem like they should be unrelated to me. Of course, all of this might be moot if my lack of knowledge is showing... Any thoughts?? Mark
pgsql-performance by date: