Re: bad performances using hashjoin - Mailing list pgsql-performance
From | Gaetano Mendola |
---|---|
Subject | Re: bad performances using hashjoin |
Date | |
Msg-id | 42192F0F.30404@bigfoot.com Whole thread Raw |
In response to | Re: bad performances using hashjoin (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
Tom Lane wrote: > Gaetano Mendola <mendola@bigfoot.com> writes: > >>If you need other info in order to improve the planner, > > > ... like, say, the PG version you are using, or the definitions of the > views involved? It's difficult to say much of anything about this. That is true, sorry I forgot it :-( The engine is a 7.4.5 and these are the views definitions: sat_request is just a table CREATE OR REPLACE VIEW v_sc_packages AS SELECT * FROM v_programs vpr, v_packages vpk, v_sequences vs WHERE ------------ JOIN ------------- vpr.id_program = vs.id_program AND vpk.id_package = vs.id_package AND ------------------------------- vs.estimated_start IS NOT NULL ORDER BY vs.estimated_start; CREATE OR REPLACE VIEW v_programs AS SELECT * FROM programs WHERE id_program<>0 ORDER BY id_publisher, id_program ; CREATE OR REPLACE VIEW v_packages AS SELECT * FROM packages p LEFT OUTER JOIN package_security ps USING (id_package) ORDER BY p.id_publisher, p.name ; CREATE OR REPLACE VIEW v_sequences AS SELECT id_package AS id_package, id_program AS id_program, internal_position AS internal_position, estimated_start AS estimated_start FROM sequences ORDER BY id_program, internal_position ; > However: the reason the second plan wins is because there are zero rows > fetched from sat_request, and so the bulk of the plan is never executed > at all. I doubt the second plan would win if there were any matching > sat_request rows. If this is the case you actually need to optimize, > probably the thing to do is to get rid of the ORDER BY clauses you > evidently have in your views, so that there's some chance of building > a fast-start plan. Removed all order by from that views, this is the comparison between the two (orderdered and not ordered): empdb=# explain analyze SELECT id_sat_request empdb-# FROM sat_request sr, empdb-# v_sc_packages vs empdb-# WHERE ----- JOIN ---- empdb-# sr.id_package = vs.id_package AND empdb-# --------------- empdb-# id_user = 29416 AND empdb-# id_url = 424364 AND empdb-# vs.estimated_start > now() AND empdb-# id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=280.98..284.74 rows=1 width=4) (actual time=895.344..895.344 rows=0 loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Subquery Scan vs (cost=277.94..280.19 rows=301 width=4) (actual time=893.191..894.396 rows=569 loops=1) -> Sort (cost=277.94..278.69 rows=301 width=263) (actual time=893.184..893.546 rows=569 loops=1) Sort Key: vs.estimated_start -> Hash Join (cost=232.61..265.54 rows=301 width=263) (actual time=868.980..889.643 rows=569 loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Subquery Scan vpk (cost=150.29..159.26 rows=1196 width=218) (actual time=822.281..834.063 rows=1203loops=1) -> Sort (cost=150.29..153.28 rows=1196 width=159) (actual time=822.266..823.190 rows=1203 loops=1) Sort Key: p.id_publisher, p.name -> Hash Left Join (cost=16.14..89.16 rows=1196 width=159) (actual time=3.504..809.262rows=1203 loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Seq Scan on packages p (cost=0.00..53.98 rows=1196 width=143) (actual time=0.018..13.869rows=1203 loops=1) -> Hash (cost=14.09..14.09 rows=818 width=20) (actual time=2.395..2.395 rows=0 loops=1) -> Seq Scan on package_security ps (cost=0.00..14.09 rows=818 width=20) (actualtime=0.020..1.520 rows=845 loops=1) -> Hash (cost=82.19..82.19 rows=51 width=49) (actual time=46.402..46.402 rows=0 loops=1) -> Merge Join (cost=79.54..82.19 rows=51 width=49) (actual time=39.435..45.376 rows=569 loops=1) Merge Cond: ("outer".id_program = "inner".id_program) -> Subquery Scan vs (cost=70.98..72.59 rows=214 width=16) (actual time=16.834..19.102rows=569 loops=1) -> Sort (cost=70.98..71.52 rows=214 width=20) (actual time=16.824..17.338 rows=569loops=1) Sort Key: sequences.id_program, sequences.internal_position -> Seq Scan on sequences (cost=0.00..62.70 rows=214 width=20) (actual time=0.638..11.969rows=569 loops=1) Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text,estimated_start) > now())) -> Sort (cost=8.56..8.68 rows=47 width=37) (actual time=22.580..23.123 rows=605 loops=1) Sort Key: vpr.id_program -> Subquery Scan vpr (cost=6.90..7.25 rows=47 width=37) (actual time=22.294..22.464rows=48 loops=1) -> Sort (cost=6.90..7.02 rows=47 width=61) (actual time=22.287..22.332 rows=48loops=1) Sort Key: programs.id_publisher, programs.id_program -> Seq Scan on programs (cost=0.00..5.60 rows=47 width=61) (actual time=4.356..22.068rows=48 loops=1) Filter: (id_program <> 0) -> Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.033..0.033 rows=0 loops=1) -> Index Scan using idx_id_url_sat_request on sat_request sr (cost=0.00..3.04 rows=1 width=8) (actual time=0.031..0.031rows=0 loops=1) Index Cond: (id_url = 424364) Filter: ((id_user = 29416) AND (id_sat_request_status = 1)) Total runtime: 897.044 ms (35 rows) empdb=# explain analyze SELECT id_sat_request empdb-# FROM sat_request sr, empdb-# v_sc_packages_new vs empdb-# WHERE ----- JOIN ---- empdb-# sr.id_package = vs.id_package AND empdb-# --------------- empdb-# id_user = 29416 AND empdb-# id_url = 424364 AND empdb-# vs.estimated_start > now() AND empdb-# id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=19.18..96.87 rows=1 width=4) (actual time=15.576..15.576 rows=0 loops=1) -> Nested Loop (cost=19.18..93.04 rows=1 width=8) (actual time=15.569..15.569 rows=0 loops=1) -> Hash Join (cost=19.18..89.21 rows=1 width=12) (actual time=15.566..15.566 rows=0 loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Hash Left Join (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Seq Scan on packages p (cost=0.00..53.98 rows=1196 width=4) (actual time=0.028..2.694 rows=1203loops=1) -> Hash (cost=14.09..14.09 rows=818 width=4) (actual time=6.707..6.707 rows=0 loops=1) -> Seq Scan on package_security ps (cost=0.00..14.09 rows=818 width=4) (actual time=0.026..4.620rows=845 loops=1) -> Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.061..0.061 rows=0 loops=1) -> Index Scan using idx_id_url_sat_request on sat_request sr (cost=0.00..3.04 rows=1 width=8) (actualtime=0.056..0.056 rows=0 loops=1) Index Cond: (id_url = 424364) Filter: ((id_user = 29416) AND (id_sat_request_status = 1)) -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.82 rows=1 width=8) (never executed) Index Cond: ("outer".id_package = sequences.id_package) Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now())) -> Index Scan using programs_pkey on programs (cost=0.00..3.83 rows=1 width=4) (never executed) Index Cond: (programs.id_program = "outer".id_program) Filter: (id_program <> 0) Total runtime: 16.279 ms (20 rows) The second one of course is faster, this is the second select with hashjoin disabled: empdb=# set enable_hashjoin = false; SET empdb=# explain analyze SELECT id_sat_request empdb-# FROM sat_request sr, empdb-# v_sc_packages_new vs empdb-# WHERE ----- JOIN ---- empdb-# sr.id_package = vs.id_package AND empdb-# --------------- empdb-# id_user = 29416 AND empdb-# id_url = 424364 AND empdb-# vs.estimated_start > now() AND empdb-# id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=10.62..175.83 rows=1 width=4) (actual time=0.280..0.280 rows=0 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Merge Left Join (cost=0.00..162.21 rows=1196 width=4) (actual time=0.188..0.188 rows=1 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Index Scan using packages_pkey on packages p (cost=0.00..115.51 rows=1196 width=4) (actual time=0.085..0.085rows=1 loops=1) -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..39.06 rows=818 width=4)(actual time=0.080..0.080 rows=1 loops=1) -> Sort (cost=10.62..10.62 rows=1 width=12) (actual time=0.087..0.087 rows=0 loops=1) Sort Key: sr.id_package -> Nested Loop (cost=0.00..10.61 rows=1 width=12) (actual time=0.069..0.069 rows=0 loops=1) -> Nested Loop (cost=0.00..6.77 rows=1 width=16) (actual time=0.067..0.067 rows=0 loops=1) -> Index Scan using idx_id_url_sat_request on sat_request sr (cost=0.00..3.04 rows=1 width=8) (actualtime=0.065..0.065 rows=0 loops=1) Index Cond: (id_url = 424364) Filter: ((id_user = 29416) AND (id_sat_request_status = 1)) -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.72 rows=1 width=8) (neverexecuted) Index Cond: ("outer".id_package = sequences.id_package) Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now())) -> Index Scan using programs_pkey on programs (cost=0.00..3.83 rows=1 width=4) (never executed) Index Cond: (programs.id_program = "outer".id_program) Filter: (id_program <> 0) Total runtime: 0.604 ms (20 rows) I see the problem is still here: Hash Left Join (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1) BTW, at this time the executions time seen are lower because right now is not a peak hour > BTW, I believe in 8.0 the first plan would be about as fast as the > second, because we added some code to hash join to fall out without > scanning the left input if the right input is empty. I'll take it a try if you are really interested in the results. Regards Gaetano Mendola
pgsql-performance by date: