Re: bad performances using hashjoin - Mailing list pgsql-performance

From Gaetano Mendola
Subject Re: bad performances using hashjoin
Date
Msg-id 42192F0F.30404@bigfoot.com
Whole thread Raw
In response to Re: bad performances using hashjoin  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>
>>If you need other info in order to improve the planner,
>
>
> ... like, say, the PG version you are using, or the definitions of the
> views involved?  It's difficult to say much of anything about this.

That is true, sorry I forgot it :-(
The engine is a 7.4.5 and these are the views definitions:

sat_request is just a table

CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT *
  FROM
      v_programs           vpr,
      v_packages           vpk,
      v_sequences          vs
  WHERE
     ------------ JOIN -------------
      vpr.id_program = vs.id_program AND
      vpk.id_package = vs.id_package AND
     -------------------------------
      vs.estimated_start IS NOT NULL
  ORDER BY vs.estimated_start;


CREATE OR REPLACE VIEW v_programs AS
  SELECT *
  FROM programs
  WHERE id_program<>0
  ORDER BY id_publisher, id_program
;



CREATE OR REPLACE VIEW v_packages AS
  SELECT *
  FROM packages p LEFT OUTER JOIN package_security ps USING (id_package)
  ORDER BY p.id_publisher, p.name
;

CREATE OR REPLACE VIEW v_sequences AS
  SELECT id_package        AS id_package,
         id_program        AS id_program,
         internal_position AS internal_position,
         estimated_start   AS estimated_start
  FROM sequences
  ORDER BY id_program, internal_position
;


> However: the reason the second plan wins is because there are zero rows
> fetched from sat_request, and so the bulk of the plan is never executed
> at all.  I doubt the second plan would win if there were any matching
> sat_request rows.  If this is the case you actually need to optimize,
> probably the thing to do is to get rid of the ORDER BY clauses you
> evidently have in your views, so that there's some chance of building
> a fast-start plan.

Removed all order by from that views, this is the comparison between the two
(orderdered and not ordered):

empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  424364 AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                            QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=280.98..284.74 rows=1 width=4) (actual time=895.344..895.344 rows=0 loops=1)
   Hash Cond: ("outer".id_package = "inner".id_package)
   ->  Subquery Scan vs  (cost=277.94..280.19 rows=301 width=4) (actual time=893.191..894.396 rows=569 loops=1)
         ->  Sort  (cost=277.94..278.69 rows=301 width=263) (actual time=893.184..893.546 rows=569 loops=1)
               Sort Key: vs.estimated_start
               ->  Hash Join  (cost=232.61..265.54 rows=301 width=263) (actual time=868.980..889.643 rows=569 loops=1)
                     Hash Cond: ("outer".id_package = "inner".id_package)
                     ->  Subquery Scan vpk  (cost=150.29..159.26 rows=1196 width=218) (actual time=822.281..834.063
rows=1203loops=1) 
                           ->  Sort  (cost=150.29..153.28 rows=1196 width=159) (actual time=822.266..823.190 rows=1203
loops=1)
                                 Sort Key: p.id_publisher, p.name
                                 ->  Hash Left Join  (cost=16.14..89.16 rows=1196 width=159) (actual
time=3.504..809.262rows=1203 loops=1) 
                                       Hash Cond: ("outer".id_package = "inner".id_package)
                                       ->  Seq Scan on packages p  (cost=0.00..53.98 rows=1196 width=143) (actual
time=0.018..13.869rows=1203 loops=1) 
                                       ->  Hash  (cost=14.09..14.09 rows=818 width=20) (actual time=2.395..2.395 rows=0
loops=1)
                                             ->  Seq Scan on package_security ps  (cost=0.00..14.09 rows=818 width=20)
(actualtime=0.020..1.520 rows=845 loops=1) 
                     ->  Hash  (cost=82.19..82.19 rows=51 width=49) (actual time=46.402..46.402 rows=0 loops=1)
                           ->  Merge Join  (cost=79.54..82.19 rows=51 width=49) (actual time=39.435..45.376 rows=569
loops=1)
                                 Merge Cond: ("outer".id_program = "inner".id_program)
                                 ->  Subquery Scan vs  (cost=70.98..72.59 rows=214 width=16) (actual
time=16.834..19.102rows=569 loops=1) 
                                       ->  Sort  (cost=70.98..71.52 rows=214 width=20) (actual time=16.824..17.338
rows=569loops=1) 
                                             Sort Key: sequences.id_program, sequences.internal_position
                                             ->  Seq Scan on sequences  (cost=0.00..62.70 rows=214 width=20) (actual
time=0.638..11.969rows=569 loops=1) 
                                                   Filter: ((estimated_start IS NOT NULL) AND
(date_trunc('seconds'::text,estimated_start) > now())) 
                                 ->  Sort  (cost=8.56..8.68 rows=47 width=37) (actual time=22.580..23.123 rows=605
loops=1)
                                       Sort Key: vpr.id_program
                                       ->  Subquery Scan vpr  (cost=6.90..7.25 rows=47 width=37) (actual
time=22.294..22.464rows=48 loops=1) 
                                             ->  Sort  (cost=6.90..7.02 rows=47 width=61) (actual time=22.287..22.332
rows=48loops=1) 
                                                   Sort Key: programs.id_publisher, programs.id_program
                                                   ->  Seq Scan on programs  (cost=0.00..5.60 rows=47 width=61) (actual
time=4.356..22.068rows=48 loops=1) 
                                                         Filter: (id_program <> 0)
   ->  Hash  (cost=3.04..3.04 rows=1 width=8) (actual time=0.033..0.033 rows=0 loops=1)
         ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8) (actual
time=0.031..0.031rows=0 loops=1) 
               Index Cond: (id_url = 424364)
               Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
 Total runtime: 897.044 ms
(35 rows)



empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages_new vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  424364 AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=19.18..96.87 rows=1 width=4) (actual time=15.576..15.576 rows=0 loops=1)
   ->  Nested Loop  (cost=19.18..93.04 rows=1 width=8) (actual time=15.569..15.569 rows=0 loops=1)
         ->  Hash Join  (cost=19.18..89.21 rows=1 width=12) (actual time=15.566..15.566 rows=0 loops=1)
               Hash Cond: ("outer".id_package = "inner".id_package)
               ->  Hash Left Join  (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)
                     Hash Cond: ("outer".id_package = "inner".id_package)
                     ->  Seq Scan on packages p  (cost=0.00..53.98 rows=1196 width=4) (actual time=0.028..2.694
rows=1203loops=1) 
                     ->  Hash  (cost=14.09..14.09 rows=818 width=4) (actual time=6.707..6.707 rows=0 loops=1)
                           ->  Seq Scan on package_security ps  (cost=0.00..14.09 rows=818 width=4) (actual
time=0.026..4.620rows=845 loops=1) 
               ->  Hash  (cost=3.04..3.04 rows=1 width=8) (actual time=0.061..0.061 rows=0 loops=1)
                     ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8)
(actualtime=0.056..0.056 rows=0 loops=1) 
                           Index Cond: (id_url = 424364)
                           Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
         ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.82 rows=1 width=8) (never executed)
               Index Cond: ("outer".id_package = sequences.id_package)
               Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now()))
   ->  Index Scan using programs_pkey on programs  (cost=0.00..3.83 rows=1 width=4) (never executed)
         Index Cond: (programs.id_program = "outer".id_program)
         Filter: (id_program <> 0)
 Total runtime: 16.279 ms
(20 rows)




The second one of course is faster, this is the second select with hashjoin disabled:

empdb=# set enable_hashjoin = false;
SET
empdb=# explain analyze SELECT id_sat_request
empdb-#    FROM sat_request sr,
empdb-#         v_sc_packages_new vs
empdb-#    WHERE    ----- JOIN ----
empdb-#          sr.id_package = vs.id_package AND
empdb-#             ---------------
empdb-#          id_user = 29416 AND
empdb-#          id_url  =  424364 AND
empdb-#          vs.estimated_start > now() AND
empdb-#          id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
                                                                             QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=10.62..175.83 rows=1 width=4) (actual time=0.280..0.280 rows=0 loops=1)
   Merge Cond: ("outer".id_package = "inner".id_package)
   ->  Merge Left Join  (cost=0.00..162.21 rows=1196 width=4) (actual time=0.188..0.188 rows=1 loops=1)
         Merge Cond: ("outer".id_package = "inner".id_package)
         ->  Index Scan using packages_pkey on packages p  (cost=0.00..115.51 rows=1196 width=4) (actual
time=0.085..0.085rows=1 loops=1) 
         ->  Index Scan using package_security_id_package_key on package_security ps  (cost=0.00..39.06 rows=818
width=4)(actual time=0.080..0.080 rows=1 loops=1) 
   ->  Sort  (cost=10.62..10.62 rows=1 width=12) (actual time=0.087..0.087 rows=0 loops=1)
         Sort Key: sr.id_package
         ->  Nested Loop  (cost=0.00..10.61 rows=1 width=12) (actual time=0.069..0.069 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..6.77 rows=1 width=16) (actual time=0.067..0.067 rows=0 loops=1)
                     ->  Index Scan using idx_id_url_sat_request on sat_request sr  (cost=0.00..3.04 rows=1 width=8)
(actualtime=0.065..0.065 rows=0 loops=1) 
                           Index Cond: (id_url = 424364)
                           Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
                     ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.72 rows=1 width=8)
(neverexecuted) 
                           Index Cond: ("outer".id_package = sequences.id_package)
                           Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) >
now()))
               ->  Index Scan using programs_pkey on programs  (cost=0.00..3.83 rows=1 width=4) (never executed)
                     Index Cond: (programs.id_program = "outer".id_program)
                     Filter: (id_program <> 0)
 Total runtime: 0.604 ms
(20 rows)

I see the problem is still here:
Hash Left Join  (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)

BTW, at this time the executions time seen are lower because right now is not a peak hour

> BTW, I believe in 8.0 the first plan would be about as fast as the
> second, because we added some code to hash join to fall out without
> scanning the left input if the right input is empty.

I'll take it a try if you are really interested in the results.

Regards
Gaetano Mendola











pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: Problem with 7.4.5 and webmin 1.8 in grant function
Next
From: Gaetano Mendola
Date:
Subject: Re: Effects of IDLE processes