Re: hashjoin chosen over 1000x faster plan - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: hashjoin chosen over 1000x faster plan
Date
Msg-id 470CEBB4.EE98.0025.0@wicourts.gov
Whole thread Raw
In response to Re: hashjoin chosen over 1000x faster plan  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-performance
>>> On Wed, Oct 10, 2007 at  2:52 PM, in message
<1192045945.4233.351.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com>
wrote:
>
> The fast plan is an all-or-nothing plan. It is *only* faster when the
> number of matched rows is zero. You know it is zero, but currently the
> planner doesn't, nor is it able to make use of the information when it
> has it, half thru execution. Even if we could work out the high
> probability of it being zero, we would still be left with the decision
> of whether to optimise for the zero or for the non-zero.

For a different case number which has four charges, two reopened:

 Sort  (cost=2450.27..2450.28 rows=4 width=146) (actual time=463.048..463.052 rows=4 loops=1)
   Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
   ->  Hash Left Join  (cost=2318.93..2450.23 rows=4 width=146) (actual time=462.857..462.995 rows=4 loops=1)
         Hash Cond: ((("CHST"."eventType")::bpchar = ("CTHE"."eventType")::bpchar) AND (("CHST"."caseType")::bpchar =
("CTHE"."caseType")::bpchar))
         ->  Nested Loop Left Join  (cost=0.00..115.67 rows=4 width=131) (actual time=0.045..0.165 rows=4 loops=1)
               ->  Index Scan using "Charge_pkey" on "Charge" "CH"  (cost=0.00..10.69 rows=4 width=112) (actual
time=0.036..0.053rows=4 loops=1) 
                     Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar))
               ->  Index Scan using "CaseHist_pkey" on "CaseHist" "CHST"  (cost=0.00..26.18 rows=5 width=41) (actual
time=0.018..0.019rows=0 loops=4) 
                     Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar =
'2004CF002575'::bpchar)AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) 
         ->  Hash  (cost=2084.82..2084.82 rows=15607 width=98) (actual time=462.780..462.780 rows=15607 loops=1)
               ->  Subquery Scan "CTHE"  (cost=1630.43..2084.82 rows=15607 width=98) (actual time=355.962..433.081
rows=15607loops=1) 
                     ->  Merge Right Join  (cost=1630.43..1928.75 rows=15607 width=89) (actual time=355.960..414.249
rows=15607loops=1) 
                           Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar =
"inner"."?column10?")AND ((d."eventType")::bpchar = "inner"."?column11?")) 
                           ->  Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d  (cost=0.00..87.77
rows=2051width=21) (actual time=0.025..0.713 rows=434 loops=1) 
                           ->  Sort  (cost=1630.43..1669.45 rows=15607 width=76) (actual time=355.320..365.251
rows=15607loops=1) 
                                 Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
                                 ->  Nested Loop  (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..46.914
rows=15607loops=1) 
                                       ->  Index Scan using "ControlRecord_pkey" on "ControlRecord" c  (cost=0.00..4.27
rows=1width=2) (actual time=0.010..0.019 rows=1 loops=1) 
                                             Index Cond: (("countyNo")::smallint = 13)
                                       ->  Seq Scan on "CaseTypeHistEventB" b  (cost=0.00..383.07 rows=15607 width=74)
(actualtime=0.019..14.069 rows=15607 loops=1) 
 Total runtime: 464.588 ms
(21 rows)

With set enable_hashjoin = off:

 Sort  (cost=3404.68..3404.69 rows=4 width=146) (actual time=448.049..448.053 rows=4 loops=1)
   Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
   ->  Merge Left Join  (cost=3287.55..3404.64 rows=4 width=146) (actual time=447.986..448.005 rows=4 loops=1)
         Merge Cond: (("outer"."?column16?" = "inner"."?column5?") AND ("outer"."?column17?" = "inner"."?column6?"))
         ->  Sort  (cost=115.71..115.72 rows=4 width=131) (actual time=0.179..0.182 rows=4 loops=1)
               Sort Key: ("CHST"."caseType")::bpchar, ("CHST"."eventType")::bpchar
               ->  Nested Loop Left Join  (cost=0.00..115.67 rows=4 width=131) (actual time=0.051..0.139 rows=4
loops=1)
                     ->  Index Scan using "Charge_pkey" on "Charge" "CH"  (cost=0.00..10.69 rows=4 width=112) (actual
time=0.040..0.053rows=4 loops=1) 
                           Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar =
'2004CF002575'::bpchar))
                     ->  Index Scan using "CaseHist_pkey" on "CaseHist" "CHST"  (cost=0.00..26.18 rows=5 width=41)
(actualtime=0.013..0.014 rows=0 loops=4) 
                           Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar =
'2004CF002575'::bpchar)AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) 
         ->  Sort  (cost=3171.84..3210.86 rows=15607 width=98) (actual time=446.459..446.936 rows=768 loops=1)
               Sort Key: ("CTHE"."caseType")::bpchar, ("CTHE"."eventType")::bpchar
               ->  Subquery Scan "CTHE"  (cost=1630.43..2084.82 rows=15607 width=98) (actual time=322.928..405.654
rows=15607loops=1) 
                     ->  Merge Right Join  (cost=1630.43..1928.75 rows=15607 width=89) (actual time=322.922..381.371
rows=15607loops=1) 
                           Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar =
"inner"."?column10?")AND ((d."eventType")::bpchar = "inner"."?column11?")) 
                           ->  Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d  (cost=0.00..87.77
rows=2051width=21) (actual time=0.024..0.734 rows=434 loops=1) 
                           ->  Sort  (cost=1630.43..1669.45 rows=15607 width=76) (actual time=322.294..332.182
rows=15607loops=1) 
                                 Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
                                 ->  Nested Loop  (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..45.539
rows=15607loops=1) 
                                       ->  Index Scan using "ControlRecord_pkey" on "ControlRecord" c  (cost=0.00..4.27
rows=1width=2) (actual time=0.010..0.016 rows=1 loops=1) 
                                             Index Cond: (("countyNo")::smallint = 13)
                                       ->  Seq Scan on "CaseTypeHistEventB" b  (cost=0.00..383.07 rows=15607 width=74)
(actualtime=0.019..13.754 rows=15607 loops=1) 
 Total runtime: 449.660 ms
(24 rows)

So in all cases it is faster without the hashjoin; it's just a
question of whether it is 4% faster or 1000 times faster, with a 99+%
chance of being 1000 times faster.

This may get back to a question I've always had about the wisdom of
rounding fractional reads to whole numbers.  You lose information
which might lead to better plan choices.  You can't read half a row,
but you can read one row half the time.

-Kevin



pgsql-performance by date:

Previous
From: Theo Kramer
Date:
Subject: Re: Performance problems with prepared statements
Next
From: Josh Trutwin
Date:
Subject: Re: Shared Buffer setting in postgresql.conf