Re: hashjoin chosen over 1000x faster plan - Mailing list pgsql-performance
| From | Kevin Grittner |
|---|---|
| Subject | Re: hashjoin chosen over 1000x faster plan |
| Date | |
| Msg-id | 470CEBB4.EE98.0025.0@wicourts.gov Whole thread Raw |
| In response to | Re: hashjoin chosen over 1000x faster plan (Simon Riggs <simon@2ndquadrant.com>) |
| List | pgsql-performance |
>>> On Wed, Oct 10, 2007 at 2:52 PM, in message
<1192045945.4233.351.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com>
wrote:
>
> The fast plan is an all-or-nothing plan. It is *only* faster when the
> number of matched rows is zero. You know it is zero, but currently the
> planner doesn't, nor is it able to make use of the information when it
> has it, half thru execution. Even if we could work out the high
> probability of it being zero, we would still be left with the decision
> of whether to optimise for the zero or for the non-zero.
For a different case number which has four charges, two reopened:
Sort (cost=2450.27..2450.28 rows=4 width=146) (actual time=463.048..463.052 rows=4 loops=1)
Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
-> Hash Left Join (cost=2318.93..2450.23 rows=4 width=146) (actual time=462.857..462.995 rows=4 loops=1)
Hash Cond: ((("CHST"."eventType")::bpchar = ("CTHE"."eventType")::bpchar) AND (("CHST"."caseType")::bpchar =
("CTHE"."caseType")::bpchar))
-> Nested Loop Left Join (cost=0.00..115.67 rows=4 width=131) (actual time=0.045..0.165 rows=4 loops=1)
-> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual
time=0.036..0.053rows=4 loops=1)
Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar))
-> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41) (actual
time=0.018..0.019rows=0 loops=4)
Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar =
'2004CF002575'::bpchar)AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint))
-> Hash (cost=2084.82..2084.82 rows=15607 width=98) (actual time=462.780..462.780 rows=15607 loops=1)
-> Subquery Scan "CTHE" (cost=1630.43..2084.82 rows=15607 width=98) (actual time=355.962..433.081
rows=15607loops=1)
-> Merge Right Join (cost=1630.43..1928.75 rows=15607 width=89) (actual time=355.960..414.249
rows=15607loops=1)
Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar =
"inner"."?column10?")AND ((d."eventType")::bpchar = "inner"."?column11?"))
-> Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d (cost=0.00..87.77
rows=2051width=21) (actual time=0.025..0.713 rows=434 loops=1)
-> Sort (cost=1630.43..1669.45 rows=15607 width=76) (actual time=355.320..365.251
rows=15607loops=1)
Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
-> Nested Loop (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..46.914
rows=15607loops=1)
-> Index Scan using "ControlRecord_pkey" on "ControlRecord" c (cost=0.00..4.27
rows=1width=2) (actual time=0.010..0.019 rows=1 loops=1)
Index Cond: (("countyNo")::smallint = 13)
-> Seq Scan on "CaseTypeHistEventB" b (cost=0.00..383.07 rows=15607 width=74)
(actualtime=0.019..14.069 rows=15607 loops=1)
Total runtime: 464.588 ms
(21 rows)
With set enable_hashjoin = off:
Sort (cost=3404.68..3404.69 rows=4 width=146) (actual time=448.049..448.053 rows=4 loops=1)
Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo"
-> Merge Left Join (cost=3287.55..3404.64 rows=4 width=146) (actual time=447.986..448.005 rows=4 loops=1)
Merge Cond: (("outer"."?column16?" = "inner"."?column5?") AND ("outer"."?column17?" = "inner"."?column6?"))
-> Sort (cost=115.71..115.72 rows=4 width=131) (actual time=0.179..0.182 rows=4 loops=1)
Sort Key: ("CHST"."caseType")::bpchar, ("CHST"."eventType")::bpchar
-> Nested Loop Left Join (cost=0.00..115.67 rows=4 width=131) (actual time=0.051..0.139 rows=4
loops=1)
-> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual
time=0.040..0.053rows=4 loops=1)
Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar =
'2004CF002575'::bpchar))
-> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41)
(actualtime=0.013..0.014 rows=0 loops=4)
Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar =
'2004CF002575'::bpchar)AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint))
-> Sort (cost=3171.84..3210.86 rows=15607 width=98) (actual time=446.459..446.936 rows=768 loops=1)
Sort Key: ("CTHE"."caseType")::bpchar, ("CTHE"."eventType")::bpchar
-> Subquery Scan "CTHE" (cost=1630.43..2084.82 rows=15607 width=98) (actual time=322.928..405.654
rows=15607loops=1)
-> Merge Right Join (cost=1630.43..1928.75 rows=15607 width=89) (actual time=322.922..381.371
rows=15607loops=1)
Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar =
"inner"."?column10?")AND ((d."eventType")::bpchar = "inner"."?column11?"))
-> Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d (cost=0.00..87.77
rows=2051width=21) (actual time=0.024..0.734 rows=434 loops=1)
-> Sort (cost=1630.43..1669.45 rows=15607 width=76) (actual time=322.294..332.182
rows=15607loops=1)
Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar
-> Nested Loop (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..45.539
rows=15607loops=1)
-> Index Scan using "ControlRecord_pkey" on "ControlRecord" c (cost=0.00..4.27
rows=1width=2) (actual time=0.010..0.016 rows=1 loops=1)
Index Cond: (("countyNo")::smallint = 13)
-> Seq Scan on "CaseTypeHistEventB" b (cost=0.00..383.07 rows=15607 width=74)
(actualtime=0.019..13.754 rows=15607 loops=1)
Total runtime: 449.660 ms
(24 rows)
So in all cases it is faster without the hashjoin; it's just a
question of whether it is 4% faster or 1000 times faster, with a 99+%
chance of being 1000 times faster.
This may get back to a question I've always had about the wisdom of
rounding fractional reads to whole numbers. You lose information
which might lead to better plan choices. You can't read half a row,
but you can read one row half the time.
-Kevin
pgsql-performance by date: