Re: hashjoin chosen over 1000x faster plan - Mailing list pgsql-performance
From | Kevin Grittner |
---|---|
Subject | Re: hashjoin chosen over 1000x faster plan |
Date | |
Msg-id | 470CEBB4.EE98.0025.0@wicourts.gov Whole thread Raw |
In response to | Re: hashjoin chosen over 1000x faster plan (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-performance |
>>> On Wed, Oct 10, 2007 at 2:52 PM, in message <1192045945.4233.351.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com> wrote: > > The fast plan is an all-or-nothing plan. It is *only* faster when the > number of matched rows is zero. You know it is zero, but currently the > planner doesn't, nor is it able to make use of the information when it > has it, half thru execution. Even if we could work out the high > probability of it being zero, we would still be left with the decision > of whether to optimise for the zero or for the non-zero. For a different case number which has four charges, two reopened: Sort (cost=2450.27..2450.28 rows=4 width=146) (actual time=463.048..463.052 rows=4 loops=1) Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo" -> Hash Left Join (cost=2318.93..2450.23 rows=4 width=146) (actual time=462.857..462.995 rows=4 loops=1) Hash Cond: ((("CHST"."eventType")::bpchar = ("CTHE"."eventType")::bpchar) AND (("CHST"."caseType")::bpchar = ("CTHE"."caseType")::bpchar)) -> Nested Loop Left Join (cost=0.00..115.67 rows=4 width=131) (actual time=0.045..0.165 rows=4 loops=1) -> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual time=0.036..0.053rows=4 loops=1) Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar)) -> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41) (actual time=0.018..0.019rows=0 loops=4) Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar)AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) -> Hash (cost=2084.82..2084.82 rows=15607 width=98) (actual time=462.780..462.780 rows=15607 loops=1) -> Subquery Scan "CTHE" (cost=1630.43..2084.82 rows=15607 width=98) (actual time=355.962..433.081 rows=15607loops=1) -> Merge Right Join (cost=1630.43..1928.75 rows=15607 width=89) (actual time=355.960..414.249 rows=15607loops=1) Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar = "inner"."?column10?")AND ((d."eventType")::bpchar = "inner"."?column11?")) -> Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d (cost=0.00..87.77 rows=2051width=21) (actual time=0.025..0.713 rows=434 loops=1) -> Sort (cost=1630.43..1669.45 rows=15607 width=76) (actual time=355.320..365.251 rows=15607loops=1) Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar -> Nested Loop (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..46.914 rows=15607loops=1) -> Index Scan using "ControlRecord_pkey" on "ControlRecord" c (cost=0.00..4.27 rows=1width=2) (actual time=0.010..0.019 rows=1 loops=1) Index Cond: (("countyNo")::smallint = 13) -> Seq Scan on "CaseTypeHistEventB" b (cost=0.00..383.07 rows=15607 width=74) (actualtime=0.019..14.069 rows=15607 loops=1) Total runtime: 464.588 ms (21 rows) With set enable_hashjoin = off: Sort (cost=3404.68..3404.69 rows=4 width=146) (actual time=448.049..448.053 rows=4 loops=1) Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo" -> Merge Left Join (cost=3287.55..3404.64 rows=4 width=146) (actual time=447.986..448.005 rows=4 loops=1) Merge Cond: (("outer"."?column16?" = "inner"."?column5?") AND ("outer"."?column17?" = "inner"."?column6?")) -> Sort (cost=115.71..115.72 rows=4 width=131) (actual time=0.179..0.182 rows=4 loops=1) Sort Key: ("CHST"."caseType")::bpchar, ("CHST"."eventType")::bpchar -> Nested Loop Left Join (cost=0.00..115.67 rows=4 width=131) (actual time=0.051..0.139 rows=4 loops=1) -> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual time=0.040..0.053rows=4 loops=1) Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar)) -> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41) (actualtime=0.013..0.014 rows=0 loops=4) Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar)AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) -> Sort (cost=3171.84..3210.86 rows=15607 width=98) (actual time=446.459..446.936 rows=768 loops=1) Sort Key: ("CTHE"."caseType")::bpchar, ("CTHE"."eventType")::bpchar -> Subquery Scan "CTHE" (cost=1630.43..2084.82 rows=15607 width=98) (actual time=322.928..405.654 rows=15607loops=1) -> Merge Right Join (cost=1630.43..1928.75 rows=15607 width=89) (actual time=322.922..381.371 rows=15607loops=1) Merge Cond: (((d."countyNo")::smallint = "inner"."?column9?") AND ((d."caseType")::bpchar = "inner"."?column10?")AND ((d."eventType")::bpchar = "inner"."?column11?")) -> Index Scan using "CaseTypeHistEventD_pkey" on "CaseTypeHistEventD" d (cost=0.00..87.77 rows=2051width=21) (actual time=0.024..0.734 rows=434 loops=1) -> Sort (cost=1630.43..1669.45 rows=15607 width=76) (actual time=322.294..332.182 rows=15607loops=1) Sort Key: (c."countyNo")::smallint, (b."caseType")::bpchar, (b."eventType")::bpchar -> Nested Loop (cost=0.00..543.41 rows=15607 width=76) (actual time=0.035..45.539 rows=15607loops=1) -> Index Scan using "ControlRecord_pkey" on "ControlRecord" c (cost=0.00..4.27 rows=1width=2) (actual time=0.010..0.016 rows=1 loops=1) Index Cond: (("countyNo")::smallint = 13) -> Seq Scan on "CaseTypeHistEventB" b (cost=0.00..383.07 rows=15607 width=74) (actualtime=0.019..13.754 rows=15607 loops=1) Total runtime: 449.660 ms (24 rows) So in all cases it is faster without the hashjoin; it's just a question of whether it is 4% faster or 1000 times faster, with a 99+% chance of being 1000 times faster. This may get back to a question I've always had about the wisdom of rounding fractional reads to whole numbers. You lose information which might lead to better plan choices. You can't read half a row, but you can read one row half the time. -Kevin
pgsql-performance by date: