Re: EXISTS optimization - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: EXISTS optimization |
Date | |
Msg-id | 46040DAC.EE98.0025.0@wicourts.gov Whole thread Raw |
In response to | Re: EXISTS optimization (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [PERFORM] EXISTS optimization
Re: EXISTS optimization Re: [PERFORM] EXISTS optimization |
List | pgsql-hackers |
>>> On Fri, Mar 23, 2007 at 4:49 PM, in message <25339.1174686582@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> explain analyze >> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", > "H"."userId", "H"."time" >> FROM "Adjustment" "A" >> JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND > "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo") >> WHERE "H"."tranType" = 'A' >> AND "A"."date" > DATE '2006- 01- 01' >> AND "H"."countyNo" = 66 >> AND "A"."countyNo" = 66 >> AND EXISTS >> ( >> SELECT 1 FROM "TranDetail" "D" >> WHERE "D"."tranNo" = "H"."tranNo" >> AND "D"."countyNo" = "H"."countyNo" >> AND "D"."caseNo" LIKE '2006TR%' >> ) >> ; > >> The commercial product scans the index on caseNo in TranDetail to build a > work table of unique values, then uses indexed access to the TranHeader and > then to Adjustment. > > If you want that, try rewriting the EXISTS to an IN: > > AND ("H"."tranNo", "H"."countyNo") IN > ( > SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D" > WHERE "D"."caseNo" LIKE '2006TR%' > ) Nice. I get this: explain analyze SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time" FROM "Adjustment" "A" JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo") WHERE "H"."tranType" = 'A' AND "A"."date" > DATE '2006- 01- 01' AND "H"."countyNo" = 66 AND "A"."countyNo" = 66 AND ("H"."tranNo", "H"."countyNo") IN ( SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D" WHERE "D"."caseNo" LIKE '2006TR%' ) ; Nested Loop (cost=27.76..36.38 rows=1 width=46) (actual time=92.999..200.398 rows=2209 loops=1) Join Filter: (("H"."tranNo")::integer = ("A"."tranNo")::integer) -> Nested Loop (cost=27.76..32.08 rows=1 width=50) (actual time=92.970..176.472 rows=2209 loops=1) -> HashAggregate (cost=27.76..27.77 rows=1 width=6) (actual time=92.765..100.810 rows=9788 loops=1) -> Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D" (cost=0.00..27.66 rows=20 width=6) (actualtime=0.059..60.967 rows=46301 loops=1) Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND(("countyNo")::smallint = 66)) Filter: (("caseNo")::bpchar ~~ '2006TR%'::text) -> Index Scan using "TranHeader_pkey" on "TranHeader" "H" (cost=0.00..4.30 rows=1 width=46) (actual time=0.006..0.006rows=0 loops=9788) Index Cond: ((("H"."tranNo")::integer = ("D"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66)) Filter: (("tranType")::bpchar = 'A'::bpchar) -> Index Scan using "Adjustment_pkey" on "Adjustment" "A" (cost=0.00..4.28 rows=1 width=22) (actual time=0.008..0.009rows=1 loops=2209) Index Cond: ((("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) AND (("A"."countyNo")::smallint = 66)) Filter: ((date)::date > '2006-01-01'::date) Total runtime: 201.306 ms That's the good news. The bad news is that I operate under a management portability dictate which doesn't currently allowthat syntax, since not all of the products they want to cover support it. I tried something which seems equivalent,but it is running for a very long time. I'll show it with just the explain while I wait to see how long the explainanalyze takes. explain SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time" FROM "Adjustment" "A" JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo") WHERE "H"."tranType" = 'A' AND "A"."date" > DATE '2006- 01- 01' AND "H"."countyNo" = 66 AND "A"."countyNo" = 66 AND "H"."tranNo" IN ( SELECT "D"."tranNo" FROM "TranDetail" "D" WHERE "D"."caseNo" LIKE '2006TR%' AND "D"."countyNo" = "H"."countyNo" ) ; Nested Loop (cost=0.00..181673.08 rows=1 width=46) Join Filter: (("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) -> Seq Scan on "Adjustment" "A" (cost=0.00..2384.27 rows=11733 width=22) Filter: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66)) -> Index Scan using "TranHeader_pkey" on "TranHeader" "H" (cost=0.00..15.27 rows=1 width=46) Index Cond: ((("H"."tranNo")::integer = ("A"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66)) Filter: ((("tranType")::bpchar = 'A'::bpchar) AND (subplan)) SubPlan -> Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D" (cost=0.00..27.66 rows=20 width=4) Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint= ($0)::smallint)) Filter: (("caseNo")::bpchar ~~ '2006TR%'::text) > We don't currently try to flatten EXISTS into a unique/join plan as we > do for IN. I seem to recall not doing so when I rewrote IN planning > because I didn't think it would be exactly semantically equivalent, > but that was awhile ago. Right at the moment it seems like it ought > to be equivalent as long as the comparison operators are strict. There are a great many situations where they are exactly semantically equivalent. In fact, the commercial database productusually generates an identical plan. I could try to work out (or better yet find) a formal description of when thatequivalence holds, if someone would be up for implementing it. Barring that, I could see if management would approvesome time for me to look at submitting a patch, but I haven't looked at the code involved, so I have no idea of thescale of effort involved yet. -Kevin
pgsql-hackers by date: