Re: EXISTS optimization - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: EXISTS optimization
Date
Msg-id 46040DAC.EE98.0025.0@wicourts.gov
Whole thread Raw
In response to Re: EXISTS optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PERFORM] EXISTS optimization
Re: EXISTS optimization
Re: [PERFORM] EXISTS optimization
List pgsql-hackers

>>> On Fri, Mar 23, 2007 at  4:49 PM, in message <25339.1174686582@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> explain analyze
>> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date",
> "H"."userId", "H"."time"
>>   FROM "Adjustment" "A"
>>   JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND
> "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
>>   WHERE "H"."tranType" = 'A'
>>     AND "A"."date" > DATE '2006- 01- 01'
>>     AND "H"."countyNo" = 66
>>     AND "A"."countyNo" = 66
>>     AND EXISTS
>>         (
>>           SELECT 1 FROM "TranDetail" "D"
>>             WHERE "D"."tranNo" = "H"."tranNo"
>>               AND "D"."countyNo" = "H"."countyNo"
>>               AND "D"."caseNo" LIKE '2006TR%'
>>         )
>> ;
>
>> The commercial product scans the index on caseNo in TranDetail to build a
> work table of unique values, then uses indexed access to the TranHeader and
> then to Adjustment.
>
> If you want that, try rewriting the EXISTS to an IN:
>
>    AND ("H"."tranNo", "H"."countyNo") IN
>         (
>           SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
>             WHERE "D"."caseNo" LIKE '2006TR%'
>         )

Nice.  I get this:

explain analyze
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
  FROM "Adjustment" "A"
  JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" =
"A"."tranNo")
  WHERE "H"."tranType" = 'A'
    AND "A"."date" > DATE '2006- 01- 01'
    AND "H"."countyNo" = 66
    AND "A"."countyNo" = 66
    AND ("H"."tranNo", "H"."countyNo") IN
        (
          SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
            WHERE "D"."caseNo" LIKE '2006TR%'
        )
;

 Nested Loop  (cost=27.76..36.38 rows=1 width=46) (actual time=92.999..200.398 rows=2209 loops=1)
   Join Filter: (("H"."tranNo")::integer = ("A"."tranNo")::integer)
   ->  Nested Loop  (cost=27.76..32.08 rows=1 width=50) (actual time=92.970..176.472 rows=2209 loops=1)
         ->  HashAggregate  (cost=27.76..27.77 rows=1 width=6) (actual time=92.765..100.810 rows=9788 loops=1)
               ->  Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D"  (cost=0.00..27.66 rows=20 width=6)
(actualtime=0.059..60.967 rows=46301 loops=1) 
                     Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar)
AND(("countyNo")::smallint = 66)) 
                     Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)
         ->  Index Scan using "TranHeader_pkey" on "TranHeader" "H"  (cost=0.00..4.30 rows=1 width=46) (actual
time=0.006..0.006rows=0 loops=9788) 
               Index Cond: ((("H"."tranNo")::integer = ("D"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
               Filter: (("tranType")::bpchar = 'A'::bpchar)
   ->  Index Scan using "Adjustment_pkey" on "Adjustment" "A"  (cost=0.00..4.28 rows=1 width=22) (actual
time=0.008..0.009rows=1 loops=2209) 
         Index Cond: ((("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) AND (("A"."countyNo")::smallint = 66))
         Filter: ((date)::date > '2006-01-01'::date)
 Total runtime: 201.306 ms

That's the good news.  The bad news is that I operate under a management portability dictate which doesn't currently
allowthat syntax, since not all of the products they want to cover support it.  I tried something which seems
equivalent,but it is running for a very long time.  I'll show it with just the explain while I wait to see how long the
explainanalyze takes. 

explain
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
  FROM "Adjustment" "A"
  JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" =
"A"."tranNo")
  WHERE "H"."tranType" = 'A'
    AND "A"."date" > DATE '2006- 01- 01'
    AND "H"."countyNo" = 66
    AND "A"."countyNo" = 66
    AND "H"."tranNo" IN
        (
          SELECT "D"."tranNo" FROM "TranDetail" "D"
            WHERE "D"."caseNo" LIKE '2006TR%'
              AND "D"."countyNo" = "H"."countyNo"
        )
;

 Nested Loop  (cost=0.00..181673.08 rows=1 width=46)
   Join Filter: (("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar)
   ->  Seq Scan on "Adjustment" "A"  (cost=0.00..2384.27 rows=11733 width=22)
         Filter: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66))
   ->  Index Scan using "TranHeader_pkey" on "TranHeader" "H"  (cost=0.00..15.27 rows=1 width=46)
         Index Cond: ((("H"."tranNo")::integer = ("A"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
         Filter: ((("tranType")::bpchar = 'A'::bpchar) AND (subplan))
         SubPlan
           ->  Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D"  (cost=0.00..27.66 rows=20 width=4)
                 Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND
(("countyNo")::smallint= ($0)::smallint)) 
                 Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)

> We don't currently try to flatten EXISTS into a unique/join plan as we
> do for IN.  I seem to recall not doing so when I rewrote IN planning
> because I didn't think it would be exactly semantically equivalent,
> but that was awhile ago.  Right at the moment it seems like it ought
> to be equivalent as long as the comparison operators are strict.

There are a great many situations where they are exactly semantically equivalent.  In fact, the commercial database
productusually generates an identical plan.  I could try to work out (or better yet find) a formal description of when
thatequivalence holds, if someone would be up for implementing it.  Barring that, I could see if management would
approvesome time for me to look at submitting a patch, but I haven't looked at the code involved, so I have no idea of
thescale of effort involved yet. 

-Kevin



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [COMMITTERS] pgsql: We no longer need to palloc the VacuumStmt node; keeping it on
Next
From: Martijn van Oosterhout
Date:
Subject: Re: EXISTS optimization