Home > mailing lists
EXISTS optimization - Mailing list pgsql-hackers

From	Kevin Grittner
Subject	EXISTS optimization
Date	March 23, 2007 14:24:09
Msg-id	4603C1A3.EE98.0025.0@wicourts.gov Whole thread
Responses	Re: EXISTS optimization
List	pgsql-hackers
Tree view
I'm posting this to performance in case our workaround may be of benefit to someone with a similar issue.  I'm posting
tohackers because I hope we can improve our planner in this area so that a workaround is not necessary.  (It might make
senseto reply to one group or the other, depending on reply content.) 

We are converting from a commercial database (which shall remain unnamed here, due to license restrictions on
publishingbenchmarks).  Most queries run faster on PostgreSQL; a small number choose very poor plans and run much
longer. This particular query runs on the commercial product in 6.1s first time, 1.4s cached.  In PostgreSQL it runs in
about144s both first time and cached.  I was able to use an easy but fairly ugly rewrite (getting duplicate rows and
eliminatingthem with DISTINCT) which runs on the commercial product in 9.2s/3.0s and in PostgreSQL in 2.0s/0.7s. 

Here are the tables:

          Table "public.TranHeader"
    Column     |       Type       | Modifiers
---------------+------------------+-----------
 tranNo        | "TranNoT"        | not null
 countyNo      | "CountyNoT"      | not null
 acctPd        | "DateT"          | not null
 date          | "DateT"          | not null
 isComplete    | boolean          | not null
 tranId        | "TranIdT"        | not null
 tranType      | "TranTypeT"      | not null
 userId        | "UserIdT"        | not null
 workstationId | "WorkstationIdT" | not null
 time          | "TimeT"          |
Indexes:
    "TranHeader_pkey" PRIMARY KEY, btree ("tranNo", "countyNo")
    "TranHeader_TranAcctPeriod" UNIQUE, btree ("acctPd", "tranNo", "countyNo")
    "TranHeader_TranDate" UNIQUE, btree (date, "tranNo", "countyNo")

            Table "public.TranDetail"
     Column      |        Type        | Modifiers
-----------------+--------------------+-----------
 tranNo          | "TranNoT"          | not null
 tranDetailSeqNo | "TranDetailSeqNoT" | not null
 countyNo        | "CountyNoT"        | not null
 acctCode        | "AcctCodeT"        | not null
 amt             | "MoneyT"           | not null
 assessNo        | "TranIdT"          |
 caseNo          | "CaseNoT"          |
 citnNo          | "CitnNoT"          |
 citnViolDate    | "DateT"            |
 issAgencyNo     | "IssAgencyNoT"     |
 partyNo         | "PartyNoT"         |
 payableNo       | "PayableNoT"       |
 rcvblNo         | "RcvblNoT"         |
Indexes:
    "TranDetail_pkey" PRIMARY KEY, btree ("tranNo", "tranDetailSeqNo", "countyNo")
    "TranDetail_TranDetCaseNo" UNIQUE, btree ("caseNo", "tranNo", "tranDetailSeqNo", "countyNo")
    "TranDetail_TranDetPay" UNIQUE, btree ("payableNo", "tranNo", "tranDetailSeqNo", "countyNo")
    "TranDetail_TranDetRcvbl" UNIQUE, btree ("rcvblNo", "tranNo", "tranDetailSeqNo", "countyNo")
    "TranDetail_TranDetAcct" btree ("acctCode", "citnNo", "countyNo")

              Table "public.Adjustment"
     Column      |         Type          | Modifiers
-----------------+-----------------------+-----------
 adjustmentNo    | "TranIdT"             | not null
 countyNo        | "CountyNoT"           | not null
 date            | "DateT"               | not null
 isTranVoided    | boolean               | not null
 reasonCode      | "ReasonCodeT"         | not null
 tranNo          | "TranNoT"             | not null
 adjustsTranId   | "TranIdT"             |
 adjustsTranNo   | "TranNoT"             |
 adjustsTranType | "TranTypeT"           |
 explanation     | character varying(50) |
Indexes:
    "Adjustment_pkey" PRIMARY KEY, btree ("adjustmentNo", "countyNo")
    "Adjustment_AdjustsTranId" btree ("adjustsTranId", "adjustsTranType", "tranNo", "countyNo")
    "Adjustment_AdjustsTranNo" btree ("adjustsTranNo", "tranNo", "countyNo")
    "Adjustment_Date" btree (date, "countyNo")

Admittedly, the indexes are optimized for our query load under the commercial product, which can use the "covering
index"optimization. 

explain analyze
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
  FROM "Adjustment" "A"
  JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" =
"A"."tranNo")
  WHERE "H"."tranType" = 'A'
    AND "A"."date" > DATE '2006-01-01'
    AND "H"."countyNo" = 66
    AND "A"."countyNo" = 66
    AND EXISTS
        (
          SELECT 1 FROM "TranDetail" "D"
            WHERE "D"."tranNo" = "H"."tranNo"
              AND "D"."countyNo" = "H"."countyNo"
              AND "D"."caseNo" LIKE '2006TR%'
        )
;

 Nested Loop  (cost=182.56..72736.37 rows=1 width=46) (actual time=6398.108..143631.427 rows=2205 loops=1)
   Join Filter: (("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar)
   ->  Bitmap Heap Scan on "Adjustment" "A"  (cost=182.56..1535.69 rows=11542 width=22) (actual time=38.098..68.324
rows=12958loops=1) 
         Recheck Cond: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66))
         ->  Bitmap Index Scan on "Adjustment_Date"  (cost=0.00..179.67 rows=11542 width=0) (actual time=32.958..32.958
rows=12958loops=1) 
               Index Cond: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66))
   ->  Index Scan using "TranHeader_pkey" on "TranHeader" "H"  (cost=0.00..6.15 rows=1 width=46) (actual
time=11.073..11.074rows=0 loops=12958) 
         Index Cond: ((("H"."tranNo")::integer = ("A"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
         Filter: ((("tranType")::bpchar = 'A'::bpchar) AND (subplan))
         SubPlan
           ->  Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D"  (cost=0.00..4.73 rows=1 width=0)
(actualtime=11.038..11.038 rows=0 loops=12958) 
                 Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND
(("tranNo")::integer= ($0)::integer) AND (("countyNo")::smallint = ($1)::smallint)) 
                 Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)
 Total runtime: 143633.838 ms

The commercial product scans the index on caseNo in TranDetail to build a work table of unique values, then uses
indexedaccess to the TranHeader and then to Adjustment.  I was able to get approximately the same plan (except the
duplicatesare eliminated at the end) in PostgreSQL by rewriting to this: 

SELECT DISTINCT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
  FROM "Adjustment" "A"
  JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" =
"A"."tranNo")
  JOIN "TranDetail" "D" ON ("D"."tranNo" = "H"."tranNo" AND "D"."countyNo" = "H"."countyNo" AND "D"."caseNo" LIKE
'2006TR%')
  WHERE "H"."tranType" = 'A'
    AND "A"."date" > DATE '2006-01-01'
    AND "H"."countyNo" = 66
    AND "A"."countyNo" = 66
;

 Unique  (cost=130.96..130.98 rows=1 width=46) (actual time=694.591..715.008 rows=2205 loops=1)
   ->  Sort  (cost=130.96..130.96 rows=1 width=46) (actual time=694.586..701.808 rows=16989 loops=1)
         Sort Key: "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H".date, "H"."userId", "H"."time"
         ->  Nested Loop  (cost=0.00..130.95 rows=1 width=46) (actual time=0.157..636.779 rows=16989 loops=1)
               Join Filter: (("H"."tranNo")::integer = ("A"."tranNo")::integer)
               ->  Nested Loop  (cost=0.00..113.76 rows=4 width=50) (actual time=0.131..452.544 rows=16989 loops=1)
                     ->  Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D"  (cost=0.00..27.57 rows=20
width=6)(actual time=0.049..83.005 rows=46293 loops=1) 
                           Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar <
'2006TS'::bpchar)AND (66 = ("countyNo")::smallint)) 
                           Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)
                     ->  Index Scan using "TranHeader_pkey" on "TranHeader" "H"  (cost=0.00..4.30 rows=1 width=46)
(actualtime=0.006..0.007 rows=0 loops=46293) 
                           Index Cond: ((("D"."tranNo")::integer = ("H"."tranNo")::integer) AND
(("H"."countyNo")::smallint= 66)) 
                           Filter: (("tranType")::bpchar = 'A'::bpchar)
               ->  Index Scan using "Adjustment_pkey" on "Adjustment" "A"  (cost=0.00..4.28 rows=1 width=22) (actual
time=0.007..0.008rows=1 loops=16989) 
                     Index Cond: ((("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) AND
(("A"."countyNo")::smallint= 66)) 
                     Filter: ((date)::date > '2006-01-01'::date)
 Total runtime: 715.932 ms

I can't see any reason that PostgreSQL can't catch up to the other product on this optimization issue.  This usage of
DISTINCTseems a bit sloppy; I usually try to dissuade the application programmers from accumulating duplicates during
thejoins and then eliminating them in this way. 

-Kevin
pgsql-hackers by date:
From: Hugh Sasse
Date: 23 March 2007, 14:15:30
Subject: Re: Documentation access problems.
From: Alvaro Herrera
Date: 23 March 2007, 14:34:04
Subject: Re: Documentation access problems.
EXISTS optimization - Mailing list pgsql-hackers

Previous

Next