Planner reluctant to start from subquery - Mailing list pgsql-performance

From Kevin Grittner
Subject Planner reluctant to start from subquery
Date
Msg-id 43E0AA7E.EE98.0025.0@wicourts.gov
Whole thread Raw
Responses Re: Planner reluctant to start from subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
We're converting from a commercial database product to PostgreSQL, and
generally things are going well.  While the licensing agreement with the
commercial vendor prohibits publication of benchmarks without their
written consent, I'll just say that on almost everything, PostgreSQL is
faster.

We do have a few queries where PostgreSQL is several orders of
magnitude slower.  It appears that the reason it is choosing a bad plan
is that it is reluctant to start from a subquery when there is an outer
join in the FROM clause.  Pasted below are four logically equivalent
queries.  The first is a much stripped down version of one of the
production queries.  The second turns the EXISTS expression into an IN
expression.  (In the full query this makes very little difference; as I
pared down the query, the planner started to do better with the IN form
before the EXISTS form.)  The third query is the fastest, but isn't
portable enough for our mixed environment.  The fourth is the best
workaround I've found, but I get a bit queasy when I have to use the
DISTINCT modifier on a query.

Any other suggestions?

-Kevin


explain analyze
SELECT "C".*, "P"."partyNo"
  FROM "Case" "C"
  JOIN "Party" "P" ON ("C"."caseNo" = "P"."caseNo" AND "C"."countyNo" =
"P"."countyNo")
  LEFT OUTER JOIN "WccaPermCaseType" "WPCT"
    ON ( "C"."caseType" = "WPCT"."caseType"
     AND "C"."countyNo" = "WPCT"."countyNo"
     AND "WPCT"."profileName" = 'PUBLIC'
       )
  WHERE ( "WPCT"."profileName" IS NOT NULL
       OR ("C"."caseType" = 'PA' AND "C"."isConfidential" = false)
        )
    AND "C"."countyNo" = 66
    AND EXISTS
    (
      SELECT *
        FROM "DocImageMetaData" "D"
        WHERE "D"."isEFiling" = true
          AND "D"."insertedDate" BETWEEN '2006-01-01' AND '2006-01-07'
          AND "D"."countyNo" = 66
          AND "D"."countyNo" = "C"."countyNo"
          AND "D"."caseNo" = "C"."caseNo"
    )
  ORDER BY "caseNo"
;

           QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=786467.94..786504.40 rows=14584 width=210) (actual
time=7391.295..7391.418 rows=51 loops=1)
   Sort Key: "C"."caseNo"
   ->  Hash Left Join  (cost=49.35..785459.30 rows=14584 width=210)
(actual time=6974.819..7390.802 rows=51 loops=1)
         Hash Cond: ((("outer"."caseType")::bpchar =
("inner"."caseType")::bpchar) AND (("outer"."countyNo")::smallint =
("inner"."countyNo")::smallint))
         Filter: (("inner"."profileName" IS NOT NULL) OR
((("outer"."caseType")::bpchar = 'PA'::bpchar) AND (NOT
"outer"."isConfidential")))
         ->  Merge Join  (cost=0.00..783366.38 rows=14584 width=210)
(actual time=6972.672..7388.329 rows=51 loops=1)
               Merge Cond: (("outer"."caseNo")::bpchar =
("inner"."caseNo")::bpchar)
               ->  Index Scan using "Case_pkey" on "Case" "C"
(cost=0.00..624268.11 rows=65025 width=208) (actual
time=4539.588..4927.730 rows=22 loops=1)
                     Index Cond: (("countyNo")::smallint = 66)
                     Filter: (subplan)
                     SubPlan
                       ->  Index Scan using "DocImageMetaData_pkey" on
"DocImageMetaData" "D"  (cost=0.00..3.89 rows=1 width=212) (actual
time=0.012..0.012 rows=0 loops=203171)
                             Index Cond: ((("countyNo")::smallint = 66)
AND (("countyNo")::smallint = ($0)::smallint) AND (("caseNo")::bpchar =
($1)::bpchar))
                             Filter: ("isEFiling" AND
(("insertedDate")::date >= '2006-01-01'::date) AND
(("insertedDate")::date <= '2006-01-07'::date))
               ->  Index Scan using "Party_pkey" on "Party" "P"
(cost=0.00..158657.86 rows=191084 width=22) (actual time=0.769..1646.381
rows=354058 loops=1)
                     Index Cond: (66 = ("countyNo")::smallint)
         ->  Hash  (cost=49.22..49.22 rows=27 width=31) (actual
time=1.919..1.919 rows=28 loops=1)
               ->  Bitmap Heap Scan on "WccaPermCaseType" "WPCT"
(cost=2.16..49.22 rows=27 width=31) (actual time=0.998..1.782 rows=28
loops=1)
                     Recheck Cond: ((("countyNo")::smallint = 66) AND
(("profileName")::text = 'PUBLIC'::text))
                     ->  Bitmap Index Scan on "WccaPermCaseType_pkey"
(cost=0.00..2.16 rows=27 width=0) (actual time=0.684..0.684 rows=28
loops=1)
                           Index Cond: ((("countyNo")::smallint = 66)
AND (("profileName")::text = 'PUBLIC'::text))
 Total runtime: 7392.577 ms
(22 rows)

explain analyze
SELECT "C".*, "P"."partyNo"
  FROM "Case" "C"
  JOIN "Party" "P" ON ("C"."caseNo" = "P"."caseNo" AND "C"."countyNo" =
"P"."countyNo")
  LEFT OUTER JOIN "WccaPermCaseType" "WPCT"
    ON ( "C"."caseType" = "WPCT"."caseType"
     AND "C"."countyNo" = "WPCT"."countyNo"
     AND "WPCT"."profileName" = 'PUBLIC'
       )
  WHERE ( "WPCT"."profileName" IS NOT NULL
       OR ("C"."caseType" = 'PA' AND "C"."isConfidential" = false)
        )
    AND "C"."countyNo" = 66
    AND "C"."caseNo" IN
    (
      SELECT "D"."caseNo"
        FROM "DocImageMetaData" "D"
        WHERE "D"."isEFiling" = true
          AND "D"."insertedDate" BETWEEN '2006-01-01' AND '2006-01-07'
          AND "D"."countyNo" = 66
    )
  ORDER BY
    "caseNo"

;

                  QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=284708.49..284708.50 rows=1 width=210) (actual
time=8962.995..8963.103 rows=51 loops=1)
   Sort Key: "C"."caseNo"
   ->  Hash Join  (cost=2359.31..284708.48 rows=1 width=210) (actual
time=8401.856..8962.606 rows=51 loops=1)
         Hash Cond: (("outer"."caseNo")::bpchar =
("inner"."caseNo")::bpchar)
         ->  Hash Left Join  (cost=49.35..282252.68 rows=29167
width=228) (actual time=32.120..8184.880 rows=312718 loops=1)
               Hash Cond: ((("outer"."caseType")::bpchar =
("inner"."caseType")::bpchar) AND (("outer"."countyNo")::smallint =
("inner"."countyNo")::smallint))
               Filter: (("inner"."profileName" IS NOT NULL) OR
((("outer"."caseType")::bpchar = 'PA'::bpchar) AND (NOT
"outer"."isConfidential")))
               ->  Merge Join  (cost=0.00..278116.34 rows=29167
width=228) (actual time=0.596..6236.238 rows=362819 loops=1)
                     Merge Cond: (("outer"."caseNo")::bpchar =
("inner"."caseNo")::bpchar)
                     ->  Index Scan using "Case_pkey" on "Case" "C"
(cost=0.00..118429.72 rows=130049 width=208) (actual
time=0.265..1303.409 rows=203171 loops=1)
                           Index Cond: (("countyNo")::smallint = 66)
                     ->  Index Scan using "Party_pkey" on "Party" "P"
(cost=0.00..158657.86 rows=191084 width=22) (actual time=0.303..2310.735
rows=362819 loops=1)
                           Index Cond: (66 = ("countyNo")::smallint)
               ->  Hash  (cost=49.22..49.22 rows=27 width=31) (actual
time=31.406..31.406 rows=28 loops=1)
                     ->  Bitmap Heap Scan on "WccaPermCaseType" "WPCT"
(cost=2.16..49.22 rows=27 width=31) (actual time=23.498..31.284 rows=28
loops=1)
                           Recheck Cond: ((("countyNo")::smallint = 66)
AND (("profileName")::text = 'PUBLIC'::text))
                           ->  Bitmap Index Scan on
"WccaPermCaseType_pkey"  (cost=0.00..2.16 rows=27 width=0) (actual
time=17.066..17.066 rows=28 loops=1)
                                 Index Cond: ((("countyNo")::smallint =
66) AND (("profileName")::text = 'PUBLIC'::text))
         ->  Hash  (cost=2309.95..2309.95 rows=1 width=18) (actual
time=24.255..24.255 rows=22 loops=1)
               ->  HashAggregate  (cost=2309.94..2309.95 rows=1
width=18) (actual time=24.132..24.185 rows=22 loops=1)
                     ->  Index Scan using
"DocImageMetaData_CountyNoInsertedDate" on "DocImageMetaData" "D"
(cost=0.00..2309.93 rows=6 width=18) (actual time=7.362..23.933 rows=29
loops=1)
                           Index Cond: ((("countyNo")::smallint = 66)
AND (("insertedDate")::date >= '2006-01-01'::date) AND
(("insertedDate")::date <= '2006-01-07'::date))
                           Filter: "isEFiling"
 Total runtime: 8964.044 ms
(24 rows)

explain analyze
SELECT "C".*, "P"."partyNo"
  FROM "Case" "C"
  JOIN "Party" "P" ON ("C"."caseNo" = "P"."caseNo" AND "C"."countyNo" =
"P"."countyNo")
  JOIN
    (
      SELECT "D"."caseNo"
        FROM "DocImageMetaData" "D"
        WHERE "D"."isEFiling" = true
          AND "D"."insertedDate" BETWEEN '2006-01-01' AND '2006-01-07'
          AND "D"."countyNo" = 66
        GROUP BY "D"."caseNo"
    ) "DD"
    ON ("DD"."caseNo" = "C"."caseNo")
  LEFT OUTER JOIN "WccaPermCaseType" "WPCT"
    ON ( "C"."caseType" = "WPCT"."caseType"
     AND "C"."countyNo" = "WPCT"."countyNo"
     AND "WPCT"."profileName" = 'PUBLIC'
       )
  WHERE ( "WPCT"."profileName" IS NOT NULL
       OR ("C"."caseType" = 'PA' AND "C"."isConfidential" = false)
        )
    AND "C"."countyNo" = 66
  ORDER BY
    "caseNo"
;

                    QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2321.49..2321.50 rows=1 width=210) (actual
time=7.753..7.859 rows=51 loops=1)
   Sort Key: "C"."caseNo"
   ->  Nested Loop Left Join  (cost=2309.94..2321.48 rows=1 width=210)
(actual time=3.982..7.369 rows=51 loops=1)
         Join Filter: (("outer"."countyNo")::smallint =
("inner"."countyNo")::smallint)
         Filter: (("inner"."profileName" IS NOT NULL) OR
((("outer"."caseType")::bpchar = 'PA'::bpchar) AND (NOT
"outer"."isConfidential")))
         ->  Nested Loop  (cost=2309.94..2317.99 rows=1 width=210)
(actual time=3.906..5.717 rows=51 loops=1)
               ->  Nested Loop  (cost=2309.94..2313.51 rows=1
width=240) (actual time=3.847..4.660 rows=22 loops=1)
                     ->  HashAggregate  (cost=2309.94..2309.95 rows=1
width=18) (actual time=3.775..3.830 rows=22 loops=1)
                           ->  Index Scan using
"DocImageMetaData_CountyNoInsertedDate" on "DocImageMetaData" "D"
(cost=0.00..2309.93 rows=6 width=18) (actual time=0.732..3.601 rows=29
loops=1)
                                 Index Cond: ((("countyNo")::smallint =
66) AND (("insertedDate")::date >= '2006-01-01'::date) AND
(("insertedDate")::date <= '2006-01-07'::date))
                                 Filter: "isEFiling"
                     ->  Index Scan using "Case_pkey" on "Case" "C"
(cost=0.00..3.53 rows=1 width=208) (actual time=0.020..0.022 rows=1
loops=22)
                           Index Cond: ((("C"."countyNo")::smallint =
66) AND (("outer"."caseNo")::bpchar = ("C"."caseNo")::bpchar))
               ->  Index Scan using "Party_pkey" on "Party" "P"
(cost=0.00..4.46 rows=2 width=22) (actual time=0.019..0.028 rows=2
loops=22)
                     Index Cond: ((66 = ("P"."countyNo")::smallint) AND
(("outer"."caseNo")::bpchar = ("P"."caseNo")::bpchar))
         ->  Index Scan using "WccaPermCaseType_ProfileName" on
"WccaPermCaseType" "WPCT"  (cost=0.00..3.47 rows=1 width=31) (actual
time=0.015..0.018 rows=1 loops=51)
               Index Cond: ((("WPCT"."profileName")::text =
'PUBLIC'::text) AND (("outer"."caseType")::bpchar =
("WPCT"."caseType")::bpchar) AND (("WPCT"."countyNo")::smallint = 66))
 Total runtime: 8.592 ms
(18 rows)

explain analyze
SELECT DISTINCT "C".*, "P"."partyNo"
  FROM "Case" "C"
  JOIN "Party" "P" ON ("P"."countyNo" = "C"."countyNo" AND "P"."caseNo"
= "C"."caseNo")
  JOIN "DocImageMetaData" "D" ON ("D"."countyNo" = "C"."countyNo" AND
"D"."caseNo" = "C"."caseNo")
  LEFT OUTER JOIN "WccaPermCaseType" "WPCT"
    ON ( "C"."caseType" = "WPCT"."caseType"
     AND "C"."countyNo" = "WPCT"."countyNo"
     AND "WPCT"."profileName" = 'PUBLIC'
       )
  WHERE ( "WPCT"."profileName" IS NOT NULL
       OR ("C"."caseType" = 'PA' AND "C"."isConfidential" = false)
        )
    AND "C"."countyNo" = 66
    AND "D"."countyNo" = 66
    AND "D"."isEFiling" = true
    AND "D"."insertedDate" BETWEEN '2006-01-01' AND '2006-01-07'
  ORDER BY
    "caseNo"
;




                                 QUERY PLAN





----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2339.19..2339.28 rows=1 width=210) (actual
time=9.539..10.044 rows=51 loops=1)
   ->  Sort  (cost=2339.19..2339.19 rows=1 width=210) (actual
time=9.532..9.678 rows=68 loops=1)
         Sort Key: "C"."caseNo", "C"."countyNo", "C"."caseType",
"C"."filingDate", "C"."isConfidential", "C"."isDomesticViolence",
"C"."isFiledWoCtofc", "C"."lastChargeSeqNo", "C"."lastCvJgSeqNo",
"C"."lastHistSeqNo", "C"."lastPartySeqNo", "C"."lastRelSeqNo",
"C"."statusCode", "C"."bondId", "C"."branchId", "C".caption,
"C"."daCaseNo", "C"."dispCtofcNo", "C"."fileCtofcDate",
"C"."filingCtofcNo", "C"."issAgencyNo", "C"."maintCode",
"C"."oldCaseNo", "C"."plntfAgencyNo", "C"."previousRespCo",
"C"."prosAgencyNo", "C"."prosAtty", "C"."respCtofcNo",
"C"."wcisClsCode", "C"."isSeal", "C"."isExpunge",
"C"."isElectronicFiling", "C"."isPartySeal", "P"."partyNo"
         ->  Nested Loop Left Join  (cost=0.00..2339.18 rows=1
width=210) (actual time=0.857..7.901 rows=68 loops=1)
               Join Filter: (("outer"."countyNo")::smallint =
("inner"."countyNo")::smallint)
               Filter: (("inner"."profileName" IS NOT NULL) OR
((("outer"."caseType")::bpchar = 'PA'::bpchar) AND (NOT
"outer"."isConfidential")))
               ->  Nested Loop  (cost=0.00..2335.68 rows=1 width=210)
(actual time=0.786..5.784 rows=68 loops=1)
                     ->  Nested Loop  (cost=0.00..2331.20 rows=1
width=226) (actual time=0.728..4.313 rows=29 loops=1)
                           ->  Index Scan using
"DocImageMetaData_CountyNoInsertedDate" on "DocImageMetaData" "D"
(cost=0.00..2309.93 rows=6 width=20) (actual time=0.661..3.266 rows=29
loops=1)
                                 Index Cond: ((("countyNo")::smallint =
66) AND (("insertedDate")::date >= '2006-01-01'::date) AND
(("insertedDate")::date <= '2006-01-07'::date))
                                 Filter: "isEFiling"
                           ->  Index Scan using "Case_pkey" on "Case"
"C"  (cost=0.00..3.53 rows=1 width=208) (actual time=0.018..0.021 rows=1
loops=29)
                                 Index Cond:
((("C"."countyNo")::smallint = 66) AND (("outer"."caseNo")::bpchar =
("C"."caseNo")::bpchar))
                     ->  Index Scan using "Party_pkey" on "Party" "P"
(cost=0.00..4.46 rows=2 width=22) (actual time=0.018..0.027 rows=2
loops=29)
                           Index Cond: ((66 =
("P"."countyNo")::smallint) AND (("P"."caseNo")::bpchar =
("outer"."caseNo")::bpchar))
               ->  Index Scan using "WccaPermCaseType_ProfileName" on
"WccaPermCaseType" "WPCT"  (cost=0.00..3.47 rows=1 width=31) (actual
time=0.014..0.017 rows=1 loops=68)
                     Index Cond: ((("WPCT"."profileName")::text =
'PUBLIC'::text) AND (("outer"."caseType")::bpchar =
("WPCT"."caseType")::bpchar) AND (("WPCT"."countyNo")::smallint = 66))
 Total runtime: 10.748 ms
(18 rows)


pgsql-performance by date:

Previous
From: Mike Rylander
Date:
Subject: Re: Huge Data sets, simple queries
Next
From: Tom Lane
Date:
Subject: Re: Planner reluctant to start from subquery