EXISTS optimization - Mailing list pgsql-performance

From Kevin Grittner
Subject EXISTS optimization
Date
Msg-id 454B3E62.EE98.0025.0@wicourts.gov
Whole thread Raw
List pgsql-performance
To support migration of existing queries, it would be nice not to have
to rewrite EXISTS clauses as IN clauses.  Here is one example of a query
which optimizes poorly:

    DELETE FROM "CaseDispo"
        WHERE EXISTS
        (
            SELECT * FROM "Consolidation" "C"
                WHERE "C"."caseNo" = '2006CM000123'
                  AND "C"."xrefOrConsol" = 'C'
                  AND "C"."countyNo" = 30
                  AND "CaseDispo"."caseNo" = "C"."crossRefCase"
                  AND "CaseDispo"."countyNo" = "C"."countyNo"
                  AND "CaseDispo"."dispoDate" = DATE '2005-10-31'
        );

 Seq Scan on "CaseDispo"  (cost=0.00..1227660.52 rows=176084 width=6)
(actual time=501.557..501.557 rows=0 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Result  (cost=0.00..3.46 rows=1 width=48) (actual
time=0.000..0.000 rows=0 loops=352167)
           One-Time Filter: (($2)::date = '2005-10-31'::date)
           ->  Index Scan using "Consolidation_pkey" on "Consolidation"
"C"  (cost=0.00..3.46 rows=1 width=48) (actual time=0.008..0.008 rows=0
loops=84)
                 Index Cond: ((("caseNo")::bpchar =
'2006CM000123'::bpchar) AND (($0)::bpchar = ("crossRefCase")::bpchar)
AND (("countyNo")::smallint = 30) AND (($1)::smallint =
("countyNo")::smallint))
                 Filter: ("xrefOrConsol" = 'C'::bpchar)
 Total runtime: 501.631 ms
(9 rows)

To most programmers, it would be obvious that this is an exact logical
equivalent to:

    DELETE FROM "CaseDispo"
        WHERE "countyNo" = 30
          AND "dispoDate" = DATE '2005-10-31'
          AND "caseNo" IN
        (
            SELECT "crossRefCase" FROM "Consolidation" "C"
                WHERE "C"."caseNo" = '2006CM000123'
                  AND "C"."xrefOrConsol" = 'C'
                  AND "C"."countyNo" = 30
        );

 Nested Loop  (cost=7.02..10.50 rows=1 width=6) (actual
time=0.036..0.036 rows=0 loops=1)
   ->  HashAggregate  (cost=7.02..7.03 rows=1 width=18) (actual
time=0.034..0.034 rows=0 loops=1)
         ->  Index Scan using "Consolidation_pkey" on "Consolidation"
"C"  (cost=0.00..7.02 rows=1 width=18) (actual time=0.032..0.032 rows=0
loops=1)
               Index Cond: ((("caseNo")::bpchar =
'2006CM000123'::bpchar) AND (("countyNo")::smallint = 30))
               Filter: ("xrefOrConsol" = 'C'::bpchar)
   ->  Index Scan using "CaseDispo_pkey" on "CaseDispo"
(cost=0.00..3.46 rows=1 width=24) (never executed)
         Index Cond: ((("CaseDispo"."caseNo")::bpchar =
("outer"."crossRefCase")::bpchar) AND (("CaseDispo"."dispoDate")::date =
'2005-10-31'::date) AND (("CaseDispo"."countyNo")::smallint = 30))
 Total runtime: 0.109 ms
(8 rows)

On this particular query, three orders of magnitude only gets you up to
half a second, but the same thing happens on longer running queries.
And even that half a second is significant when a user has to sit there
and wait for the hourglass to clear on a regular basis.  Clearly, the
problem is not in the costing -- it recognizes the high cost of the
EXISTS form.  The problem is that it doesn't recognize that these are
logically equivalent.

Is there any work in progress to expand the set of plans examined for
an EXISTS clause?  If not, can we add such an enhancement to the TODO
list?  Do we need a good write-up on what optimizations are legal for
EXISTS?

-Kevin




pgsql-performance by date:

Previous
From: Andreas Kostyrka
Date:
Subject: Re: Context switch storm
Next
From: Stuart Bishop
Date:
Subject: Re: Slow functional indexes?