Please add EXISTS optimization to TODO list - Mailing list pgsql-hackers

From Kevin Grittner
Subject Please add EXISTS optimization to TODO list
Date
Msg-id 462B80A6020000250000CB5C@gwmta.wicourts.gov
Whole thread Raw
List pgsql-hackers
One of the few situations where I experience poor performance under PostgreSQL, compared to other, commercial
databases,is when an EXISTS predicate is used.  Actually, these often do perform quite well, but there are some
situationswhere there are optimizations available which other products detect and use, which PostgreSQL misses.
Lookingback at the mailing list archives, it appears that optimizations similar to what other products use for both IN
andEXISTS were added to the IN predicate circa 2003: 
http://archives.postgresql.org/pgsql-hackers/2003-08/msg00438.php -Make IN/NOT IN have similar performance to
EXISTS/NOTEXISTS (Tom) 
There are many situations where the EXISTS predicate and an IN predicate with a table subquery produce identical
results. After reviewing the SQL standard and thinking about it a bit, I think that the equivalence holds unless: 
(1)  the equivalent IN predicate is capable of producing a result of UNKNOWN, and
(2)  the predicate is used in a context where the difference between UNKNOWN and FALSE is significant.
One additional point: Martijn van Oosterhout pointed out in an earlier email that if the subquery contains any
non-immutablefunctions there could be a difference, although he described that issue as "minor". 
The most common case for UNKNOWN logical values is when comparisons involve a NULL on either or both sides of an
operator. In some cases, such as the one I posted about a month ago, it is quickly clear to a human reader that none of
theabove conditions exist -- the columns are all NOT NULL (so the result of the comparisons can never be UNKNOWN), they
areused in a context where UNKNOWN and FALSE produce the same results, and no non-immutable functions are invoked. 
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01408.php
The plan for the EXISTS predicate (running in 8.2.3) has a cost of 72736.37, while the logically equivalent query using
INhas a cost of 36.38 (and these do approximate reality), so if the faster option was visible to the planner, it would
bechosen.  Some would argue that I should just change the query to use IN.  There are three problems with that. 
(1)  It requires the use of a multi-value row value constructor, which is a construct not supported by all databases we
currentlyuse.  (We have a heterogeneous environment, where the same queries must run on multiple platforms.)  We have a
somewhatugly but valid query to use as a workaround which runs on all of our databases; it has a PostgreSQL cost of
130.98,which is tolerable, if not optimal. 
(2)  I have seen a number of cases where the logically equivalent EXISTS predicate performs better than the IN
predicate. The failure to recognize equivalence and to cost both approaches risks suboptimal performance.  Avoiding
thatrequires careful testing of both forms to coerce the planner into choosing the best plan. 
(3)  I have been trying to move our application programmers away from a focus on how they want to navigate through the
data,toward declaring what they want as the result.  (Some programmers routinely use cursors to navigate each table,
rowby row, based on what they think is the best plan, stuffing the data into a temporary table as they go, then
selectingfrom the temporary table, when a single SELECT statement with a few subqueries will produce the desired data.)
The current situation with these predicates diverts the focus from "what to show" back to "how to get it". 
I hate to see any queries run slower on PostgreSQL than on other databases, so I'm suggesting we address this.  We are
talkingabout an optimization that I've seen in some other products for at least 15 years. 
-Kevin



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: contrib/uuid-ossp: immutable vs. volatile
Next
From: Dave Page
Date:
Subject: Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]