Hi -
I have some questions about query equivalence. The first query below
(actually a slightly more complicated one) took much longer to run than
I expected (actually, I killed it after it ran all night). The second
one took a few minutes.
I believe these queries are exactly equivalent, but I presume the
planner doesn't know that. (I suppose another explanation is that both
plans are considered for both queries, but the cost estimates come out
differently for some reason.) I've heard some allusions to "the
rewriter", and thought it was responsible for handling (some of) these
kinds of equivalencies. I just looked at some of the documentation,
though, which makes it seem like the rewriter just handles user-defined
rules and views.
In general, there are lots of ways to express the same abstract
information need in SQL, and I assumed that there were some set of
(probably incomplete) equivalencies encoded somewhere. Is this so?
Can someone point me to any relevant documentation or discussion?
Thanks.
(By the way, gazPlaces.gazPlaceID below is a primary key, so should be
unique - I think that's required for these to be equivalent. I tested
the two queries on small data sets, and they do indeed return the same
results.)
- John D. Burger
MITRE
explain select gazPlaceID from gazPlaces
where gazPlaceID not in (select gazPlaceID from gazContainers);
QUERY PLAN
------------------------------------------------------------------------
--------
Seq Scan on gazplaces (cost=0.00..528652149268.95 rows=3278748
width=4)
Filter: (NOT (subplan))
SubPlan
-> Seq Scan on gazcontainers (cost=0.00..138723.75 rows=9004875
width=4)
explain select gazPlaceID from gazPlaces
except select gazPlaceID from gazContainers;
QUERY PLAN
------------------------------------------------------------------------
------------------------
SetOp Except (cost=2882572.91..2960384.76 rows=1556237 width=4)
-> Sort (cost=2882572.91..2921478.84 rows=15562371 width=4)
Sort Key: gazplaceid
-> Append (cost=0.00..419616.42 rows=15562371 width=4)
-> Subquery Scan "*SELECT* 1" (cost=0.00..190843.92
rows=6557496 width=4)
-> Seq Scan on gazplaces (cost=0.00..125268.96
rows=6557496 width=4)
-> Subquery Scan "*SELECT* 2" (cost=0.00..228772.50
rows=9004875 width=4)
-> Seq Scan on gazcontainers
(cost=0.00..138723.75 rows=9004875 width=4)