Thread: Equivalent queries and the planner

Equivalent queries and the planner

From
"John D. Burger"
Date:
Hi -

I have some questions about query equivalence.  The first query below
(actually a slightly more complicated one) took much longer to run than
I expected (actually, I killed it after it ran all night).  The second
one took a few minutes.

I believe these queries are exactly equivalent, but I presume the
planner doesn't know that.  (I suppose another explanation is that both
plans are considered for both queries, but the cost estimates come out
differently for some reason.)  I've heard some allusions to "the
rewriter", and thought it was responsible for handling (some of) these
kinds of equivalencies.  I just looked at some of the documentation,
though, which makes it seem like the rewriter just handles user-defined
rules and views.

In general, there are lots of ways to express the same abstract
information need in SQL, and I assumed that there were some set of
(probably incomplete) equivalencies encoded somewhere.  Is this so?
Can someone point me to any relevant documentation or discussion?

Thanks.

(By the way, gazPlaces.gazPlaceID below is a primary key, so should be
unique - I think that's required for these to be equivalent.  I tested
the two queries on small data sets, and they do indeed return the same
results.)

- John D. Burger
   MITRE


explain select gazPlaceID from gazPlaces
    where gazPlaceID not in (select gazPlaceID from gazContainers);
        QUERY PLAN
------------------------------------------------------------------------
--------
  Seq Scan on gazplaces  (cost=0.00..528652149268.95 rows=3278748
width=4)
    Filter: (NOT (subplan))
    SubPlan
      ->  Seq Scan on gazcontainers  (cost=0.00..138723.75 rows=9004875
width=4)


explain select gazPlaceID from gazPlaces
    except select gazPlaceID from gazContainers;
        QUERY PLAN
------------------------------------------------------------------------
------------------------
  SetOp Except  (cost=2882572.91..2960384.76 rows=1556237 width=4)
    ->  Sort  (cost=2882572.91..2921478.84 rows=15562371 width=4)
          Sort Key: gazplaceid
          ->  Append  (cost=0.00..419616.42 rows=15562371 width=4)
                ->  Subquery Scan "*SELECT* 1"  (cost=0.00..190843.92
rows=6557496 width=4)
                      ->  Seq Scan on gazplaces  (cost=0.00..125268.96
rows=6557496 width=4)
                ->  Subquery Scan "*SELECT* 2"  (cost=0.00..228772.50
rows=9004875 width=4)
                      ->  Seq Scan on gazcontainers
(cost=0.00..138723.75 rows=9004875 width=4)


Re: Equivalent queries and the planner

From
Martijn van Oosterhout
Date:
On Fri, Oct 14, 2005 at 09:43:57AM -0400, John D. Burger wrote:
> (By the way, gazPlaces.gazPlaceID below is a primary key, so should be
> unique - I think that's required for these to be equivalent.  I tested
> the two queries on small data sets, and they do indeed return the same
> results.)

I think that you also have to be sure that gazPlaceID is never NULL.
AFAIK the PostgreSQL doesn't track that sort of thing at all at the
planning/optimisation stage.

You're right, sometimes statements are equivalent but usually there is
some corner case (like NULLs, duplicate values, etc) that screw it up
for the general case. Teaching the planner all that is something not
yet done.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Equivalent queries and the planner

From
Neil Conway
Date:
On Fri, 2005-14-10 at 09:43 -0400, John D. Burger wrote:
> I believe these queries are exactly equivalent, but I presume the
> planner doesn't know that.

> explain select gazPlaceID from gazPlaces
>     where gazPlaceID not in (select gazPlaceID from gazContainers);

> explain select gazPlaceID from gazPlaces
>     except select gazPlaceID from gazContainers;

Yeah, query optimization for set operations is currently quite
primitive; the above transformation is not yet implemented.

> In general, there are lots of ways to express the same abstract
> information need in SQL, and I assumed that there were some set of
> (probably incomplete) equivalencies encoded somewhere.  Is this so?

I don't know of a canonical list of planner transformations. There are
some presentations on planner internals that touch on this, which is
better than nothing:

http://neilc.treehou.se/optimizer.pdf
http://conferences.oreillynet.com/presentations/os2003/lane_tom.pdf

-Neil