Re: Planner creating ineffective plans on LEFT OUTER joins - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Planner creating ineffective plans on LEFT OUTER joins
Date
Msg-id 603c8f070806260936s905b2cfp48ec1e620a084937@mail.gmail.com
Whole thread Raw
In response to Re: Planner creating ineffective plans on LEFT OUTER joins  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Planner creating ineffective plans on LEFT OUTER joins  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
> IMHO we should have a single parameter which indicates how much planning
> time we consider acceptable for this query. e.g.
>
>  optimization_level = 2 (default), varies 1-3
>
> Most automatic optimisation systems allow this kind of setting, whether
> it be a DBMS, or compilers (e.g. gcc).

It's my understanding that the philosophy of the PGDG in the past has
been to avoid putting any kind of hints into the system, focusing
rather an improving the planning of queries.  A quick Google search
turns up, for example:

http://archives.postgresql.org/pgsql-performance/2003-12/msg00181.php

Now, perhaps the thinking on this has changed, but a global knob like
this strikes me as a bad idea.  If Tom is right that improving the
plan on queries like this would result in an exponential increase in
planning time, then it's certainly important not to paint with too
broad a brush.  It would really be best to be able to tell the planner
which specific part of the query may be susceptible to this type of
optimization, because you could easily have many places in a
complicated query that would need to be analyzed, and if the planning
time is going to be a problem then we don't want to overplan the
entire query just to fix the problem in one particular spot.  And we
certainly don't want to do a whole bunch of other, unrelated,
expensive optimizations at the same time.

If one were to add a hint, I think the hint should tell the planner:
Hey, see this left join?  Well, computing the right-hand side of this
thing is going to take forever unless we get some information to help
us out.  So please do all of your limit and filter operations on the
left-hand side first, and then if you have any rows left, then
evaluate the right-hand side for just the values that matter.  i.e. in
the example query:

SELECT * FROM ab LEFT OUTER JOIN (bc JOIN cd ON bc.c = cd.d) ON ab.b =
bc.b WHERE ab.a = 20000

...please look up the rows in ab where ab.a = 20000.  If you find any,
then make a hash table of all the values you find for b among those
rows.  Then when you evaluate (bc JOIN cd ON bc.c = cd.d) you can
filter bc for rows where bc.b is in the hash table.

This might not be a good query plan in the average case, but there are
definitely instances where you might want to force this behavior.  In
fact, even if you had to do it as a nested loop (re-evaluating the bc
JOIN cd clause for each possible value of b) there are still cases
where it would be a big win.  Of course the nicest thing would be for
the planner to realize on its own that the right-hand side of the join
is going to generate a gazillion rows and the left-hand side is going
to generate one, but maybe (as Tom and the OP suggested) that is
expecting too much (though I confess I don't quite see why).

...Robert


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: [0/4] Proposal of SE-PostgreSQL patches
Next
From: "David E. Wheeler"
Date:
Subject: Re: Latest on CITEXT 2.0