Re: *_collapse_limit, geqo_threshold - Mailing list pgsql-hackers

From Robert Haas
Subject Re: *_collapse_limit, geqo_threshold
Date
Msg-id 603c8f070907102119n1428bca9j755b6c27502fc52d@mail.gmail.com
Whole thread Raw
In response to Re: *_collapse_limit, geqo_threshold  (Jaime Casanova <jcasanov@systemguards.com.ec>)
List pgsql-hackers
On Fri, Jul 10, 2009 at 2:44 PM, Jaime
Casanova<jcasanov@systemguards.com.ec> wrote:
> On Fri, Jul 10, 2009 at 10:22 AM, Robert Haas<robertmhaas@gmail.com> wrote:
>> I took a look at this and it seems that #3 can be implemented with
>> essentially no additional code (the handful of lines I added where
>> more than balanced out by some simplifications in ruleutils.c).  Of
>> course you still don't have to like it.  :-)
>>
>> Patch attached.
>>
>
> ! SELECT * FROM a INNER FORCE JOIN (b INNER FORCE JOIN c ON (b.ref =
> c.id)) ON (a.id = b.id);
>
> what's next? FORCE INDEX?
> once we implement one hint like this the complaints about the others
> will lose force

That would be pretty useless, because while it might work for trivial
cases, in a complex plan, it's almost certainly not going to do what
you want unless you specify every detail of the entire plan along with
it, which kind of defeats the purpose of using a database with a
sophisticated planner.  Actually, it seems to me that if we were to
add hints, the place to start would be with selectivity, since in my
experience bad selectivity estimates (often by 3, 4, 5, or 6 orders of
magnitude) are often the reason for a bad plan, and if you can fix
that, the rest takes care of itself - but fixing it is often very
difficult.  Of course, improving the selectivity estimator would be
even better, because time spent applying hints to queries is time that
could be better spent doing something else, and in fact one of the
reasons why I started using PostgreSQL is because complex queries Just
Work.

Constraining the join order does seem a lot less useful, because (for
example) it won't compel the planner to use a hash join instead of a
nest join, or to make a sensible decision about which relations should
be on the inner side of the join.  But I still think that it's worth
considering, because:

1. We basically already support the functionality - it's just broken.
Today, you can control the join order by setting join_collapse_limit =
1; then, the joins you want to order you specify using JOIN syntax;
the ones you don't want to order, you write as FROM items.  But all of
your outer joins will necessarily have the order forced, because
there's no way to write those as FROM items.  And if you want to use a
view that was written without this convention in mind, you're hosed.
So I think we either ought to remove it or fix it.

2. It's actually LESS code to support it completely than it is to
leave it the way it is now while removing from_collapse_limit and
replacing join_collapse_limit with enable_join_ordering.  Compare the
patch I sent earlier with the one that I'm about to send.

3. The point of this particular type of hint, at least as I see it, is
not so much to force the planner into the correct query plan as it is
to constrain the planning time.  There are very few tools available
for this today: join_collapse_limit and from_collapse_limit do it by
unexpectedly producing terrible plans, and the only other option is
GEQO.  In a case like the one posted upthread where you have ten or so
joins that can be reordered almost arbitrarily, you could shove a
single FORCE right into the middle and split it into two problems that
can be planned consecutively.  This isn't that different from what
join_collapse_limit does today, but it doesn't force a uniform
threshold on you across the board; you can apply where, when, and to
the extent necessary to control planning time for a particular query,
and you have to explicitly ask for it so you can't complain you were
ambushed if it doesn't work out.

But I can see I'm outvoted, so I give up!

...Robert


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: concurrent index builds unneeded lock?
Next
From: Robert Haas
Date:
Subject: Re: *_collapse_limit, geqo_threshold