Re: OUTER JOIN performance regression remains in 8.3beta4 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: OUTER JOIN performance regression remains in 8.3beta4
Date
Msg-id 7704.1199898820@sss.pgh.pa.us
Whole thread Raw
In response to Re: OUTER JOIN performance regression remains in 8.3beta4  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: OUTER JOIN performance regression remains in 8.3beta4
List pgsql-hackers
Gregory Stark <stark@enterprisedb.com> writes:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
>> Would it be a good idea to keep removing redundant clauses and rethink
>> the preference for clauseful joins, going forward?

> I don't understand what's going on here. The planner is choosing one join
> order over another because one join has more join clauses than the
> other?

Not more join clauses, but any join clause at all.  We will not explore
join paths that don't have any join clause, unless forced to by lack of
any other way to form the result.

> Even
> though some of those joins are entirely redundant and have no selectivity?

You're confusing whether we explore a path (ie, cost it out) with
whether we choose it.  It's a necessary precondition, of course,
but we won't pick the path unless it looks cheapest.

Not exploring clauseless join paths is a heuristic that's needed to
avoid exponential growth of the search space in large join problems.
AFAIK every System-R-derived planner has done this.

As an example, considert1 join t2 on (...) join t3 on (...) ... join t8 on (...)
and for simplicity suppose that each ON condition relates the new
table to the immediately preceding table, and that we can't derive
any additional join conditions through transitivity.  In this situation
there are going to be only seven ways to form a two-base-relation
joinrel, as long as we allow only clauseful joins.  But there are
8*7/2 = 28 distinct ways to form a join if we consider all possible
join pairs whether they have a join clause or not.  At the
three-base-relation level there will be 12 joinrels if we only consider
clauseful pairs, or 56 if we don't.  It gets worse as you go up, and
most if not all of those additional joinrels represent entirely useless
variations on the theme of "let's stupidly compute a cartesian product
and then winnow it sometime later".

This is not to say that there is never a case where an early cartesian
product couldn't be a useful part of a plan, but rejecting them is a
darn good heuristic.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Named vs Unnamed Partitions
Next
From: Chris Browne
Date:
Subject: Re: Dynamic Partitioning using Segment Visibility Maps