Thread: Proposal: relaxing link between explicit JOINs and execution order

Proposal: relaxing link between explicit JOINs and execution order

From
Tom Lane
Date:
There's been some recent discussion about the fact that Postgres treats
explicit JOIN syntax as constraining the actual join plan, cf
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html

This behavior was originally in there simply because of lack of time to
consider alternatives.  I now realize that it wouldn't be hard to get
the planner to do better --- basically, preprocess_jointree just has to
be willing to fold JoinExpr-under-JoinExpr into a FromExpr when the
joins are inner joins.

But in the meantime, some folks have found the present behavior to be
a feature rather than a bug, since it lets them control planning time
on many-table queries.  If we are going to change it, I think we need
some way to accommodate both camps.

What I've been toying with is inventing a GUC variable or two.  I am
thinking of defining a variable that sets the maximum size of a FromExpr
that preprocess_jointree is allowed to create by folding JoinExprs.
If this were set to 2, the behavior would be the same as before: no
collapsing of JoinExprs can occur.  If it were set to a large number,
inner JOIN syntax would never affect the planner at all.  In practice
it'd be smart to leave it at some value less than GEQO_THRESHOLD, so
that folding a large number of JOINs wouldn't leave you with a query
that takes a long time to plan or produces unpredictable plans.

There is already a need for a GUC variable to control the existing
behavior of preprocess_jointree: right now, it arbitrarily uses
GEQO_THRESHOLD/2 as the limit for the size of a FromExpr that can be
made by collapsing FromExprs together.  This ought to be a separately
settable parameter, I think.

Comments?  In particular, can anyone think of pithy names for these
variables?  The best I'd been able to come up with is MAX_JOIN_COLLAPSE
and MAX_FROM_COLLAPSE, but neither of these exactly sing...

            regards, tom lane

Re: [PERFORM] Proposal: relaxing link between explicit JOINs and execution order

From
Sean Chittenden
Date:
> There's been some recent discussion about the fact that Postgres
> treats explicit JOIN syntax as constraining the actual join plan, cf
> http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html
>
> This behavior was originally in there simply because of lack of time
> to consider alternatives.  I now realize that it wouldn't be hard to
> get the planner to do better --- basically, preprocess_jointree just
> has to be willing to fold JoinExpr-under-JoinExpr into a FromExpr
> when the joins are inner joins.
>
> But in the meantime, some folks have found the present behavior to be
> a feature rather than a bug, since it lets them control planning time
> on many-table queries.  If we are going to change it, I think we need
> some way to accommodate both camps.
[snip]
> Comments?  In particular, can anyone think of pithy names for these
> variables?  The best I'd been able to come up with is
> MAX_JOIN_COLLAPSE and MAX_FROM_COLLAPSE, but neither of these
> exactly sing...

How about something that's runtime tunable via a SET/SHOW config var?
There are some queries that I have that I haven't spent any time
tuning and would love to have the planner spend its CPU thinking about
it instead of mine.  Setting it to 2 by default, then on my tuned
queries, setting to something obscenely high so the planner won't muck
with what I know is fastest (or so I think at least).

I know this is a can of worms, but what about piggy backing on an
Oracle notation and having an inline way of setting this inside of a
comment?

SELECT /* +planner:collapse_tables=12  */ ....  ?
           ^^^^^^^ ^^^^^^^^^^^^^^^ ^^^
       system  variable        value

::shrug::  In brainstorm mode.  Anyway, a few names:

auto_order_join
auto_order_join_max
auto_reorder_table_limit
auto_collapse_join
auto_collapse_num_join
auto_join_threshold

When I'm thinking about what this variable will do for me as a DBA, I
think it will make the plan more intelligent by reordering the joins.
My $0.02.  -sc

--
Sean Chittenden

Sean Chittenden <sean@chittenden.org> writes:
> How about something that's runtime tunable via a SET/SHOW config var?

Er, that's what I was talking about.

> I know this is a can of worms, but what about piggy backing on an
> Oracle notation and having an inline way of setting this inside of a
> comment?

I don't want to go there ...

            regards, tom lane

Tom,

I am very strongly in favor of this idea.   I would personally prefer it if
the Join collapsing parmeter could be set at query time through a SET
statement, but will of course defer to the difficulty level in doing so.

> Comments?  In particular, can anyone think of pithy names for these
> variables?  The best I'd been able to come up with is MAX_JOIN_COLLAPSE
> and MAX_FROM_COLLAPSE, but neither of these exactly sing...

How about:
EXPLICIT_JOIN_MINIMUM
and
FROM_COLLAPSE_LIMIT

Just to make the two params not sound so identical?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Josh Berkus <josh@agliodbs.com> writes:
> I am very strongly in favor of this idea.   I would personally prefer it if
> the Join collapsing parmeter could be set at query time through a SET
> statement, but will of course defer to the difficulty level in doing so.

I guess I failed to make it clear that that's what I meant.  GUC
variables are those things that you can set via SET, or in the
postgresql.conf file, etc.  These values would be just as manipulable
as, say, ENABLE_SEQSCAN.

> How about:
> EXPLICIT_JOIN_MINIMUM
> and
> FROM_COLLAPSE_LIMIT

> Just to make the two params not sound so identical?

Hmm.  The two parameters would have closely related functions, so I'd
sort of think that the names *should* be pretty similar.

            regards, tom lane

Josh Berkus <josh@agliodbs.com> writes:
> How about:
> EXPLICIT_JOIN_MINIMUM
> and
> FROM_COLLAPSE_LIMIT

I've implemented this using FROM_COLLAPSE_LIMIT and JOIN_COLLAPSE_LIMIT
as the variable names.  It'd be easy enough to change if someone comes
up with better names.  You can read updated documentation at
http://developer.postgresql.org/docs/postgres/explicit-joins.html

            regards, tom lane