Re: from_collapse_limit vs. geqo_threshold - Mailing list pgsql-hackers

From Tom Lane
Subject Re: from_collapse_limit vs. geqo_threshold
Date
Msg-id 9134.1243289706@sss.pgh.pa.us
Whole thread Raw
In response to Re: from_collapse_limit vs. geqo_threshold  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: from_collapse_limit vs. geqo_threshold  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>> ... trying to remember why I wrote that ... what would happen if
>>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?
>> 
>> I think I wrote it, not you.  The point of the advice is to keep
>> subquery collapsation (hm, what's the right noun form?  Need caffeine)
>> from turning a non-GEQO query into a GEQO one, and thus subjecting
>> you to unpredictable plans.  Maybe the resulting plans would be better
>> on average, or maybe they wouldn't, but in any case they'd be
>> unpredictable.

> That's more or less what I figured, but my real world experience is
> that pulling up subqueries and using GEQO leads to plans that are
> random but tolerable, whereas not pulling up subqueries leads to plans
> that are almost uniformly bad.

I went back and looked at the CVS history to try to refresh my memory
about how we got here.  As best I can find, there were two steps:

1. The original commit of the ability to have subqueries at all,
during 7.1 development:

2000-09-29 14:21  tgl
Subselects in FROM clause, perISO syntax: FROM (SELECT ...) [AS] alias.  (Don't forget that analias is required.)
Viewsreimplemented as expanding tosubselect-in-FROM.  Grouping, aggregates, DISTINCT in viewsactually work now (he says
optimistically). No UNION support insubselects/views yet, but I have some ideas about that. Rule-related permissions
checkingmoved out of rewriter and intoexecutor.  INITDB REQUIRED!
 

This introduced the ability to pull up subqueries, but with an arbitrary
limit of geqo_threshold/2 on the number of relations that would be
collected into a single planning problem.

2. During 7.4 development, we did this:

2003-01-25 18:10  tgl
Allow the planner to collapse explicit inner JOINs together, ratherthan necessarily following the JOIN syntax to
developthe queryplan.  The old behavior is still available by setting GUC variableJOIN_COLLAPSE_LIMIT to 1.  Also
createa GUC variableFROM_COLLAPSE_LIMIT to control the similar decision about when tocollapse sub-SELECT lists into
theirparent lists.  (This behaviorexisted already, but the limit was always GEQO_THRESHOLD/2; nowit's separately
adjustable.)

The excuse for join_collapse_limit to exist at all is largely one of
backwards compatibility.  Up to then, we had not-infrequently suggested
that people could force a desired join order by writing an explicit JOIN
nest, and eliminating that escape hatch altogether didn't seem like a
good idea.  I think from_collapse_limit was added largely on grounds of
symmetry.

Now, as to why the original commit had the geqo_threshold/2 restriction:
it was obviously not based on field experience with flattening, because
we didn't have any.  What I think it *was* based on was that GEQO sucked
really badly back then, and I wanted to avoid having it kick in for
queries that it had never kicked in for in previous releases.  Some
quick comparisons say that 7.1 in GEQO mode was about 5X slower than
HEAD (despite its planning being a lot more simplistic), and tended to
find considerably worse plans.  Some of the significant improvements
since then:

2004-01-23 18:54  tgl
Revise GEQO planner to make use of some heuristic knowledge aboutSQL, namely that it's good to join where there are
joinclausesrather than where there are not.  Also enable it to generate bushyplans at need, so that it doesn't fail in
thepresence of multipleIN clauses containing sub-joins.
 

2004-01-21 18:33  tgl
Repair error apparently introduced in the initialcoding of GUC: the default value for geqo_effort is supposed to be40,
not1.  The actual 'genetic' component of the GEQO algorithmhas been practically disabled since 7.1 because of this
mistake.
 

Also, up to 7.0 there were some nasty memory leaks in the planner and
especially in GEQO, because we didn't have the memory context mechanism.
I think those were actually fixed as of 2000-09-29, but GEQO still had a
reputation for blowing out backend memory.

Now I'm still not exactly happy with GEQO, but it's surely a lot better
than it was in the fall of 2000.  So on the whole it does seem that the
current relationships between from_collapse_limit, join_collapse_limit,
and geqo_threshold are based on obsolete information and should be
revisited.  I don't have any data at hand to suggest specific new
default values, though.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: generic options for explain
Next
From: Tom Lane
Date:
Subject: Re: generic options for explain