Thread: from_collapse_limit vs. geqo_threshold

from_collapse_limit vs. geqo_threshold

From
Robert Haas
Date:
The docs contain the following sage advice concerning from_collapse_limit:

"It is usually wise to keep this less than geqo_threshold."

I've been thinking through this advice on and off for about 2 years
and I still don't understand it.  The point of either
from_collapse_limit and geqo_threshold is to avoid exponential growth
in planning time when planning large join nests.  Either of them has
the effect of reducing planning time at the expense of possibly
getting an inferior plan.  However, it's my experience that the plans
that result when the from_collapse_limit kicks in are almost
invariably terrible when fetching only a small number of rows.  On the
other hand, the few GEQO plans I've had experience with have been
pretty reasonable.

It appears that this statement has been in our documentation since Tom
Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
January 25, 2003 (9bf97ff426de9), but I can't find any justification
for it anywhere.  I think we either need to justify this advice, or
remove it.

...Robert


Re: from_collapse_limit vs. geqo_threshold

From
Josh Berkus
Date:
Robert,

> It appears that this statement has been in our documentation since Tom
> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
> January 25, 2003 (9bf97ff426de9), but I can't find any justification
> for it anywhere.  I think we either need to justify this advice, or
> remove it.

... trying to remember why I wrote that ... what would happen if 
FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?


-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: from_collapse_limit vs. geqo_threshold

From
Robert Haas
Date:
On Thu, May 21, 2009 at 12:21 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> It appears that this statement has been in our documentation since Tom
>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
>> January 25, 2003 (9bf97ff426de9), but I can't find any justification
>> for it anywhere.  I think we either need to justify this advice, or
>> remove it.
>
> ... trying to remember why I wrote that ... what would happen if
> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?

The two variables do different things, so there's nothing particularly
magical about which one is larger AFAICS.  I believe that if you make
from_collapse_limit larger than geqo_threshold, then GEQO might be
asked to plan a query into which subqueries have been pulled up.  But
that's not obviously bad; the alternative is planning the subquery
separately and first, which at least for the very small number of
cases that I've tested seems to be quite a bit worse.

Apparently before from_collapse_limit was added the behavior existed,
but the thereshold was geqo_threshold/2.  So someone had a reason for
believing that when the join nest got too large, not pulling up
subqueries was a superior coping strategy versus invoking GEQO.  I
just don't know what the reason is, or whether it's still valid.

...Robert


Re: from_collapse_limit vs. geqo_threshold

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Robert,
>> It appears that this statement has been in our documentation since Tom
>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
>> January 25, 2003 (9bf97ff426de9), but I can't find any justification
>> for it anywhere.  I think we either need to justify this advice, or
>> remove it.

> ... trying to remember why I wrote that ... what would happen if 
> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?

I think I wrote it, not you.  The point of the advice is to keep
subquery collapsation (hm, what's the right noun form?  Need caffeine)
from turning a non-GEQO query into a GEQO one, and thus subjecting
you to unpredictable plans.  Maybe the resulting plans would be better
on average, or maybe they wouldn't, but in any case they'd be
unpredictable.
        regards, tom lane


Re: from_collapse_limit vs. geqo_threshold

From
Robert Haas
Date:
On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Robert,
>>> It appears that this statement has been in our documentation since Tom
>>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
>>> January 25, 2003 (9bf97ff426de9), but I can't find any justification
>>> for it anywhere.  I think we either need to justify this advice, or
>>> remove it.
>
>> ... trying to remember why I wrote that ... what would happen if
>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?
>
> I think I wrote it, not you.  The point of the advice is to keep
> subquery collapsation (hm, what's the right noun form?  Need caffeine)
> from turning a non-GEQO query into a GEQO one, and thus subjecting
> you to unpredictable plans.  Maybe the resulting plans would be better
> on average, or maybe they wouldn't, but in any case they'd be
> unpredictable.

That's more or less what I figured, but my real world experience is
that pulling up subqueries and using GEQO leads to plans that are
random but tolerable, whereas not pulling up subqueries leads to plans
that are almost uniformly bad.  Actually, it works OK if really would
have needed to materialize the entire subquery, but otherwise it
stinks.  My real unvarnished opinion on this topic is that
from_collapse_limit is a loaded foot-gun waiting to go off.  We might
as well have an option where if the number of tables in the query
exceeds a certain threshold, we'll just sequential-scan the table
rather than considering the use of indices.  That option would
actually be better, because everyone who read the documentation would
be absolutely certain that they wanted to turn that option OFF,
whereas the behavior of from_collapse_limit is sufficiently complex
that it isn't obvious that it's a terrible idea.

...Robert


Re: from_collapse_limit vs. geqo_threshold

From
Greg Stark
Date:
Having just raised the statistics targets I wonder if we should look
at raising these two parameters too. The experience on the lists are
that when people run into either of these two limits we recommend
raising them and we've never seen anyone come back complaining that
their planning time goes through the roof.

To benchmark this like we did for the statistics target will be tricky
though. I suppose we can gather candidate queries by trolling the
lists for recommendations, though we didn't always get schemas to go
along with the queries.

-- 
greg


Re: from_collapse_limit vs. geqo_threshold

From
Robert Haas
Date:
On Thu, May 21, 2009 at 8:51 AM, Greg Stark <stark@enterprisedb.com> wrote:
> Having just raised the statistics targets I wonder if we should look
> at raising these two parameters too. The experience on the lists are
> that when people run into either of these two limits we recommend
> raising them and we've never seen anyone come back complaining that
> their planning time goes through the roof.
>
> To benchmark this like we did for the statistics target will be tricky
> though. I suppose we can gather candidate queries by trolling the
> lists for recommendations, though we didn't always get schemas to go
> along with the queries.

I think that's a good idea, too, but it's unrelated to the question of
which one should be set to the higher value.

...Robert


Re: from_collapse_limit vs. geqo_threshold

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Josh Berkus <josh@agliodbs.com> writes:
>>> ... trying to remember why I wrote that ... what would happen if
>>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?
>> 
>> I think I wrote it, not you.  The point of the advice is to keep
>> subquery collapsation (hm, what's the right noun form?  Need caffeine)
>> from turning a non-GEQO query into a GEQO one, and thus subjecting
>> you to unpredictable plans.  Maybe the resulting plans would be better
>> on average, or maybe they wouldn't, but in any case they'd be
>> unpredictable.

> That's more or less what I figured, but my real world experience is
> that pulling up subqueries and using GEQO leads to plans that are
> random but tolerable, whereas not pulling up subqueries leads to plans
> that are almost uniformly bad.

I went back and looked at the CVS history to try to refresh my memory
about how we got here.  As best I can find, there were two steps:

1. The original commit of the ability to have subqueries at all,
during 7.1 development:

2000-09-29 14:21  tgl
Subselects in FROM clause, perISO syntax: FROM (SELECT ...) [AS] alias.  (Don't forget that analias is required.)
Viewsreimplemented as expanding tosubselect-in-FROM.  Grouping, aggregates, DISTINCT in viewsactually work now (he says
optimistically). No UNION support insubselects/views yet, but I have some ideas about that. Rule-related permissions
checkingmoved out of rewriter and intoexecutor.  INITDB REQUIRED!
 

This introduced the ability to pull up subqueries, but with an arbitrary
limit of geqo_threshold/2 on the number of relations that would be
collected into a single planning problem.

2. During 7.4 development, we did this:

2003-01-25 18:10  tgl
Allow the planner to collapse explicit inner JOINs together, ratherthan necessarily following the JOIN syntax to
developthe queryplan.  The old behavior is still available by setting GUC variableJOIN_COLLAPSE_LIMIT to 1.  Also
createa GUC variableFROM_COLLAPSE_LIMIT to control the similar decision about when tocollapse sub-SELECT lists into
theirparent lists.  (This behaviorexisted already, but the limit was always GEQO_THRESHOLD/2; nowit's separately
adjustable.)

The excuse for join_collapse_limit to exist at all is largely one of
backwards compatibility.  Up to then, we had not-infrequently suggested
that people could force a desired join order by writing an explicit JOIN
nest, and eliminating that escape hatch altogether didn't seem like a
good idea.  I think from_collapse_limit was added largely on grounds of
symmetry.

Now, as to why the original commit had the geqo_threshold/2 restriction:
it was obviously not based on field experience with flattening, because
we didn't have any.  What I think it *was* based on was that GEQO sucked
really badly back then, and I wanted to avoid having it kick in for
queries that it had never kicked in for in previous releases.  Some
quick comparisons say that 7.1 in GEQO mode was about 5X slower than
HEAD (despite its planning being a lot more simplistic), and tended to
find considerably worse plans.  Some of the significant improvements
since then:

2004-01-23 18:54  tgl
Revise GEQO planner to make use of some heuristic knowledge aboutSQL, namely that it's good to join where there are
joinclausesrather than where there are not.  Also enable it to generate bushyplans at need, so that it doesn't fail in
thepresence of multipleIN clauses containing sub-joins.
 

2004-01-21 18:33  tgl
Repair error apparently introduced in the initialcoding of GUC: the default value for geqo_effort is supposed to be40,
not1.  The actual 'genetic' component of the GEQO algorithmhas been practically disabled since 7.1 because of this
mistake.
 

Also, up to 7.0 there were some nasty memory leaks in the planner and
especially in GEQO, because we didn't have the memory context mechanism.
I think those were actually fixed as of 2000-09-29, but GEQO still had a
reputation for blowing out backend memory.

Now I'm still not exactly happy with GEQO, but it's surely a lot better
than it was in the fall of 2000.  So on the whole it does seem that the
current relationships between from_collapse_limit, join_collapse_limit,
and geqo_threshold are based on obsolete information and should be
revisited.  I don't have any data at hand to suggest specific new
default values, though.
        regards, tom lane


Re: from_collapse_limit vs. geqo_threshold

From
Robert Haas
Date:
On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now I'm still not exactly happy with GEQO, but it's surely a lot better
> than it was in the fall of 2000.  So on the whole it does seem that the
> current relationships between from_collapse_limit, join_collapse_limit,
> and geqo_threshold are based on obsolete information and should be
> revisited.  I don't have any data at hand to suggest specific new
> default values, though.

For 8.4, I'd be happy to just improve the documentation.  I think this
sentence could just be deleted from the section on
from_collapse_limit:

It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">.

We could put some other explanation in place of that sentence, but I'm
not exactly sure what that explanation would say.  I guess the point
is that setting from_collapse_limit < geqo_threshold may delay GEQO
planning considerably in the face of complex subqueries, because
pulling up subqueries increases the size of the FROM list (I think).
That could be good if you want your query plans to be more
deterministic, but there's no guarantee they'll be good. Setting
from_collapse_limit > geqo_threshold is basically saying that the
standard planner will always have subqueries pulled up, so
from_collapse_limit should be based on what the pain point will be for
GEQO.

I'm not sure there's a lot of point in spelling all that out, though.
It more or less follows from the definition of the parameters.  So,
I'd be just as happy to delete the misleading hint and call it good.
But I could go either way.

For 8.5, it sounds like we need to do some testing to determine an
appropriate set of values, but I'm not exactly sure what to test.   As
a practical matter, the correct level of effort depends a lot on how
long the query figures to run.  For OLAP queries, planning times of
more than 50 ms or so start to add noticeably to the overall runtime
of the query, but if the query is expected to run for several minutes,
we'd presumably be happy to spend several seconds planning it, which
might make it feasible to use the standard planner even for very, very
big queries.

I'm not 100% convinced of the value of join_collapse_limit for
anything other than explicit control over the join order.  I have yet
to meet a PostgreSQL who thought that it was intuitive that it might
matter whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3
[etc] or A, B, C, D, [etc] WHERE P1, P2, P3.  I suspect there are many
people who, if they knew that the latter might optimize better than
the former in some circumstances, would simply always write it in the
latter fashion, which makes the whole thing look a lot like a
concealed foot-gun, since whether or not it actually protects you
against exponential planning-time growth has a lot to do with how you
happen to like to write your queries (myself, I've switched styles in
the last few years).

...Robert


Re: from_collapse_limit vs. geqo_threshold

From
Selena Deckelmann
Date:
In the spirit of helping wrap-up 8.4 todo items...

Robert Haas wrote:
> On Mon, May 25, 2009 at 6:15 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Now I'm still not exactly happy with GEQO, but it's surely a lot better
>> than it was in the fall of 2000.  So on the whole it does seem that the
>> current relationships between from_collapse_limit, join_collapse_limit,
>> and geqo_threshold are based on obsolete information and should be
>> revisited.  I don't have any data at hand to suggest specific new
>> default values, though.
>
> For 8.4, I'd be happy to just improve the documentation.  I think this
> sentence could just be deleted from the section on
> from_collapse_limit:
>
> It is usually wise to keep this less than<xref linkend="guc-geqo-threshold">.
>
> We could put some other explanation in place of that sentence, but I'm
> not exactly sure what that explanation would say.  I guess the point
> is that setting from_collapse_limit<  geqo_threshold may delay GEQO
> planning considerably in the face of complex subqueries, because
> pulling up subqueries increases the size of the FROM list (I think).
> That could be good if you want your query plans to be more
> deterministic, but there's no guarantee they'll be good. Setting
> from_collapse_limit>  geqo_threshold is basically saying that the
> standard planner will always have subqueries pulled up, so
> from_collapse_limit should be based on what the pain point will be for
> GEQO.


My vote would be to provide some information.

Suggested revision of Robert's prose:

Because genetic query optimization may be triggered, increasing 
from_collapse_limit should be considered relative to <xref 
linkend="guc-geqo-threshold">.

-selena


-- 
Selena Deckelmann
End Point Corporation
selena@endpoint.com
503-282-2512


Re: from_collapse_limit vs. geqo_threshold

From
Robert Haas
Date:
On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann <selena@endpoint.com> wrote:
> In the spirit of helping wrap-up 8.4 todo items...
> Robert Haas wrote:
>> For 8.4, I'd be happy to just improve the documentation.  I think this
>> sentence could just be deleted from the section on
>> from_collapse_limit:
> My vote would be to provide some information.
>
> Suggested revision of Robert's prose:
>
> Because genetic query optimization may be triggered, increasing
> from_collapse_limit should be considered relative to <xref
> linkend="guc-geqo-threshold">.

Here's my attempt.

...Robert

Attachment

Re: from_collapse_limit vs. geqo_threshold

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann <selena@endpoint.com> wrote:
>> Suggested revision of Robert's prose:
>> 
>> Because genetic query optimization may be triggered, increasing
>> from_collapse_limit should be considered relative to <xref
>> linkend="guc-geqo-threshold">.

> Here's my attempt.

I applied the attached, along with some other wordsmithing.
        regards, tom lane

***************
*** 2252,2261 ****         The planner will merge sub-queries into upper queries if the         resulting
<literal>FROM</literal>list would have no more than         this many items.  Smaller values reduce planning time but
might
!         yield inferior query plans.  The default is eight.  It is usually
!         wise to keep this less than <xref linkend="guc-geqo-threshold">.         For more information see <xref
linkend="explicit-joins">.       </para>       </listitem>      </varlistentry> 
 
--- 2261,2275 ----         The planner will merge sub-queries into upper queries if the         resulting
<literal>FROM</literal>list would have no more than         this many items.  Smaller values reduce planning time but
might
!         yield inferior query plans.  The default is eight.         For more information see <xref
linkend="explicit-joins">.       </para>
 
+ 
+        <para>
+         Setting this value to <xref linkend="guc-geqo-threshold"> or more
+         may trigger use of the GEQO planner, resulting in nondeterministic
+         plans.  See <xref linkend="runtime-config-query-geqo">.
+        </para>       </listitem>      </varlistentry>