Re: from_collapse_limit vs. geqo_threshold - Mailing list pgsql-hackers

From Robert Haas
Subject Re: from_collapse_limit vs. geqo_threshold
Date
Msg-id 603c8f070905251800g5b86d2dav26eca7f417d15dbf@mail.gmail.com
In response to Re: from_collapse_limit vs. geqo_threshold  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: from_collapse_limit vs. geqo_threshold  (Selena Deckelmann <selena@endpoint.com>)
List pgsql-hackers
On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now I'm still not exactly happy with GEQO, but it's surely a lot better
> than it was in the fall of 2000.  So on the whole it does seem that the
> current relationships between from_collapse_limit, join_collapse_limit,
> and geqo_threshold are based on obsolete information and should be
> revisited.  I don't have any data at hand to suggest specific new
> default values, though.

For 8.4, I'd be happy to just improve the documentation.  I think this
sentence could just be deleted from the section on
from_collapse_limit:

It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">.

We could put some other explanation in place of that sentence, but I'm
not exactly sure what that explanation would say.  I guess the point
is that setting from_collapse_limit < geqo_threshold may delay GEQO
planning considerably in the face of complex subqueries, because
pulling up subqueries increases the size of the FROM list (I think).
That could be good if you want your query plans to be more
deterministic, but there's no guarantee they'll be good. Setting
from_collapse_limit > geqo_threshold is basically saying that the
standard planner will always have subqueries pulled up, so
from_collapse_limit should be based on what the pain point will be for
GEQO.
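The interaction described above can be sketched with the GUCs themselves. This is a hypothetical session; the values and table names are illustrative, not recommendations:

```sql
-- Keeping from_collapse_limit below geqo_threshold means a sub-SELECT
-- is only pulled up while the combined FROM list stays small, so the
-- flattened list is less likely to reach the GEQO trigger point.
SET from_collapse_limit = 8;
SET geqo_threshold = 12;

-- With the settings above, the sub-SELECT here is pulled up into the
-- outer FROM list only if the result has at most 8 items; otherwise it
-- is planned separately.
SELECT *
FROM a, b, c,
     (SELECT d.id FROM d, e WHERE d.id = e.d_id) AS sub
WHERE a.id = b.a_id AND b.id = c.b_id AND c.id = sub.id;
```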

I'm not sure there's a lot of point in spelling all that out, though.
It more or less follows from the definition of the parameters.  So,
I'd be just as happy to delete the misleading hint and call it good.
But I could go either way.

For 8.5, it sounds like we need to do some testing to determine an
appropriate set of values, but I'm not exactly sure what to test.  As
a practical matter, the correct level of effort depends a lot on how
long the query figures to run.  For OLAP queries, planning times of
more than 50 ms or so start to add noticeably to the overall runtime
of the query, but if the query is expected to run for several minutes,
we'd presumably be happy to spend several seconds planning it, which
might make it feasible to use the standard planner even for very, very
big queries.
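The planning-time-versus-runtime trade-off is directly measurable in PostgreSQL releases well after 8.4, where EXPLAIN ANALYZE prints a "Planning Time" line alongside "Execution Time". A sketch with hypothetical table names:

```sql
-- Compare the "Planning Time" and "Execution Time" lines in the
-- output: for a query that runs for minutes, even several seconds of
-- standard-planner work is a small fraction of the total.
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products  p ON p.id = o.product_id;
```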

I'm not 100% convinced of the value of join_collapse_limit for
anything other than explicit control over the join order.  I have yet
to meet a PostgreSQL user who thought it was intuitive that it might
matter whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3
[etc] or A, B, C, D, [etc] WHERE P1, P2, P3.  I suspect there are many
people who, if they knew that the latter might optimize better than
the former in some circumstances, would simply always write it in the
latter fashion, which makes the whole thing look a lot like a
concealed foot-gun, since whether or not it actually protects you
against exponential planning-time growth has a lot to do with how you
happen to like to write your queries (myself, I've switched styles in
the last few years).
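The two spellings contrasted above look like this (hypothetical tables; with the default join_collapse_limit of 8 both plan the same way, but only the first is pinned down when the limit is lowered):

```sql
-- Explicit JOIN syntax: subject to join_collapse_limit.  After
-- SET join_collapse_limit = 1, the planner keeps exactly this order.
SELECT *
FROM a
JOIN b ON b.a_id = a.id
JOIN c ON c.b_id = b.id
JOIN d ON d.c_id = c.id;

-- Comma-separated FROM list: one flat list, so the planner searches
-- join orders freely (or switches to GEQO past geqo_threshold),
-- regardless of join_collapse_limit.
SELECT *
FROM a, b, c, d
WHERE b.a_id = a.id AND c.b_id = b.id AND d.c_id = c.id;
```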

...Robert

