Thread: Review remove {join,from}_collapse_limit, add enable_join_ordering

Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
Hi Robert, Hi all,

The patch applies cleanly and works as intended - no surprise here. After the 
changes the documentation is at least as easy to understand as before, and the 
code changes look sensible.
Also not surprisingly, that's not the area where I expected problems, I guess ;-)

For performance testing I replayed query logs from sites I could easily get my 
hands on (3 different, halfway interesting ones). I found no relevant 
differences on the first site, which is sensible because 
{from,join}_collapse_limit wasn't reached anyway.
More interesting are the queries from the two sites with reporting workloads:

On the first, simpler, schema I found on average 30% plan time increase and 40% 
execution time decrease. Most of the queries stayed the same, only a few 
changed radically (in both directions). No big differences between geqo=on/off.

The queries on the second reporting schema are unfortunately different. It's 
the one from which I copied the crazy example I attached in the original thread.
With geqo=off a good part of the queries in daily use need too much memory to 
plan sensibly, and geqo=on outright fails on some with: 
"Error: Failed to make a valid plan"
I stopped trying to make performance measurements there.
Notably, even some plans which were plannable in reasonable time before are now 
problematic with enable_join_ordering=false!

I agree that those queries are crazy, but I am not sure how many of those are 
out there...

So, while I think the changes are in principle a good idea, since 
{from,join}_collapse_limit are somewhat confusing options, I personally do not 
think geqo is ready for this today, especially as the benefit is relatively 
small.

If I am the only one with access to such complicated queries that's fine - I am 
working on that site's query generation/schema anyway.

Could some other people with complicated queries perhaps check how they work 
out with these changes? It should be enough to test with a very big 
{join,from}_collapse_limit. Kevin?

I also have to admit that I somewhat like the current behaviour, in theory. 
Currently you can have a view with a hand-optimized JOIN order which will not 
get inlined and/or reordered, use it together with something unoptimized, and 
the unoptimized part will still be reordered in many cases...

I found it somewhat hard to review a patch where my opinion was biased from the 
beginning. As Tom listed himself as a reviewer I will happily (err?) concede to 
his and your judgement.


Andres


Andres Freund <andres@anarazel.de> writes:
> The queries on the second reporting schema are unfortunately different. It's 
> the one from which I copied the crazy example I attached in the original thread. 
> With geqo=off a good part of the queries in daily use need too much memory to 
> plan sensibly, and geqo=on outright fails with: 
> "Error: Failed to make a valid plan"
> on some.

We're not going to be able to fix this unless you show us examples.

> Notably, even some plans which were plannable in reasonable time before are 
> now problematic with enable_join_ordering=false!

And this even more so --- it doesn't make any sense at all.

> So, while I think the changes are principally a good idea, as 
> {from,join}_collapse_limit are a bit confusing options, I personally! do not 
> think geqo is ready for it today, especially as the benefit is relatively 
> small.

In general I think that any such bugs are there already and need to be
fixed anyway.
        regards, tom lane


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The queries on the second reporting schema are unfortunately different.
> > It's the one from which I copied the crazy example I attached in the
> > original thread. With geqo=off a good part of the queries in daily use
> > need too much memory to plan sensibly, and geqo=on outright fails with:
> > "Error: Failed to make a valid plan"
> > on some.
> We're not going to be able to fix this unless you show us examples.
In the other thread I attached a schema similar to the real one plus an example 
query. Not enough? And why not?

> > Notably, even some plans which were plannable in reasonable time before
> > are now problematic with enable_join_ordering=false!
> And this even more so --- it doesn't make any sense at all.
Why? With a high from_collapse_limit more subqueries get inlined - before 
inlining they get planned separately.

> > So, while I think the changes are in principle a good idea, since
> > {from,join}_collapse_limit are somewhat confusing options, I personally do
> > not think geqo is ready for this today, especially as the benefit is
> > relatively small.
> In general I think that any such bugs are there anyway and need to be
> fixed anyway.
Understandable.

Andres


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 15:18:01 Andres Freund wrote:
> On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > The queries on the second reporting schema are unfortunately different.
> > > It's the one from which I copied the crazy example I attached in the
> > > original thread. With geqo=off a good part of the queries in daily use
> > > need too much memory to plan sensibly, and geqo=on outright fails with:
> > > "Error: Failed to make a valid plan"
> > > on some.
> >
> > We're not going to be able to fix this unless you show us examples.
>
> In the other thread I attached a schema similar to the real one plus an
> example query. Not enough? And why not?
For reference:
http://archives.postgresql.org/message-id/200907091700.43411.andres@anarazel.de

Andres


Andres Freund <andres@anarazel.de> writes:
> On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> "Error: Failed to make a valid plan"

>> We're not going to be able to fix this unless you show us examples.

> In the other thread I attached a schema similar to the real one plus an example 
> query. Not enough? And why not?

I tried the example query and couldn't get "Failed to make a valid plan"
out of it ... what settings do you need for that?

However, I do observe that this seems a sufficient counterexample
against the theory that we can just remove the collapse limits and let
GEQO save us on very complex queries.  On my machine, the example query
takes about 22 seconds to plan using CVS HEAD w/ all default settings.
If I set both collapse_limit variables to very high values (I used 999),
it takes ... um ... not sure; I gave up waiting after half an hour.
I also tried with geqo_effort reduced to the minimum of 1, but that
didn't produce a plan in reasonable time either (I gave up after ten
minutes).  So if we remove the collapse limits, Postgres will completely
fail on this query --- the only way out would be enable_join_ordering =
off, which is hardly likely to produce a decent plan.

Maybe we should leave the collapse_limit logic alone and address
Robert's gripes by just raising the default values a lot (I'm thinking
100 or so).  That way there's an escape hatch for anyone who has
pathological queries to deal with --- just dial the settings down.

>>> Notably, even some plans which were plannable in reasonable time before
>>> are now problematic with enable_join_ordering=false!

>> And this even more so --- it doesn't make any sense at all.

> Why? With a high from_collapse_limit more subqueries get inlined - before 
> inlining they get planned separately.

Okay, I misunderstood which two cases you were comparing there.
        regards, tom lane


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Greg Stark
Date:
On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> However, I do observe that this seems a sufficient counterexample
> against the theory that we can just remove the collapse limits and let
> GEQO save us on very complex queries.  On my machine, the example query
> takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.

What's the point of GEQO if it doesn't guarantee to produce the
optimal plan and *also* doesn't guarantee to produce some plan, any
plan, within some reasonable amount of time? Either we need to fix
that, or else I don't see what it's buying us over our regular planner,
which also might not produce a plan within a reasonable amount of time,
but at least if it does it'll be the right plan.

--
greg
http://mit.edu/~gsstark/resume.pdf


I wrote:
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.
> I also tried with geqo_effort reduced to the minimum of 1, but that
> didn't produce a plan in reasonable time either (I gave up after ten
> minutes).

After I gave up letting the machine be idle to get a fair timing,
I turned on oprofile monitoring.  It looks a bit interesting:

samples  %        image name               symbol name
886498   53.8090  postgres                 have_relevant_eclass_joinclause
460596   27.9574  postgres                 bms_overlap
142764    8.6655  postgres                 bms_is_subset
126274    7.6646  postgres                 have_join_order_restriction
14205     0.8622  postgres                 list_nth_cell
2721      0.1652  postgres                 generate_join_implied_equalities
2445      0.1484  libc-2.9.so              memset
2202      0.1337  postgres                 have_relevant_joinclause
1678      0.1019  postgres                 make_canonical_pathkey
1648      0.1000  postgres                 pfree
884       0.0537  postgres                 bms_union
762       0.0463  postgres                 gimme_tree
660       0.0401  libc-2.9.so              memcpy
571       0.0347  postgres                 AllocSetFree
475       0.0288  postgres                 AllocSetAlloc
431       0.0262  postgres                 has_relevant_eclass_joinclause
389       0.0236  postgres                 check_list_invariants
260       0.0158  postgres                 join_is_legal
238       0.0144  postgres                 bms_copy

So maybe a redesign of the equivalence-class joinclause mechanism is in
order.  Still, this is unlikely to fix the fundamental issue that the
time for large join problems grows nonlinearly.
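As a back-of-the-envelope illustration of that nonlinear growth (my sketch, not from the thread): with no join-order restrictions to prune the space, the number of distinct binary join trees over n base relations is (2n-2)!/(n-1)!, which outpaces exponential growth even for modest n:

```python
from math import factorial

def join_orders(n):
    """Distinct binary join trees over n base relations, counting
    left/right operand order: (2n-2)! / (n-1)!"""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 4, 8, 12):
    print(n, join_orders(n))
# n=4 already gives 120 orderings; n=12 is roughly 2.8e13
```

This is why any exhaustive strategy must prune aggressively, and why the collapse limits cap the size of each join problem in the first place.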
        regards, tom lane


Re: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Kenneth Marshall
Date:
On Thu, Jul 16, 2009 at 04:27:39PM +0100, Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> > However, I do observe that this seems a sufficient counterexample
> > against the theory that we can just remove the collapse limits and let
> > GEQO save us on very complex queries.  On my machine, the example query
> > takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> > If I set both collapse_limit variables to very high values (I used 999),
> > it takes ... um ... not sure; I gave up waiting after half an hour.
> 
> What's the point of GEQO if it doesn't guarantee to produce the
> optimal plan and *also* doesn't guarantee to produce some plan, any
> plan, within some reasonable amount of time? Either we need to fix
> that, or else I don't see what it's buying us over our regular planner,
> which also might not produce a plan within a reasonable amount of time,
> but at least if it does it'll be the right plan.
> 

I do agree that we should have an actual time-limit cap for
GEQO that would have it return the best plan found so far when the
limit is reached. Then you can at least bound your planning time.

Regards,
Ken


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >>> "Error: Failed to make a valid plan"
> >>
> >> We're not going to be able to fix this unless you show us examples.
> >
> > In the other thread I attached a similar to the real schema + example
> > query. Not enough? And why?
>
> I tried the example query and couldn't get "Failed to make a valid plan"
> out of it ... what settings do you need for that?
It unfortunately depends on settings and luck. This dependence on luck is the 
reason why I would like geqo to behave "somewhat" deterministically...

With {join,from}_collapse_limit = 100 it seems to be triggered reliably. With 
lower values it seems harder to trigger; with bigger ones it simply takes too 
long to even get there.

Efficiency-wise, when using geqo with higher limits nearly all the time is spent in:

geqo
gimme_tree
have_join_order_restriction
has_legal_joinclause
have_relevant_joinclause
have_relevant_eclass (30% self)
bms_overlap (50% self)

I am not yet fully understanding geqo, but it looks like there are some 
possibilities to improve this.
Although such efficiency improvements would not explain the completely 
failing plans...

Do you have an idea which kinds of plans benefit most from using geqo? I had a 
somewhat hard time finding any query where geqo was substantially faster than 
the standard join search.

That also somewhat explains why I saw improvements with 64bit bitmapsets...


> However, I do observe that this seems a sufficient counterexample
> against the theory that we can just remove the collapse limits and let
> GEQO save us on very complex queries.  On my machine, the example query
> takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.
> I also tried with geqo_effort reduced to the minimum of 1, but that
> didn't produce a plan in reasonable time either (I gave up after ten
> minutes).  So if we remove the collapse limits, Postgres will completely
> fail on this query --- the only way out would be enable_join_ordering =
> off, which is hardly likely to produce a decent plan.
> Maybe we should leave the collapse_limit logic alone and address
> Robert's gripes by just raising the default values a lot (I'm thinking
> 100 or so).  That way there's an escape hatch for anyone who has
> pathological queries to deal with --- just dial the settings down.
Yes, I think that's sensible. I don't know if there are any queries out there 
that benefit from higher limits.

Andres



Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Greg Stark
Date:
On Thu, Jul 16, 2009 at 4:32 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> samples  %        image name               symbol name
> 886498   53.8090  postgres                 have_relevant_eclass_joinclause
> 460596   27.9574  postgres                 bms_overlap
>
> So maybe a redesign of the equivalence-class joinclause mechanism is in
> order.  Still, this is unlikely to fix the fundamental issue that the
> time for large join problems grows nonlinearly.

Perhaps it's GEQO's fault that it's using these functions
inappropriately, calling them often to recalculate these answers
whenever it needs them instead of looking once for join clauses and
then optimizing based on the results. But I've never actually looked
at geqo; maybe that's inherent in the design?


--
greg
http://mit.edu/~gsstark/resume.pdf


Greg Stark <gsstark@mit.edu> writes:
> On Thu, Jul 16, 2009 at 4:32 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> So maybe a redesign of the equivalence-class joinclause mechanism is in
>> order.  Still, this is unlikely to fix the fundamental issue that the
>> time for large join problems grows nonlinearly.

> Perhaps it's GEQO's fault that it's using these functions
> inappropriately, calling them often to recalculate these answers
> whenever it needs them instead of looking once for join clauses and
> then optimizing based on the results. But I've never actually looked
> at geqo; maybe that's inherent in the design?

geqo isn't doing anything the regular planner wouldn't do under similar
conditions.  It might well be that better caching is the answer to this
particular problem, but I don't have time to look closer today.
        regards, tom lane


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 17:27:39 Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> > However, I do observe that this seems a sufficient counterexample
> > against the theory that we can just remove the collapse limits and let
> > GEQO save us on very complex queries.  On my machine, the example query
> > takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> > If I set both collapse_limit variables to very high values (I used 999),
> > it takes ... um ... not sure; I gave up waiting after half an hour.
> What's the point of GEQO if it doesn't guarantee to produce the
> optimal plan and *also* doesn't guarantee to produce some plan, any
> plan, within some reasonable amount of time? Either we need to fix
> that, or else I don't see what it's buying us over our regular planner,
> which also might not produce a plan within a reasonable amount of time,
> but at least if it does it'll be the right plan.
Well, I could not find a plan where it errored out with the old limits, so one 
could argue it's just not adapted yet.
Although I also could not find a single case where geqo was relevantly faster 
with the default settings, even when it was used.
The default settings currently make it relatively hard to trigger geqo at all.


Andres


Andres Freund <andres@anarazel.de> writes:
> The default settings currently make it relatively hard to trigger geqo at all.

Yes, and that was intentional.  One of the implications of what we're
discussing here is that geqo would get used a lot more for "typical
complex queries" (if there is any such thing as a typical one).  So
it's fully to be expected that the fallout would be pressure to improve
geqo in various ways.

Given that we are at the start of the development cycle, that prospect
doesn't scare me --- there's plenty of time to fix whatever needs
fixing.  However, I am leaning to the feeling that I don't want to be
putting people in a position where they have no alternative but to use
geqo.  So adjusting rather than removing the collapse limits is seeming
like a good idea.
        regards, tom lane


Andres Freund <andres@anarazel.de> writes:
> On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
>> I tried the example query and couldn't get "Failed to make a valid plan"
>> out of it ... what settings do you need for that?

> It unfortunately depends on settings and luck. This dependence on luck is the 
> reason why I would like geqo to behave "somewhat" deterministically...

> With {join,from}_collapse_limit = 100 it seems to be triggered reliably. With 
> lower values it seems harder to trigger; with bigger ones it simply takes too 
> long to even get there.

OK, I see it at 100.  Would you confirm that what you get is the failure
in random_init_pool (geqo_pool.c) not the identically-phrased message
elsewhere?  (If you have VERBOSITY = verbose you should see the error
location info.)
        regards, tom lane


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 18:23:06 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
> >> I tried the example query and couldn't get "Failed to make a valid plan"
> >> out of it ... what settings do you need for that?
> >
> > It unfortunately depends on settings and luck. This dependence on luck
> > is the reason why I would like geqo to behave "somewhat" deterministically...
> >
> > With {join,from}_collapse_limit = 100 it seems to be triggered reliably.
> > With lower values it seems harder to trigger; with bigger ones it simply
> > takes too long to even get there.
>
> OK, I see it at 100.  Would you confirm that what you get is the failure
> in random_init_pool (geqo_pool.c) not the identically-phrased message
> elsewhere?  (If you have VERBOSITY = verbose you should see the error
> location info.)
Yes. I should have seen that. It's not exactly surprising...

Andres


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The default settings currently make it relatively hard to trigger geqo at
> > all.
>
> Yes, and that was intentional.  One of the implications of what we're
> discussing here is that geqo would get used a lot more for "typical
> complex queries" (if there is any such thing as a typical one).  So
> it's fully to be expected that the fallout would be pressure to improve
> geqo in various ways.
>
> Given that we are at the start of the development cycle, that prospect
> doesn't scare me --- there's plenty of time to fix whatever needs
> fixing.  However, I am leaning to the feeling that I don't want to be
> putting people in a position where they have no alternative but to use
> geqo.  So adjusting rather than removing the collapse limits is seeming
> like a good idea.
Hm, I see a somewhat more fundamental problem with geqo:
I tried several queries, and I found not a single one where the whole 
genetic process made any significant improvement to the 'worth'.
It seems that the best variant out of the initial pool is always either the 
path chosen in the end, or at least the cost difference is _really_ low.


Andres


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Robert Haas
Date:
On Thu, Jul 16, 2009 at 12:49 PM, Andres Freund<andres@anarazel.de> wrote:
> On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > The default settings currently make it relatively hard to trigger geqo at
>> > all.
>>
>> Yes, and that was intentional.  One of the implications of what we're
>> discussing here is that geqo would get used a lot more for "typical
>> complex queries" (if there is any such thing as a typical one).  So
>> it's fully to be expected that the fallout would be pressure to improve
>> geqo in various ways.
>>
>> Given that we are at the start of the development cycle, that prospect
>> doesn't scare me --- there's plenty of time to fix whatever needs
>> fixing.  However, I am leaning to the feeling that I don't want to be
>> putting people in a position where they have no alternative but to use
>> geqo.  So adjusting rather than removing the collapse limits is seeming
>> like a good idea.
> Hm, I see a somewhat more fundamental problem with geqo:
> I tried several queries, and I found not a single one where the whole
> genetic process made any significant improvement to the 'worth'.
> It seems that the best variant out of the initial pool is always either the
> path chosen in the end, or at least the cost difference is _really_ low.

Ouch.  Did you insert some debugging code to get that information, or
how did you come to that conclusion?

...Robert


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Robert Haas
Date:
On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> If I set both collapse_limit variables to very high values (I used 999),
>> it takes ... um ... not sure; I gave up waiting after half an hour.
>> I also tried with geqo_effort reduced to the minimum of 1, but that
>> didn't produce a plan in reasonable time either (I gave up after ten
>> minutes).
>
> After I gave up letting the machine be idle to get a fair timing,
> I turned on oprofile monitoring.  It looks a bit interesting:

That is interesting, but there's not really enough detail here to see
what is going on.  I'm more interested in what the high-level
functions are doing that's causing these guys to be called so many
times.  As Greg says, if the planning time curve for GEQO isn't better
than the one for the standard planner, it's the epitome of pointless.

> So maybe a redesign of the equivalence-class joinclause mechanism is in
> order.  Still, this is unlikely to fix the fundamental issue that the
> time for large join problems grows nonlinearly.

Nonlinear is one thing, but this looks more like exponential.  I
understand that the standard planner is exponential; GEQO should not
be.

...Robert


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 19:13:55 Robert Haas wrote:
> On Thu, Jul 16, 2009 at 12:49 PM, Andres Freund<andres@anarazel.de> wrote:
> > On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >> > The default settings currently make it relatively hard to trigger geqo
> >> > at all.
> >>
> >> Yes, and that was intentional.  One of the implications of what we're
> >> discussing here is that geqo would get used a lot more for "typical
> >> complex queries" (if there is any such thing as a typical one).  So
> >> it's fully to be expected that the fallout would be pressure to improve
> >> geqo in various ways.
> >>
> >> Given that we are at the start of the development cycle, that prospect
> >> doesn't scare me --- there's plenty of time to fix whatever needs
> >> fixing.  However, I am leaning to the feeling that I don't want to be
> >> putting people in a position where they have no alternative but to use
> >> geqo.  So adjusting rather than removing the collapse limits is seeming
> >> like a good idea.
> >
> > Hm, I see a somewhat more fundamental problem with geqo:
> > I tried several queries, and I found not a single one where the whole
> > genetic process made any significant improvement to the 'worth'.
> > It seems that the best variant out of the initial pool is always either
> > the path chosen in the end, or at least the cost difference is _really_ low.
> Ouch.  Did you insert some debugging code to get that information, or
> how did you come to that conclusion?
Yes, I enabled GEQO_DEBUG and added some more debugging output.  

Btw, a higher generation count does not change that.

Andres


Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> So maybe a redesign of the equivalence-class joinclause mechanism is in
>> order.  Still, this is unlikely to fix the fundamental issue that the
>> time for large join problems grows nonlinearly.

> Nonlinear is one thing, but this looks more like exponential.  I
> understand that the standard planner is exponential; GEQO should not
> be.

Well, the equivclass code is new as of 8.3.  It's possible that this
got broken relatively recently ...
        regards, tom lane


Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Andres Freund
Date:
On Thursday 16 July 2009 19:22:30 Robert Haas wrote:
> On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> > I wrote:
> >> If I set both collapse_limit variables to very high values (I used 999),
> >> it takes ... um ... not sure; I gave up waiting after half an hour.
> >> I also tried with geqo_effort reduced to the minimum of 1, but that
> >> didn't produce a plan in reasonable time either (I gave up after ten
> >> minutes).
> >
> > After I gave up letting the machine be idle to get a fair timing,
> > I turned on oprofile monitoring.  It looks a bit interesting:
> That is interesting, but there's not really enough detail here to see
> what is going on.  I'm more interested in what the high-level
> functions are doing that's causing these guys to be called so many
> times.  As Greg says, if the planning time curve for GEQO isn't better
> than the one for the standard planner, it's the epitome of pointless.
It is not the actual genetic search, I now found out (or, more precisely, 
now that I read the trace correctly).

At the start of the query GEQO fills a pool with random paths through the 
search space. Unfortunately a random path is not very likely to succeed, so it 
checks and checks and...

That's why the problem is not visible with a simple join of 100 or so 
tables - all paths are valid there...
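A toy model of that observation (my sketch, not PostgreSQL's actual geqo code): a random relation ordering is only buildable without clauseless joins if every relation added connects, via some join clause, to the set already joined. With a sparse join graph almost every random ordering fails that test, while a clique accepts all of them:

```python
import random

def buildable(order, edges):
    """True if each relation in `order` joins to the already-joined set
    via at least one edge (join clause) - i.e. no cross join is forced."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any((rel, j) in edges or (j, rel) in edges for j in joined):
            return False
        joined.add(rel)
    return True

def success_rate(n, edges, trials=2000, seed=42):
    rng = random.Random(seed)
    rels, ok = list(range(n)), 0
    for _ in range(trials):
        rng.shuffle(rels)
        ok += buildable(rels, edges)
    return ok / trials

chain = {(i, i + 1) for i in range(9)}                   # 10 relations in a chain
clique = {(i, j) for i in range(10) for j in range(i)}   # every pair joined
print(success_rate(10, chain))    # near zero: random candidates mostly fail
print(success_rate(10, clique))   # 1.0: any ordering works
```

For a 10-relation chain only 2^9 of the 10! orderings are buildable, which matches the "checks and checks and..." behaviour during pool initialization.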

Andres



Re: Review remove {join,from}_collapse_limit, add enable_join_ordering

From
Kenneth Marshall
Date:
On Thu, Jul 16, 2009 at 06:49:08PM +0200, Andres Freund wrote:
> On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > The default settings currently make it relatively hard to trigger geqo at
> > > all.
> >
> > Yes, and that was intentional.  One of the implications of what we're
> > discussing here is that geqo would get used a lot more for "typical
> > complex queries" (if there is any such thing as a typical one).  So
> > it's fully to be expected that the fallout would be pressure to improve
> > geqo in various ways.
> >
> > Given that we are at the start of the development cycle, that prospect
> > doesn't scare me --- there's plenty of time to fix whatever needs
> > fixing.  However, I am leaning to the feeling that I don't want to be
> > putting people in a position where they have no alternative but to use
> > geqo.  So adjusting rather than removing the collapse limits is seeming
> > like a good idea.
> Hm, I see a somewhat more fundamental problem with geqo:
> I tried several queries, and I found not a single one where the whole 
> genetic process made any significant improvement to the 'worth'.
> It seems that the best variant out of the initial pool is always either the 
> path chosen in the end, or at least the cost difference is _really_ low.
> 
> 
> Andres
> 

Hi Andres,

From some of my reading of the literature on join order
optimization via random sampling, such as what would establish
the initial GEQO pool, there is a very good possibility of having
a "pretty good" plan in the first pool, especially for our larger
initial pool sizes of 100-1000. And in fact, the final plan has
a good chance of being of approximately the same cost as a member
of the initial pool. Uniform sampling alone can give you a close
to optimum plan 80% of the time with an initial sample size of
100. And using biased sampling raises that to 99% or better.
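Ken's 80% figure is consistent with simple order statistics: under uniform sampling the chance that at least one of N samples lands in the cheapest fraction q of the plan space is 1 - (1 - q)^N. A small sketch (my arithmetic, not from the cited literature):

```python
def p_good_plan(pool_size, top_fraction):
    """P(best of pool_size uniform samples falls in the cheapest
    `top_fraction` of the plan space) = 1 - (1 - q)^N."""
    return 1.0 - (1.0 - top_fraction) ** pool_size

# a pool of 100 uniform samples hits the best ~1.6% of plans ~80% of the
# time; a pool of 1000 hits the best ~0.2% with similar probability
print(round(p_good_plan(100, 0.016), 2))
print(round(p_good_plan(1000, 0.0016), 2))
```

This also explains Andres's observation upthread that the genetic phase rarely beats the best member of the initial pool: the pool alone already lands close to a good plan fairly often.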

Regards,
Ken