On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane <tgl@postgresql.org> wrote:
> Log Message:
> -----------
> Rewrite GEQO's gimme_tree function so that it always finds a legal join
> sequence, even when the input "tour" doesn't lead directly to such a sequence.
> The stack logic that was added in 2004 only supported cases where relations
> that had to be joined to each other (due to join order restrictions) were
> adjacent in the tour. However, relying on a random search to figure that out
> is tremendously inefficient in large join problems, and could even fail
> completely (leading to "failed to make a valid plan" errors) if
> random_init_pool ran out of patience. It seems better to make the
> tour-to-plan transformation a little bit fuzzier so that every tour can form
> a legal plan, even though this means that apparently different tours will
> sometimes yield the same plan.
>
> In the same vein, get rid of the logic that knew that tours (a,b,c,d,...)
> are the same as tours (b,a,c,d,...), and therefore insisted the latter
> are invalid. The chance of generating two tours that differ only in
> this way isn't that high, and throwing out 50% of possible tours to
> avoid such duplication seems more likely to waste valuable genetic-
> refinement generations than to do anything useful.
>
> This leaves us with no cases in which geqo_eval will deem a tour invalid,
> so get rid of assorted kluges that tried to deal with such cases, in
> particular the undocumented assumption that DBL_MAX is an impossible
> plan cost.
>
> This is all per testing of Robert Haas' lets-remove-the-collapse-limits
> patch. That idea has crashed and burned, at least for now, but we still
> got something useful out of it.
>
> It's possible we should back-patch this change, since the "failed to make a
> valid plan" error can happen in existing releases; but I'd rather not until
> it has gotten more testing.
I think I just ran smack dab into this bug on 8.3.8 (RPM:
postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
very well with the default settings so I raised the collapse limits
and let GEQO have a crack at it. This was not a rousing success.
It didn't actually fail, but it did this sort of thing for a real long
time.
#0 0x08192c6b in bms_overlap ()
#1 0x081b6046 in have_join_order_restriction ()
#2 0x081a9438 in gimme_tree ()
#3 0x081a9515 in geqo_eval ()
#4 0x081a9d6c in random_init_pool ()
#5 0x081a9728 in geqo ()
#6 0x081abf8d in ?? ()
#7 0x081be4c3 in query_planner ()
#8 0x081beedb in ?? ()
#9 0x081c00ce in subquery_planner ()
#10 0x081c057c in standard_planner ()
#11 0x08207caa in pg_plan_query ()
#12 0x08207df4 in pg_plan_queries ()
#13 0x082083d4 in ?? ()
#14 0x08209b3f in PostgresMain ()
#15 0x081dc1e5 in ?? ()
#16 0x081dd20a in PostmasterMain ()
#17 0x08190f96 in main ()
Nothing makes you appreciate a query that takes 3564.709 ms to run
like one that takes 272302.799 ms...
...Robert