Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Msg-id: 603c8f070911271523n4be4d1e3m9df602a7716d9608@mail.gmail.com
In response to: Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a (Robert Haas <robertmhaas@gmail.com>)
Responses:
  Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
  Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
List: pgsql-hackers
On Fri, Nov 27, 2009 at 3:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Nov 9, 2009 at 1:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Nov 9, 2009 at 1:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Mon, Nov 9, 2009 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> Too bad you don't have debug symbols ... it'd be interesting to see
>>>>> how long that list is.
>>>
>>>> I stopped it a couple of times. Lengths of list1, list2 respectively:
>>>
>>>> 8876, 20
>>>> 14649, 18
>>>> 15334, 10
>>>> 17148, 18
>>>> 18173, 18
>>>
>>> Yowza. 18000 distinct paths for one relation? Could we see the test
>>> case?
>>
>> Well, the test case isn't simple, and I'm not sure that my employer
>> would be pleased if I posted it to a public mailing list. The general
>> thrust of it is that [...]
>
> Test case attached. Load this into an empty database and then:
>
> set geqo to off;
> set join_collapse_limit to 100;
> set from_collapse_limit to 100;
> select * from foo_view order by name;
>
> I guess it's somewhat unsurprising that you can make the planner get
> into trouble with the above settings - we've been over that ground
> before. But it might be possibly interesting that such a simple
> schema is capable of generating so many paths. This breaks 40,000
> without finishing.

Hey, wait a minute. Unless I'm missing something, the items being
accumulated here are not paths (as Tom said upthread and I naively
believed), but RelOptInfos. It looks like we create a RelOptInfo for
every legal join order, so this is going to happen on any large-ish
query where the joins can be reordered relatively freely. I don't
think there's any easy fix for this. When the joins can be reordered
relatively freely, the number of legal join orders is roughly n!.
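As a back-of-the-envelope illustration (my own sketch, not from the planner source): if all n relations in a query can be joined freely, the exhaustive planner builds one RelOptInfo per connected subset of two or more relations - 2^n - n - 1 of them for a clique-shaped join graph - while the number of distinct join orders grows like n!:

```python
import math

def clique_joinrels(n):
    # For n relations whose joins can be reordered freely (a "clique"
    # join graph), every subset of >= 2 relations is a legal joinrel,
    # so the planner would build 2^n - n - 1 RelOptInfos.
    return 2**n - n - 1

for n in (15, 16, 17):
    # joinrels grow like 2^n; join orders grow like n!
    print(n, clique_joinrels(n), math.factorial(n))
```

For n = 15 this already gives 32752 joinrels, which is in the same ballpark as the "breaks 40,000" observation above, even before counting the paths hung off each one.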
It might be possible to reduce the overhead associated with a single
one of those join orderings (which consists of a RelOptInfo, some
supporting data structures, and however many paths) but given the
explosive growth of n! that won't buy very much at all, and it won't
make the lists shorter in any case.

Playing around with this a bit, I was easily able to get 2-second
planning times on a 15-table join, 6-second planning times on a
16-table join, and 30-second planning times on a 17-table join. This
makes it hard to support raising the GEQO threshold, as most recently
suggested by Greg Sabino Mullane here (and previously by me on an
earlier thread):

http://archives.postgresql.org/pgsql-hackers/2009-11/msg01106.php

What sucks about the current geqo_threshold mechanism is that the
number of tables is a fairly coarse predictor of the number of legal
join orders. On some queries it is close to n! - on others it is far
less. It would be nice if we had a way to either (1) approximate the
number of legal join orders before we start planning the query, and
base the GEQO decision on that number rather than on the number of
tables, or (2) always start with the standard (exhaustive) planner,
and switch to some kind of approximation algorithm (GEQO or
otherwise) if we find that to be impractical. But neither of these
seems at all straightforward.

...Robert
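To make the "coarse predictor" point concrete, here is a small, hypothetical sketch (names and graph encoding are mine, nothing like the actual planner code) that counts the joinrels an exhaustive search would build for a given join graph, by enumerating connected subsets of relations. A 6-table chain and a 6-table clique have the same geqo_threshold behavior today, but very different search spaces:

```python
from itertools import combinations

def connected(nodes, edges):
    # Breadth-first search restricted to `nodes`, to test whether the
    # subset induces a connected subgraph of the join graph.
    nodes = set(nodes)
    seen = {next(iter(nodes))}
    frontier = list(seen)
    while frontier:
        cur = frontier.pop()
        for a, b in edges:
            if a == cur and b in nodes and b not in seen:
                seen.add(b); frontier.append(b)
            if b == cur and a in nodes and a not in seen:
                seen.add(a); frontier.append(a)
    return seen == nodes

def count_joinrels(n, edges):
    # One joinrel per connected subset of >= 2 relations. Brute-force
    # enumeration, so only sensible for small n - an estimator usable
    # at planner startup would need something cheaper.
    total = 0
    for k in range(2, n + 1):
        for subset in combinations(range(n), k):
            if connected(subset, edges):
                total += 1
    return total

chain = [(i, i + 1) for i in range(5)]                    # 6 tables in a line
clique = [(i, j) for i in range(6) for j in range(i + 1, 6)]
print(count_joinrels(6, chain))    # 15: only contiguous runs are joinable
print(count_joinrels(6, clique))   # 57: every subset of >= 2 tables
```

The brute force here is of course exponential itself, which is one reason option (1) above isn't straightforward: estimating the search-space size cheaply, before planning, is the hard part.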