Thread: Bunch o' dead code in GEQO
The GEQO planner module contains six different recombination algorithms,
only one of which is actually used --- the others are ifdef'd out, and
have been ever since we got the code.  Does anyone see a reason not to
prune the deadwood?

			regards, tom lane
On Wed, 21 Jan 2004, Tom Lane wrote:
> The GEQO planner module contains six different recombination algorithms,
> only one of which is actually used --- the others are ifdef'd out, and
> have been ever since we got the code.  Does anyone see a reason not to
> prune the deadwood?

Considering the recent discussion about REALLY slow query planning by the
GEQO module, it might be worth testing each one to see which works best
before lopping them off.  But I don't do anything that needs GEQO, so for
me it doesn't really matter.
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Wed, 21 Jan 2004, Tom Lane wrote:
>> The GEQO planner module contains six different recombination algorithms,

> considering the recent discussion about REALLY slow query planning by the
> GEQO module, it might be worth testing each one to see which works best
> before lopping them off.

I'm assuming that the original author of the GEQO code already did that
testing ...

			regards, tom lane
On Thu, 22 Jan 2004, Tom Lane wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > On Wed, 21 Jan 2004, Tom Lane wrote:
> > > The GEQO planner module contains six different recombination algorithms,
> >
> > considering the recent discussion about REALLY slow query planning by the
> > GEQO module, it might be worth testing each one to see which works best
> > before lopping them off.
>
> I'm assuming that the original author of the GEQO code already did that
> testing ...

Hmmm.  I was figuring he wasn't sure, so he left them in for other people
to test.  Is this a part of the code that eats up much time, or something
simple and fast that isn't part of the "GEQO takes 8 seconds to plan"
problem?
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Thu, 22 Jan 2004, Tom Lane wrote:
>> I'm assuming that the original author of the GEQO code already did that
>> testing ...

> Hmmm.  I was figuring he wasn't sure so he left them in for other people
> to test.  Is this a part of the code that eats up much time, or something
> simple and fast that isn't part of the "GEQO takes 8 seconds to plan"
> problem?

Well, the basic plan of the GEQO code is:

Step 1: generate a bunch of possible join paths at random.

Step 2: randomly select a pair of paths from the current population,
generate a new path that is some combination of these, and push it back
into the population, dropping the worst path from the population.
Repeat for a bunch of generations.

Step 3: take the best path in the final population.

The different recombination algorithms are simply different ways of
generating a "child" path from two "parent" paths in step 2.  Changing
them wouldn't affect the runtime noticeably --- the primary cost is in
evaluating each generated path, which is why the runtime is roughly
proportional to the population size (step 1) plus the number of
generations (step 2).  Possibly a different recombiner would give a
better chance of finding a good plan, but I'm unconvinced.  Arguably the
recombiners that are there are all wrong anyway, since they were all
invented to solve Traveling Salesman problems, which this is not quite.

The only way we can do much about the runtime is to reduce the default
population size.  With the current default parameters, a large query
will have population size 1024 and 400 generations, so about two-thirds
of the runtime goes into generating the initial random population; if we
can't make a dent in that then we're not going to gain much.  The
question is what this will do to the average quality of the selected
plans.
One thing I'm thinking about is trying to improve the quality of the
initial population by immediately discarding any really bad plans (for
instance, use a heuristic that pays attention to which relations are
linked by WHERE clauses).

			regards, tom lane
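The three steps described above can be sketched as a toy steady-state
genetic algorithm.  This is illustrative only, not the actual GEQO code:
the cost function and recombiner below are stand-ins (the real planner
costs a candidate join order against the query, and GEQO's default
recombiner is ERX), and every name here is made up for the sketch.

```python
import random

def cost(path):
    # Stand-in for evaluating a join order; lower is better.  The real
    # planner would cost the whole join tree this ordering implies.
    return sum(abs(a - b) for a, b in zip(path, path[1:]))

def recombine(p1, p2):
    # Stand-in recombiner: keep the first half of one parent, then fill
    # in the remaining genes in the order the other parent visits them.
    cut = len(p1) // 2
    head = p1[:cut]
    return head + [g for g in p2 if g not in head]

def geqo_sketch(n_rels, pool_size, generations, seed=0):
    rng = random.Random(seed)
    base = list(range(n_rels))
    # Step 1: random initial population.  This dominates the runtime in
    # practice, since every member must be evaluated once.
    pool = []
    for _ in range(pool_size):
        p = base[:]
        rng.shuffle(p)
        pool.append(p)
    pool.sort(key=cost)
    # Step 2: recombine two random parents; the child displaces the
    # worst member of the population if it is any better.
    for _ in range(generations):
        p1, p2 = rng.sample(pool, 2)
        child = recombine(p1, p2)
        if cost(child) < cost(pool[-1]):
            pool[-1] = child
            pool.sort(key=cost)
    # Step 3: best path in the final population.
    return pool[0]

best = geqo_sketch(n_rels=10, pool_size=64, generations=200)
print(best)
```

The runtime point in the email falls out directly: the loop evaluates
`pool_size + generations` candidates, so with defaults of 1024 and 400
most of the work is the initial population.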
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I'm assuming that the original author of the GEQO code already did
> that testing ...

Removing the code without bothering to verify this assumption is a
little unwise, IMHO: given the low quality of the rest of the GEQO code,
I wouldn't be surprised to learn that the present default is not optimal
in some or all circumstances.

-Neil
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I'm assuming that the original author of the GEQO code already did
>> that testing ...

> Removing the code without bothering to verify this assumption is a
> little unwise, IMHO.

Fair enough.  I did a little bit of poking around, and it seems that ERX
(edge-recombination crossover) is still considered one of the best
available choices for solving Traveling Salesman problems via genetic
optimization.  Now there is the little issue that our problem isn't
really TSP and might behave a bit differently, but I see no evidence to
suggest that one of the other recombination methods would do better.
They were all designed for TSP.

The big problem here is that I don't see any practical method of
obtaining indisputable proof that one method is better than another.
Where are we going to find a representative test set of
dozen-or-more-way SQL join queries?

			regards, tom lane
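For readers unfamiliar with ERX: it builds a child tour that reuses as
many of the parents' edges as possible, which is why it suits TSP-like
problems.  A minimal sketch follows; the deterministic tie-breaking and
fallback rules are simplifications of my own (descriptions of ERX
typically break ties randomly), and the code is not taken from GEQO.

```python
def erx(parent1, parent2):
    # Edge map: for each city, the union of its neighbors in both
    # parent tours (tours are treated as cycles).
    edges = {g: set() for g in parent1}
    for tour in (parent1, parent2):
        n = len(tour)
        for i, g in enumerate(tour):
            edges[g].add(tour[(i - 1) % n])
            edges[g].add(tour[(i + 1) % n])

    current = parent1[0]
    child = [current]
    unvisited = set(parent1) - {current}
    while unvisited:
        # Drop the node we just used from every adjacency list.
        for s in edges.values():
            s.discard(current)
        candidates = edges[current] & unvisited
        if candidates:
            # Prefer the neighbor with the fewest remaining edges, so
            # nodes don't get stranded with empty adjacency lists.
            current = min(candidates, key=lambda g: (len(edges[g]), g))
        else:
            # No parent edge leads anywhere new; simplified fallback.
            current = min(unvisited)
        child.append(current)
        unvisited.discard(current)
    return child

print(erx([0, 1, 2, 3, 4], [2, 4, 0, 1, 3]))
```

The result is always a valid permutation, and nearly all of its adjacent
pairs appear as edges in one parent or the other -- the property that
makes ERX effective when edge adjacency is what determines cost, and the
reason it is not obviously right for join-order search, where it is not.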
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Where are we going to find a representative test set of
> dozen-or-more-way SQL join queries?

Interesting that you should mention that.  I've been thinking for a
while that we need a much more extensive test suite for the query
optimizer.  This would allow us to spot regressions in the optimizer
more easily and to get quantifiable data on the effect of optimizer
improvements, and it might end up being a good general-purpose
performance benchmark as well.

As far as getting good lotsa-join queries goes, I think we can either:

(1) generate the queries programmatically

For example, star-schema join queries might be tractable via this
method.  One nice benefit of generating the queries this way is that it
should allow us to scale the number of joins pretty easily.  One
downside is that we wouldn't get the kind of diversity of queries that
#2 might provide.

(2) get the queries manually

This would involve either writing a schema and a bunch of queries for an
"example app" (a la the Java Web Store), or getting a sanitized version
of the schema & common queries used by a few large PG users.  The latter
might be the better way to go...

We could do both, of course, which might be the way to go.  Any
thoughts?

-Neil

P.S. Unfortunately, I'm sufficiently busy right now that I won't be able
to do any work on this any time soon -- I just wanted to toss out some
ideas because I really think it's worth doing.  Anyone who's interested
is more than welcome to get started.
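Option (1) is easy to sketch.  For instance, a hypothetical generator
for n-way star joins (the table and column names `fact`, `dimN`, and
`dimN_id` are made up for illustration):

```python
def star_join_query(n_dims):
    # Build an n-way star join: one fact table joined to n dimension
    # tables.  Increasing n_dims scales the join search space, which is
    # the knob a planner test suite would want to turn.
    joins = " ".join(
        f"JOIN dim{i} ON fact.dim{i}_id = dim{i}.id"
        for i in range(1, n_dims + 1)
    )
    return f"SELECT count(*) FROM fact {joins};"

print(star_join_query(3))
```

As the follow-up below notes, every query this produces is structurally
the same problem, so a generator like this scales join count but not
query diversity.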
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Where are we going to find a representative test set of
>> dozen-or-more-way SQL join queries?

> As far as getting good lotsa-join queries, I think we can either:
> (1) generate the queries programmatically
> For example, star-schema join queries might be tractable via this
> method.

Sure, we can generate umpteen thousand star joins in no time, but they
are all the same problem.  I don't think this is particularly helpful,
either for finding optimizer bugs or for making choices about
performance issues.

An example of the kind of thing I'm worried about: I realized just
yesterday that GEQO is actively broken in 7.4 because it cannot generate
"bushy" plans.  As of 7.4 there are cases involving IN constructs where
the only valid plans are bushy.  For example, in the regression
database:

regression=# set geqo_threshold to 3;
SET
regression=# explain select * from tenk1 where
regression-# unique1 in (select unique2 from tenk1 t2, int4_tbl t3 where hundred = f1) and
regression-# unique2 in (select unique1 from tenk1 t4, int4_tbl t5 where hundred = f1);
ERROR:  failed to make a valid plan

You could test star joins all day long and not find that bug.

> (2) get the queries manually
> This would involve either writing schema and a bunch of queries for
> an "example app" (a la the Java Web Store), or getting a sanitized
> version of the schema & common queries used by a few large PG
> users.  The latter might be the better way to go...

The only thing I'd really trust is a sampling of complex queries from
different real-world applications.  This will probably be hard to get,
and we can only hope to have dozens of queries, not hundreds or
thousands.  We will also need to think about how we will get the
pg_statistic entries to correspond to the real-world situations.

			regards, tom lane
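To put rough numbers on the left-deep limitation: a GEQO chromosome
encodes only a join order, i.e. a left-deep tree, so it cannot reach a
plan whose only valid shape is bushy.  A back-of-the-envelope count of
the two search spaces (illustrative only; this counts ordered binary
trees, so commutative duplicates are included and the bushy figure is an
upper bound on genuinely distinct plans):

```python
from math import comb, factorial

def catalan(n):
    # Number of distinct binary tree shapes with n + 1 leaves.
    return comb(2 * n, n) // (n + 1)

def left_deep_plans(n_rels):
    # A left-deep tree has exactly one shape; only the order in which
    # the relations appear as leaves varies.
    return factorial(n_rels)

def bushy_plans(n_rels):
    # Every ordered binary tree shape over n_rels leaves, times every
    # leaf ordering.
    return catalan(n_rels - 1) * factorial(n_rels)

for n in (4, 8, 12):
    print(n, left_deep_plans(n), bushy_plans(n))
```

The gap grows quickly with the join count, and the plans outside the
left-deep subset include exactly the bushy-only cases the ERROR above
demonstrates.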