Thread: GEQO status?
What is the status of the genetic algorithm query optimizer? Is this supposed to work well on many-table joins, or has it fallen out of favor or in disrepair? [I'm needing to optimize some large, many-table-join queries and wondering time spent configuring/understanding geqo would be fruitful...] Regards, Ed Loehr
> What is the status of the genetic algorithm query optimizer? Is this > supposed to work well on many-table joins, or has it fallen out of favor > or in disrepair? [I'm needing to optimize some large, many-table-join > queries and wondering time spent configuring/understanding geqo would be > fruitful...] It is the only techique we have to achieve adequate performance on many-table joins. It has received little work recently, but that may be due to having received no complaints or discussions that I can recall. - Thomas
Thomas Lockhart wrote: > > > What is the status of the genetic algorithm query optimizer? Is this > > supposed to work well on many-table joins, or has it fallen out of favor > > or in disrepair? [I'm needing to optimize some large, many-table-join > > queries and wondering time spent configuring/understanding geqo would be > > fruitful...] > > It is the only techique we have to achieve adequate performance on > many-table joins. It has received little work recently, but that may be > due to having received no complaints or discussions that I can recall. I'm having some trouble, not sure its related to GEQO. Is there a PGOPTIONS flag to turn it off to attempt isolate the problem? Ed
Ed Loehr <eloehr@austin.rr.com> writes: > What is the status of the genetic algorithm query optimizer? Is this > supposed to work well on many-table joins, or has it fallen out of favor > or in disrepair? It's supposed to work ;-). I'm not sure that the default parameters are optimal, however. If you experiment with other settings, please post your results. regards, tom lane
Tom Lane wrote: > > Ed Loehr <eloehr@austin.rr.com> writes: > > What is the status of the genetic algorithm query optimizer? Is this > > supposed to work well on many-table joins, or has it fallen out of favor > > or in disrepair? > > It's supposed to work ;-). I'm not sure that the default parameters are > optimal, however. If you experiment with other settings, please post your > results. Query time dropped from many minutes to 13 seconds on a 12-table join with a little tweaking from the default params: My $PGDATA/pg_geqo: ------------------- Pool_Size 1024 # Effort high Generations 100 Random_Seed 330418761 Selection_Bias 2.00 Similar performance with Generations setting of 800 derived from Effort. Regards, Ed Loehr
Thomas Lockhart wrote: > > > What is the status of the genetic algorithm query optimizer? Is this > > supposed to work well on many-table joins, or has it fallen out of favor > > or in disrepair? [I'm needing to optimize some large, many-table-join > > queries and wondering time spent configuring/understanding geqo would be > > fruitful...] > > It is the only techique we have to achieve adequate performance on > many-table joins. It has received little work recently, but that may be > due to having received no complaints or discussions that I can recall. At risk of being off-topic here, is there a reason why GEQO is off by default in the ODBC driver (postdrv.exe)? I vaguely recall something about this from a year ago, but can't find it. Regards, Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes: > is there a reason why GEQO is off by > default in the ODBC driver (postdrv.exe)? There may once have been a good reason for that, but it sounds like a mighty bad idea nowadays. AFAICT ODBC's default setting has been that way for as long as ODBC has been in our CVS tree, so no way to know who chose to do that, when, or why. regards, tom lane
Ed Loehr writes: > What is the status of the genetic algorithm query optimizer? Is this > supposed to work well on many-table joins, or has it fallen out of favor > or in disrepair? [I'm needing to optimize some large, many-table-join > queries and wondering time spent configuring/understanding geqo would be > fruitful...] I've seen a number of bug reports that would indicate to me the GEQO works less than perfectly. I vividly recall how, while working on my own code, mere additions of dummy clauses like '... AND 5=5' altered query results in seemingly random ways. That was admittedly quite a while ago, but the GEQO code hasn't changed since. I'd advise you to be *very* careful. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: > I've seen a number of bug reports that would indicate to me the GEQO works > less than perfectly. I vividly recall how, while working on my own code, > mere additions of dummy clauses like '... AND 5=5' altered query results > in seemingly random ways. The choices made by GEQO are intentionally random, so I would expect variation in tuple output order even for repetitions of the identical query. If you got a semantically different result, that would indeed be a bug. But it would most likely be a bug in the core planner, since GEQO has essentially no influence over whether the produced plan is correct or not. GEQO merely forces specific choices of join order. All else is in the core planner. > That was admittedly quite a while ago, but the > GEQO code hasn't changed since. The planner has changed quite markedly over the past couple releases, so I don't put a lot of stock in old anecdotes. Let's see a test case. regards, tom lane
Tom Lane wrote: > > The choices made by GEQO are intentionally random, so I would expect > variation in tuple output order even for repetitions of the identical > query. If you got a semantically different result, that would indeed > be a bug. But it would most likely be a bug in the core planner, since > GEQO has essentially no influence over whether the produced plan is > correct or not. GEQO merely forces specific choices of join order. > All else is in the core planner. You can remove the randomness by setting the Seed configuration value, if the docs are correct. Regards, Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes: > You can remove the randomness by setting the Seed configuration value, True, but that's not the default setup. regards, tom lane
> I would set Seed per default. Even worse than a bad query path > is an unpredictable query path. I see no argument, that a random Seed > would buy us anything. This kindof bugs me -- if you leave it stuck at a preset value, it makes it possible to never delve into parts of solution space that have more optimal solutions than the default seed will ever find. Surely a random seed would make the solution better on average over a large number of queries. On the other hand, if the system as a whole is working well, it should not matter what the seed is. - Andrew