Thread: GEQO status?

GEQO status?

From
Ed Loehr
Date:
What is the status of the genetic algorithm query optimizer?  Is this
supposed to work well on many-table joins, or has it fallen out of favor
or in disrepair?  [I'm needing to optimize some large, many-table-join
queries and wondering time spent configuring/understanding geqo would be
fruitful...]

Regards,
Ed Loehr


Re: GEQO status?

From
Thomas Lockhart
Date:
> What is the status of the genetic algorithm query optimizer?  Is this
> supposed to work well on many-table joins, or has it fallen out of favor
> or in disrepair?  [I'm needing to optimize some large, many-table-join
> queries and wondering time spent configuring/understanding geqo would be
> fruitful...]

It is the only techique we have to achieve adequate performance on
many-table joins. It has received little work recently, but that may be
due to having received no complaints or discussions that I can recall.
                  - Thomas


Re: GEQO status?

From
Ed Loehr
Date:
Thomas Lockhart wrote:
> 
> > What is the status of the genetic algorithm query optimizer?  Is this
> > supposed to work well on many-table joins, or has it fallen out of favor
> > or in disrepair?  [I'm needing to optimize some large, many-table-join
> > queries and wondering time spent configuring/understanding geqo would be
> > fruitful...]
> 
> It is the only techique we have to achieve adequate performance on
> many-table joins. It has received little work recently, but that may be
> due to having received no complaints or discussions that I can recall.

I'm having some trouble, not sure its related to GEQO.  Is there a
PGOPTIONS flag to turn it off to attempt isolate the problem?

Ed


Re: GEQO status?

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
> What is the status of the genetic algorithm query optimizer?  Is this
> supposed to work well on many-table joins, or has it fallen out of favor
> or in disrepair?

It's supposed to work ;-).  I'm not sure that the default parameters are
optimal, however.  If you experiment with other settings, please post your
results.
        regards, tom lane


Re: GEQO status?

From
Ed Loehr
Date:
Tom Lane wrote:
> 
> Ed Loehr <eloehr@austin.rr.com> writes:
> > What is the status of the genetic algorithm query optimizer?  Is this
> > supposed to work well on many-table joins, or has it fallen out of favor
> > or in disrepair?
> 
> It's supposed to work ;-).  I'm not sure that the default parameters are
> optimal, however.  If you experiment with other settings, please post your
> results.

Query time dropped from many minutes to 13 seconds on a 12-table join
with a little tweaking from the default params:

My $PGDATA/pg_geqo:
-------------------
Pool_Size            1024
# Effort               high
Generations          100
Random_Seed          330418761
Selection_Bias       2.00

Similar performance with Generations setting of 800 derived from Effort.

Regards,
Ed Loehr


Re: GEQO status?

From
Ed Loehr
Date:
Thomas Lockhart wrote:
> 
> > What is the status of the genetic algorithm query optimizer?  Is this
> > supposed to work well on many-table joins, or has it fallen out of favor
> > or in disrepair?  [I'm needing to optimize some large, many-table-join
> > queries and wondering time spent configuring/understanding geqo would be
> > fruitful...]
> 
> It is the only techique we have to achieve adequate performance on
> many-table joins. It has received little work recently, but that may be
> due to having received no complaints or discussions that I can recall.

At risk of being off-topic here, is there a reason why GEQO is off by
default in the ODBC driver (postdrv.exe)?  I vaguely recall something
about this from a year ago, but can't find it.

Regards,
Ed Loehr


Re: GEQO status?

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
> is there a reason why GEQO is off by
> default in the ODBC driver (postdrv.exe)?

There may once have been a good reason for that, but it sounds like a
mighty bad idea nowadays.

AFAICT ODBC's default setting has been that way for as long as ODBC has
been in our CVS tree, so no way to know who chose to do that, when, or
why.
        regards, tom lane


Re: GEQO status?

From
Peter Eisentraut
Date:
Ed Loehr writes:

> What is the status of the genetic algorithm query optimizer?  Is this
> supposed to work well on many-table joins, or has it fallen out of favor
> or in disrepair?  [I'm needing to optimize some large, many-table-join
> queries and wondering time spent configuring/understanding geqo would be
> fruitful...]

I've seen a number of bug reports that would indicate to me the GEQO works
less than perfectly.  I vividly recall how, while working on my own code,
mere additions of dummy clauses like '... AND 5=5' altered query results
in seemingly random ways.  That was admittedly quite a while ago, but the
GEQO code hasn't changed since.  I'd advise you to be *very* careful.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: GEQO status?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I've seen a number of bug reports that would indicate to me the GEQO works
> less than perfectly.  I vividly recall how, while working on my own code,
> mere additions of dummy clauses like '... AND 5=5' altered query results
> in seemingly random ways.

The choices made by GEQO are intentionally random, so I would expect
variation in tuple output order even for repetitions of the identical
query.  If you got a semantically different result, that would indeed
be a bug.  But it would most likely be a bug in the core planner, since
GEQO has essentially no influence over whether the produced plan is
correct or not.  GEQO merely forces specific choices of join order.
All else is in the core planner.

> That was admittedly quite a while ago, but the
> GEQO code hasn't changed since.

The planner has changed quite markedly over the past couple releases,
so I don't put a lot of stock in old anecdotes.  Let's see a test case.
        regards, tom lane


Re: GEQO status?

From
Ed Loehr
Date:
Tom Lane wrote:
> 
> The choices made by GEQO are intentionally random, so I would expect
> variation in tuple output order even for repetitions of the identical
> query.  If you got a semantically different result, that would indeed
> be a bug.  But it would most likely be a bug in the core planner, since
> GEQO has essentially no influence over whether the produced plan is
> correct or not.  GEQO merely forces specific choices of join order.
> All else is in the core planner.

You can remove the randomness by setting the Seed configuration value, if
the docs are correct.

Regards,
Ed Loehr


Re: GEQO status?

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
> You can remove the randomness by setting the Seed configuration value,

True, but that's not the default setup.
        regards, tom lane


Re: GEQO status?

From
"Andrew Snow"
Date:

> I would set Seed per default. Even worse than a bad query path
> is an unpredictable query path. I see no argument, that a random Seed
> would buy us anything.

This kindof bugs me -- if you leave it stuck at a preset value, it makes it
possible to never delve into parts of solution space that have more optimal
solutions than the default seed will ever find.  Surely a random seed would
make the solution better on average over a large number of queries.

On the other hand, if the system as a whole is working well, it should not
matter what the seed is.


- Andrew