Thread: Bunch o' dead code in GEQO
The GEQO planner module contains six different recombination algorithms,
only one of which is actually used --- the others are ifdef'd out, and
have been ever since we got the code.  Does anyone see a reason not to
prune the deadwood?

			regards, tom lane
On Wed, 21 Jan 2004, Tom Lane wrote:
> The GEQO planner module contains six different recombination algorithms,
> only one of which is actually used --- the others are ifdef'd out, and
> have been ever since we got the code.  Does anyone see a reason not to
> prune the deadwood?

Considering the recent discussion about REALLY slow query planning by the
GEQO module, it might be worth testing each one to see which works best
before lopping them off.  But I don't do anything that needs GEQO, so for
me it doesn't really matter.
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Wed, 21 Jan 2004, Tom Lane wrote:
>> The GEQO planner module contains six different recombination algorithms,

> considering the recent discussion about REALLY slow query planning by the
> GEQO module, it might be worth testing each one to see which works best
> before lopping them off.

I'm assuming that the original author of the GEQO code already did that
testing ...

			regards, tom lane
On Thu, 22 Jan 2004, Tom Lane wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > On Wed, 21 Jan 2004, Tom Lane wrote:
> > > The GEQO planner module contains six different recombination algorithms,
> >
> > considering the recent discussion about REALLY slow query planning by the
> > GEQO module, it might be worth testing each one to see which works best
> > before lopping them off.
>
> I'm assuming that the original author of the GEQO code already did that
> testing ...

Hmmm.  I was figuring he wasn't sure, so he left them in for other people
to test.  Is this a part of the code that eats up much time, or something
simple and fast that isn't part of the "GEQO takes 8 seconds to plan"
problem?
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Thu, 22 Jan 2004, Tom Lane wrote:
>> I'm assuming that the original author of the GEQO code already did that
>> testing ...

> Hmmm.  I was figuring he wasn't sure so he left them in for other people
> to test.  Is this a part of the code that eats up much time, or something
> simple and fast that isn't part of the "GEQO takes 8 seconds to plan"
> problem?

Well, the basic plan of the GEQO code is:

Step 1: generate a bunch of possible join paths at random.

Step 2: randomly select a pair of paths from the current population,
generate a new path that is some combination of these, and push it back
into the population, dropping the worst path from the population.
Repeat for a bunch of generations.

Step 3: take the best path in the final population.

The different recombination algorithms are simply different ways of
generating a "child" path from two "parent" paths in step 2.  Changing
them wouldn't affect the runtime noticeably --- the primary cost is in
evaluating each generated path, which is why the runtime is roughly
proportional to the population size (step 1) plus the number of
generations (step 2).  Possibly a different recombiner would give a
better chance of finding a good plan, but I'm unconvinced.  Arguably the
recombiners that are there are all wrong anyway, since they were all
invented to solve Traveling Salesman problems, which this is not quite.

The only way we can do much about the runtime is to reduce the default
population size.  With the current default parameters, a large query
will have population size 1024 and 400 generations, so about two-thirds
of the runtime goes into generating the initial random population; if we
can't make a dent in that then we're not going to gain much.  The
question is what this will do to the average quality of the selected
plans.
One thing I'm thinking about is trying to improve the quality of the
initial population by immediately discarding any really bad plans (for
instance, use a heuristic that pays attention to which relations are
linked by WHERE clauses).

			regards, tom lane
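The three steps described above can be sketched as a toy steady-state
genetic algorithm.  This is illustrative only, not the actual GEQO code:
the cost function and recombiner below are stand-ins (the real planner
costs a candidate join order against the query, and GEQO's default
recombiner is ERX), and every name here is made up for the sketch.

```python
import random

def cost(path):
    # Stand-in for evaluating a join order; lower is better.  The real
    # planner would cost the whole join tree this ordering implies.
    return sum(abs(a - b) for a, b in zip(path, path[1:]))

def recombine(p1, p2):
    # Stand-in recombiner: keep the first half of one parent, then fill
    # in the remaining genes in the order the other parent visits them.
    cut = len(p1) // 2
    head = p1[:cut]
    return head + [g for g in p2 if g not in head]

def geqo_sketch(n_rels, pool_size, generations, seed=0):
    rng = random.Random(seed)
    base = list(range(n_rels))
    # Step 1: random initial population.  This dominates the runtime in
    # practice, since every member must be evaluated once.
    pool = []
    for _ in range(pool_size):
        p = base[:]
        rng.shuffle(p)
        pool.append(p)
    pool.sort(key=cost)
    # Step 2: recombine two random parents; the child displaces the
    # worst member of the population if it is any better.
    for _ in range(generations):
        p1, p2 = rng.sample(pool, 2)
        child = recombine(p1, p2)
        if cost(child) < cost(pool[-1]):
            pool[-1] = child
            pool.sort(key=cost)
    # Step 3: best path in the final population.
    return pool[0]

best = geqo_sketch(n_rels=10, pool_size=64, generations=200)
print(best)
```

The runtime point in the email falls out directly: the loop evaluates
`pool_size + generations` candidates, so with defaults of 1024 and 400
most of the work is the initial population.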
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I'm assuming that the original author of the GEQO code already did
> that testing ...

Removing the code without bothering to verify this assumption is a
little unwise, IMHO: given the low quality of the rest of the GEQO code,
I wouldn't be surprised to learn that the present default is not optimal
in some or all circumstances.

-Neil
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I'm assuming that the original author of the GEQO code already did
>> that testing ...

> Removing the code without bothering to verify this assumption is a
> little unwise, IMHO.

Fair enough.  I did a little bit of poking around, and it seems that ERX
(edge-recombination crossover) is still considered one of the best
available choices for solving Traveling Salesman problems via genetic
optimization.  Now there is the little issue that our problem isn't
really TSP and might behave a bit differently, but I see no evidence to
suggest that one of the other recombination methods would do better.
They were all designed for TSP.

The big problem here is that I don't see any practical method of
obtaining indisputable proof that one method is better than another.
Where are we going to find a representative test set of
dozen-or-more-way SQL join queries?

			regards, tom lane
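For readers unfamiliar with ERX: it builds a child tour that reuses as
many of the parents' edges as possible, which is why it suits TSP-like
problems.  A minimal sketch follows; the deterministic tie-breaking and
fallback rules are simplifications of my own (descriptions of ERX
typically break ties randomly), and the code is not taken from GEQO.

```python
def erx(parent1, parent2):
    # Edge map: for each city, the union of its neighbors in both
    # parent tours (tours are treated as cycles).
    edges = {g: set() for g in parent1}
    for tour in (parent1, parent2):
        n = len(tour)
        for i, g in enumerate(tour):
            edges[g].add(tour[(i - 1) % n])
            edges[g].add(tour[(i + 1) % n])

    current = parent1[0]
    child = [current]
    unvisited = set(parent1) - {current}
    while unvisited:
        # Drop the node we just used from every adjacency list.
        for s in edges.values():
            s.discard(current)
        candidates = edges[current] & unvisited
        if candidates:
            # Prefer the neighbor with the fewest remaining edges, so
            # nodes don't get stranded with empty adjacency lists.
            current = min(candidates, key=lambda g: (len(edges[g]), g))
        else:
            # No parent edge leads anywhere new; simplified fallback.
            current = min(unvisited)
        child.append(current)
        unvisited.discard(current)
    return child

print(erx([0, 1, 2, 3, 4], [2, 4, 0, 1, 3]))
```

The result is always a valid permutation, and nearly all of its adjacent
pairs appear as edges in one parent or the other -- the property that
makes ERX effective when edge adjacency is what determines cost, and the
reason it is not obviously right for join-order search, where it is not.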
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Where are we going to find a representative test set of
> dozen-or-more-way SQL join queries?

Interesting that you should mention that.  I've been thinking for a
while that we need a much more extensive test suite for the query
optimizer.  This would allow us to spot regressions in the optimizer
more easily and to get quantifiable data on the effect of optimizer
improvements, and it might end up being a good general-purpose
performance benchmark as well.

As far as getting good lotsa-join queries goes, I think we can either:

(1) generate the queries programmatically

For example, star-schema join queries might be tractable via this
method.  One nice benefit of generating the queries this way is that it
should allow us to scale the number of joins pretty easily.  One
downside is that we wouldn't get the kind of diversity of queries that
#2 might provide.

(2) get the queries manually

This would involve either writing a schema and a bunch of queries for an
"example app" (a la the Java Web Store), or getting a sanitized version
of the schema & common queries used by a few large PG users.  The latter
might be the better way to go...

We could do both, of course, which might be the way to go.  Any
thoughts?

-Neil

P.S. Unfortunately, I'm sufficiently busy right now that I won't be able
to do any work on this any time soon -- I just wanted to toss out some
ideas because I really think it's worth doing.  Anyone who's interested
is more than welcome to get started.
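Option (1) is easy to sketch.  For instance, a hypothetical generator
for n-way star joins (the table and column names `fact`, `dimN`, and
`dimN_id` are made up for illustration):

```python
def star_join_query(n_dims):
    # Build an n-way star join: one fact table joined to n dimension
    # tables.  Increasing n_dims scales the join search space, which is
    # the knob a planner test suite would want to turn.
    joins = " ".join(
        f"JOIN dim{i} ON fact.dim{i}_id = dim{i}.id"
        for i in range(1, n_dims + 1)
    )
    return f"SELECT count(*) FROM fact {joins};"

print(star_join_query(3))
```

As the follow-up below notes, every query this produces is structurally
the same problem, so a generator like this scales join count but not
query diversity.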
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Where are we going to find a representative test set of
>> dozen-or-more-way SQL join queries?

> As far as getting good lotsa-join queries, I think we can either:
> (1) generate the queries programmatically
> For example, star-schema join queries might be tractable via this
> method.

Sure, we can generate umpteen thousand star joins in no time, but they
are all the same problem.  I don't think this is particularly helpful,
either for finding optimizer bugs or for making choices about
performance issues.

An example of the kind of thing I'm worried about: I realized just
yesterday that GEQO is actively broken in 7.4 because it cannot generate
"bushy" plans.  As of 7.4 there are cases involving IN constructs where
the only valid plans are bushy.  For example, in the regression
database:

regression=# set geqo_threshold to 3;
SET
regression=# explain select * from tenk1 where
regression-# unique1 in (select unique2 from tenk1 t2, int4_tbl t3 where hundred = f1) and
regression-# unique2 in (select unique1 from tenk1 t4, int4_tbl t5 where hundred = f1);
ERROR:  failed to make a valid plan

You could test star joins all day long and not find that bug.

> (2) get the queries manually
> This would involve either writing schema and a bunch of queries for
> an "example app" (a la the Java Web Store), or getting a sanitized
> version of the schema & common queries used by a few large PG
> users.  The latter might be the better way to go...

The only thing I'd really trust is a sampling of complex queries from
different real-world applications.  This will probably be hard to get,
and we can only hope to have dozens of queries, not hundreds or
thousands.  We will also need to think about how we will get the
pg_statistic entries to correspond to the real-world situations.

			regards, tom lane
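To put rough numbers on the left-deep limitation: a GEQO chromosome
encodes only a join order, i.e. a left-deep tree, so it cannot reach a
plan whose only valid shape is bushy.  A back-of-the-envelope count of
the two search spaces (illustrative only; this counts ordered binary
trees, so commutative duplicates are included and the bushy figure is an
upper bound on genuinely distinct plans):

```python
from math import comb, factorial

def catalan(n):
    # Number of distinct binary tree shapes with n + 1 leaves.
    return comb(2 * n, n) // (n + 1)

def left_deep_plans(n_rels):
    # A left-deep tree has exactly one shape; only the order in which
    # the relations appear as leaves varies.
    return factorial(n_rels)

def bushy_plans(n_rels):
    # Every ordered binary tree shape over n_rels leaves, times every
    # leaf ordering.
    return catalan(n_rels - 1) * factorial(n_rels)

for n in (4, 8, 12):
    print(n, left_deep_plans(n), bushy_plans(n))
```

The gap grows quickly with the join count, and the plans outside the
left-deep subset include exactly the bushy-only cases the ERROR above
demonstrates.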