Re: Bunch o' dead code in GEQO - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bunch o' dead code in GEQO
Date
Msg-id 24457.1074837057@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bunch o' dead code in GEQO  (Neil Conway <neilc@samurai.com>)
List pgsql-hackers
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Where are we going to find a representative test set of
>> dozen-or-more- way SQL join queries?

> As far as getting good lotsa-join queries, I think we can either:

>    (1) generate the queries programmatically

>    For example, star-schema join queries might be tractable via this
>    method.

Sure, we can generate umpteen thousand star joins in no time, but they
are all the same problem.  I don't think this is particularly helpful
either for finding optimizer bugs or for making choices about
performance issues.

An example of the kind of thing I'm worried about: I realized just
yesterday that GEQO is actively broken in 7.4 because it cannot generate
"bushy" plans.  As of 7.4 there are cases involving IN constructs where
the only valid plans are bushy.  For example, in the regression database:

regression=# set geqo_threshold to 3;
SET
regression=# explain select * from tenk1 where
regression-# unique1 in (select unique2 from tenk1 t2, int4_tbl t3 where hundred = f1) and
regression-# unique2 in (select unique1 from tenk1 t4, int4_tbl t5 where hundred = f1);
ERROR:  failed to make a valid plan

You could test star joins all day long and not find that bug.

>    (2) get the queries manually

>    This would involve either writing schema and a bunch of queries for
>    an "example app" (a la the Java Web Store), or getting a sanitized
>    version of the schema & common queries used by a few large PG
>    users. The latter might be the better way to go...

The only thing I'd really trust is a sampling of complex queries from
different real-world applications.  This will probably be hard to get,
and we can only hope to have dozens of queries not hundreds or
thousands.  We will also need to think about how we will get the
pg_statistic entries to correspond to the real-world situations.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: RFC: bufmgr locking changes
Next
From: David Garamond
Date:
Subject: Re: [pgsql-hackers-win32] What's left?