Re: [HACKERS] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
Date
Msg-id 7950.917809074@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] nested loops in joins, ambiguous rewrite rules  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] nested loops in joins, ambiguous rewrite rules  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
>> Well isn't that something!  Thanks so much for your help!
>> I set the GEQO variable to 4 and now the 11.5 minute query
>> executes in 6 seconds with this query plan:

Maybe this was already obvious to everyone else, but I just now got it
through my head that Charlie was not complaining about the time to
*execute* his seven-way join, but about the time to *plan* it.

(In other words, EXPLAIN was taking about as long as actually doing
the SELECT, right?)

The problem here is explained in the Postgres "Developer's Guide",
in the chapter titled "Genetic Query Optimization in Database Systems".
The regular planner/optimizer does a nearly exhaustive search through
all possible ways to execute the query --- and the number of possible
ways grows exponentially in the number of joins.

So you need a heuristic shortcut when there are lots of tables.  It may
not find the best possible execution plan, but it should find a pretty
good one in not too much time.  That's what the GEQO code does.

>> Are there any recommendations about what value *ought* to be
>> set for GEQO?  It seems to me like 'ON=8' is pretty high

Bruce Momjian <maillist@candle.pha.pa.us> writes:
> We have toyed with lowering it to 6.  I think someone said that was too
> low, but obviously, some queries need a value of 6 or lower.  I don't
> understand when that is the case, though.  Any comments from anyone?

The docs say that the number being used is just the number of base
tables mentioned in the query.  I wonder whether that is the right
number to look at.  In particular, the number of cases that the
exhaustive searcher would have to look at is not only dependent on the
number of underlying tables, but also on how many indexes there are to
consider using.

Perhaps if we develop a slightly more complex measure to compare against
the GEQO threshold, we can do better in deciding when to kick in GEQO.
A brain-dead suggestion is number of tables + number of indexes ...
does anyone know whether that is a good measure of the number of cases
considered in the optimizer?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Patches
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] nested loops in joins, ambiguous rewrite rules