Thread: Optimizing Multiply Joins ???

Optimizing Multiply Joins ???

From
Meszaros Attila
Date:
Hi all,

We are building a sophisticated and flexible database structure and thus,
we have quite complicated and longish queries containing lots of joins.

Using only a few test records in our structure we have performed some
measures, and it is hard to interpret the results.

Until we join no more than 10 tables the response time is below 0.2 s.
joining the 11th table comes with a dramatic change: response time
usually grows up to 5-7 s, 

I'we read the related pages of the documentation, and found the
description of the default and the genetic optimizer too. And also found
the story about the german knowledge-based system project where longer
queries than 10 joins were also too slow.

But I think (hope) we could have a solution, because all of our
complex joins are following foreign keys. 

If we could give some hints to the planner about foreign keys, it
should not generate plenty of unusable plans for the optimizer.


Here I send an example:

the query:
select        h.literal as division,        j.literal as soatype,        e.username,        e.password,
c.objectid
 
from        o_division as a        join o_soa as b                on b.divisionobjectid=a.objectid        join
o_soainstanceas c                on c.soaobjectid=b.objectid        join o_staff_rdl_soainstance_role_ as d
  on d.soainstanceobjectid=c.objectid        join o_electronic as e                on e.pointerobjectid=d.objectid
 join o_soatype as f                on f.objectid=b.soatypeobjectid        join o_meaning as g                on
g.objectid=a.name       join o_meaning_rndl_language_role_ as h                on h.meaningobjectid=g.objectid
     and h.languageobjectid=100001        join o_meaning as i                on i.objectid=f.name        join
o_meaning_rndl_language_role_as j                on j.meaningobjectid=i.objectid                and
j.languageobjectid=100001      join o_staff as k                on k.objectid=d.staffobjectid       join
o_externalcontributoras l                on l.pointerobjectid=k.objectid
 


the structure behind it:
[the arrows are representing the foreign keys.]
a -> g <- h^|b -> f -> i <- j^|c^|d -> k <- l^|e

results of this query:join from a to j takes 0.2 s      a to k takes 4.8 s      a to l takes 5.2 s
I have examined the output of explain in all 3 cases, and I havethe feeling that the planner simply forgets the best
solutionsin2nd and 3rd case.
 

If this is not enough info for the answer I can send the tables, their
contents, the output of the optimizer..... or whatever you need for the
answer (including beer :)

Attila



Re: Optimizing Multiply Joins ???

From
Tom Lane
Date:
Meszaros Attila <tilla@draconis.elte.hu> writes:
> Until we join no more than 10 tables the response time is below 0.2 s.
> joining the 11th table comes with a dramatic change: response time
> usually grows up to 5-7 s, 

That's interesting; apparently the regular optimizer is faster than the
GEQO optimizer for your style of query.  Try increasing the GEQO
threshold (pg_option "geqo_rels") to more than 11.

>     I have examined the output of explain in all 3 cases, and I have
>     the feeling that the planner simply forgets the best solutions
>     in 2nd and 3rd case.

The GEQO planner does not guarantee to find an optimal solution, it
just does a random search through a limited number of possible solutions
and uses the best one it happened across.

> But I think (hope) we could have a solution, because all of our
> complex joins are following foreign keys. 

Actually, as the 7.1 code currently stands, a query that uses explicit
JOIN operators like yours does will always be implemented in exactly
the given join order, with no searching.  I haven't quite decided if
that's a bug or a feature ...
        regards, tom lane


Re: Optimizing Multiply Joins ???

From
Meszaros Attila
Date:
Hi,     

> That's interesting; apparently the regular optimizer is faster than the
> GEQO optimizer for your style of query.  Try increasing the GEQO
> threshold (pg_option "geqo_rels") to more than 11.       Changing this option in a psql session with 'set' has really
helped      Thanks.the results have changed this way:nr tables     plain(s):  more indices(2):  geqo off(s):10:
0.2       0.3        0.311:        4.8        5.9        0.812:        5.2        7.0        1.8
 
       But it seems, geqo_rels option is not parsed from the pg_optionfile.       
> Actually, as the 7.1 code currently stands, a query that uses explicit
> JOIN operators like yours does will always be implemented in exactly
> the given join order, with no searching.  I haven't quite decided if
> that's a bug or a feature ...       Great !!       If possible, it should be left in the codebase at least as an
option later...       Because we could have dozens(hundreds) of preoptimized queries.
 

Attila




Re: Optimizing Multiply Joins ???

From
Tom Lane
Date:
Meszaros Attila <tilla@draconis.elte.hu> writes:
>         Changing this option in a psql session with 'set' has really helped
>
>         But it seems, geqo_rels option is not parsed from the pg_option
>     file.

My mistake.  In 7.0 the GEQO options are set in a file "pg_geqo" in the
$PGDATA directory.  (There should be a prototype file "pg_geqo.sample"
there for you to copy and edit.)

Peter Eisentraut has cleaned up the option handling for 7.1 so that GEQO
options are handled like all the others...
        regards, tom lane


Re: Optimizing Multiply Joins ???

From
Meszaros Attila
Date:
Hi,

> Actually, as the 7.1 code currently stands, a query that uses explicit
> JOIN operators like yours does will always be implemented in exactly
> the given join order, with no searching.  I haven't quite decided if
> that's a bug or a feature ...
Do you mean a "linear binary tree" like this is executed?
     /\    /\ f   /\ e  /\ d /\ ca  b
Or can we have some variations on the graph (with the samepreorder run: a,b,c,d,e,f) like this:
    /\               /  \              /   /\         /\  d /\        a /\  e  f         b  c            
We could give hints on the joins with brackets this way:((a,(b,c)),(d,(e,f)))
I'm not sure which version of standards allows to bracket joins,but I know sybase accepts the above form.
How difficult it looks to hack the parser to accept this form, and pass the meaning to the planner?

Attila




Re: Optimizing Multiply Joins ???

From
Tom Lane
Date:
Meszaros Attila <tilla@draconis.elte.hu> writes:
>> Actually, as the 7.1 code currently stands, a query that uses explicit
>> JOIN operators like yours does will always be implemented in exactly
>> the given join order, with no searching.  I haven't quite decided if
>> that's a bug or a feature ...

>     Do you mean a "linear binary tree" like this is executed?

>          /\
>         /\ f
>        /\ e
>       /\ d
>      /\ c
>     a  b

If that's what you write, yes.  You can parenthesize the JOIN clauses
any way you like, though, and the 7.1 planner will follow that structure.
For example

SELECT ... FROM (a CROSS JOIN b) CROSS JOIN (c CROSS JOIN d) WHERE ...

is semantically the same as FROM a,b,c,d, but given this JOIN form
the planner will only consider plans that join a to b and join c to d
and finally join those results.  With the "FROM a,b,c,d" form it will
do a search through all possible join orders, same as before.  Also,
you can mix styles:

SELECT ... FROM a, b, c CROSS JOIN d WHERE ...

which forces c and d to be joined first, but lets the planner have its
head about what to do next.

>     I'm not sure which version of standards allows to bracket joins,
>     but I know sybase accepts the above form.

SQL92 says
        <joined table> ::=               <cross join>             | <qualified join>             | <left paren> <joined
table><right paren>
 

so anything that claims to accept SQL92 had better allow parentheses
around JOIN expressions...
        regards, tom lane


Re: Optimizing Multiply Joins ???

From
Meszaros Attila
Date:
Hi,

> If that's what you write, yes.  You can parenthesize the JOIN clauses
> any way you like, though, and the 7.1 planner will follow that structure.
Thanks for the exhaustive answer.
Can I test this feature in the current snapshot?

Attila



Re: Optimizing Multiply Joins ???

From
Tom Lane
Date:
Meszaros Attila <tilla@draconis.elte.hu> writes:
>     Can I test this feature in the current snapshot?

Sure.  But see my message to pghackers on Tuesday for notes about what's
not working yet in the JOIN support.
        regards, tom lane