Thread: Optimizing Multiply Joins ???
Hi all, We are building a sophisticated and flexible database structure and thus, we have quite complicated and longish queries containing lots of joins. Using only a few test records in our structure we have performed some measures, and it is hard to interpret the results. Until we join no more than 10 tables the response time is below 0.2 s. joining the 11th table comes with a dramatic change: response time usually grows up to 5-7 s, I'we read the related pages of the documentation, and found the description of the default and the genetic optimizer too. And also found the story about the german knowledge-based system project where longer queries than 10 joins were also too slow. But I think (hope) we could have a solution, because all of our complex joins are following foreign keys. If we could give some hints to the planner about foreign keys, it should not generate plenty of unusable plans for the optimizer. Here I send an example: the query: select h.literal as division, j.literal as soatype, e.username, e.password, c.objectid from o_division as a join o_soa as b on b.divisionobjectid=a.objectid join o_soainstanceas c on c.soaobjectid=b.objectid join o_staff_rdl_soainstance_role_ as d on d.soainstanceobjectid=c.objectid join o_electronic as e on e.pointerobjectid=d.objectid join o_soatype as f on f.objectid=b.soatypeobjectid join o_meaning as g on g.objectid=a.name join o_meaning_rndl_language_role_ as h on h.meaningobjectid=g.objectid and h.languageobjectid=100001 join o_meaning as i on i.objectid=f.name join o_meaning_rndl_language_role_as j on j.meaningobjectid=i.objectid and j.languageobjectid=100001 join o_staff as k on k.objectid=d.staffobjectid join o_externalcontributoras l on l.pointerobjectid=k.objectid the structure behind it: [the arrows are representing the foreign keys.] a -> g <- h^|b -> f -> i <- j^|c^|d -> k <- l^|e results of this query:join from a to j takes 0.2 s a to k takes 4.8 s a to l takes 5.2 s I have examined the output of explain in all 3 cases, and I havethe feeling that the planner simply forgets the best solutionsin2nd and 3rd case. If this is not enough info for the answer I can send the tables, their contents, the output of the optimizer..... or whatever you need for the answer (including beer :) Attila
Meszaros Attila <tilla@draconis.elte.hu> writes: > Until we join no more than 10 tables the response time is below 0.2 s. > joining the 11th table comes with a dramatic change: response time > usually grows up to 5-7 s, That's interesting; apparently the regular optimizer is faster than the GEQO optimizer for your style of query. Try increasing the GEQO threshold (pg_option "geqo_rels") to more than 11. > I have examined the output of explain in all 3 cases, and I have > the feeling that the planner simply forgets the best solutions > in 2nd and 3rd case. The GEQO planner does not guarantee to find an optimal solution, it just does a random search through a limited number of possible solutions and uses the best one it happened across. > But I think (hope) we could have a solution, because all of our > complex joins are following foreign keys. Actually, as the 7.1 code currently stands, a query that uses explicit JOIN operators like yours does will always be implemented in exactly the given join order, with no searching. I haven't quite decided if that's a bug or a feature ... regards, tom lane
Hi, > That's interesting; apparently the regular optimizer is faster than the > GEQO optimizer for your style of query. Try increasing the GEQO > threshold (pg_option "geqo_rels") to more than 11. Changing this option in a psql session with 'set' has really helped Thanks.the results have changed this way:nr tables plain(s): more indices(2): geqo off(s):10: 0.2 0.3 0.311: 4.8 5.9 0.812: 5.2 7.0 1.8 But it seems, geqo_rels option is not parsed from the pg_optionfile. > Actually, as the 7.1 code currently stands, a query that uses explicit > JOIN operators like yours does will always be implemented in exactly > the given join order, with no searching. I haven't quite decided if > that's a bug or a feature ... Great !! If possible, it should be left in the codebase at least as an option later... Because we could have dozens(hundreds) of preoptimized queries. Attila
Meszaros Attila <tilla@draconis.elte.hu> writes: > Changing this option in a psql session with 'set' has really helped > > But it seems, geqo_rels option is not parsed from the pg_option > file. My mistake. In 7.0 the GEQO options are set in a file "pg_geqo" in the $PGDATA directory. (There should be a prototype file "pg_geqo.sample" there for you to copy and edit.) Peter Eisentraut has cleaned up the option handling for 7.1 so that GEQO options are handled like all the others... regards, tom lane
Hi, > Actually, as the 7.1 code currently stands, a query that uses explicit > JOIN operators like yours does will always be implemented in exactly > the given join order, with no searching. I haven't quite decided if > that's a bug or a feature ... Do you mean a "linear binary tree" like this is executed? /\ /\ f /\ e /\ d /\ ca b Or can we have some variations on the graph (with the samepreorder run: a,b,c,d,e,f) like this: /\ / \ / /\ /\ d /\ a /\ e f b c We could give hints on the joins with brackets this way:((a,(b,c)),(d,(e,f))) I'm not sure which version of standards allows to bracket joins,but I know sybase accepts the above form. How difficult it looks to hack the parser to accept this form, and pass the meaning to the planner? Attila
Meszaros Attila <tilla@draconis.elte.hu> writes: >> Actually, as the 7.1 code currently stands, a query that uses explicit >> JOIN operators like yours does will always be implemented in exactly >> the given join order, with no searching. I haven't quite decided if >> that's a bug or a feature ... > Do you mean a "linear binary tree" like this is executed? > /\ > /\ f > /\ e > /\ d > /\ c > a b If that's what you write, yes. You can parenthesize the JOIN clauses any way you like, though, and the 7.1 planner will follow that structure. For example SELECT ... FROM (a CROSS JOIN b) CROSS JOIN (c CROSS JOIN d) WHERE ... is semantically the same as FROM a,b,c,d, but given this JOIN form the planner will only consider plans that join a to b and join c to d and finally join those results. With the "FROM a,b,c,d" form it will do a search through all possible join orders, same as before. Also, you can mix styles: SELECT ... FROM a, b, c CROSS JOIN d WHERE ... which forces c and d to be joined first, but lets the planner have its head about what to do next. > I'm not sure which version of standards allows to bracket joins, > but I know sybase accepts the above form. SQL92 says <joined table> ::= <cross join> | <qualified join> | <left paren> <joined table><right paren> so anything that claims to accept SQL92 had better allow parentheses around JOIN expressions... regards, tom lane
Hi, > If that's what you write, yes. You can parenthesize the JOIN clauses > any way you like, though, and the 7.1 planner will follow that structure. Thanks for the exhaustive answer. Can I test this feature in the current snapshot? Attila
Meszaros Attila <tilla@draconis.elte.hu> writes: > Can I test this feature in the current snapshot? Sure. But see my message to pghackers on Tuesday for notes about what's not working yet in the JOIN support. regards, tom lane