On Tuesday 07 July 2009 17:40:50 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I cannot reasonably plan some queries with join_collapse_limit set to 20.
> > At least not without setting the geqo limit very low and a geqo_effort to
> > a low value.
> > So I would definitely not agree that removing j_c_l is a good idea.
> Can you show some specific examples? All of this discussion seems like
> speculation in a vacuum ...
As similar requests have come up multiple times now, I started to create a schema
that I can share and that is sufficiently similar to the real one to show the same effects.
I had to cut down the complexity of the schema considerably - both for easier
understanding and easier writing of the demo schema.
I also have a moderately complex demo query similar to really used ones.
The autogenerated (GUI) queries do not use views as I did in the example query, but
it seemed easier to play around with the query size this way.
Also, the real queries often have far more conditions than the one I present
here.
Also, I have not "tuned" the queries here in any way; the join order is not
optimized (unlike in the real application), but I don't think that matters
for the purpose of this discussion.
The queries themselves only sketch what they are intended for and query many
fictional datapoints, but again I don't think this is a problem.
Is it helpful this way?
Some numbers for query_2.sql are attached. Short overview:
- a low from_collapse_limit is deadly
- a high from_collapse_limit is not costly here
- geqo_effort basically changes nothing
- geqo changes basically nothing
- with a higher join_collapse_limit (12), geqo=on costs quite a bit (a factor
of 20!). I double-checked. At other times I get 'failed to make a valid plan'
The numbers are all from 8.5 as of today.
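For reference, the measurements above can be reproduced from psql roughly like
this (a sketch only: the specific values are just the ones discussed above, and
the actual query text lives in the attached query_2.sql):

```sql
-- Planner settings varied for the numbers above.
SET from_collapse_limit = 20;  -- limit for flattening subqueries into the FROM list
SET join_collapse_limit = 12;  -- limit for flattening explicit JOIN syntax
SET geqo = on;                 -- toggle the genetic query optimizer on/off
SET geqo_effort = 5;           -- GEQO effort, range 1..10

\timing on
EXPLAIN SELECT ...;            -- the query from query_2.sql goes here
```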
Some explanations about the schema:
- It uses surrogate keys everywhere, as the real schema employs some form of
row-level, label-based access checking (covert channel issues)
- The real schema uses partitions - I don't think they would be interesting
here?
- it's definitely not the most beautiful schema I have seen, but I have to admit
that I cannot think of a much nicer one which serves the different purposes as
well:
  - somewhat complex queries
  - new "information_set"s and "information"s are added frequently
  - automated and manual data entry has to work with such additions
  - the GUI query tool needs to work in the face of such changes
- I have seen similar schemas multiple times now.
- The original schema employs materialized views in parts (both for execution
and planning speed)
- The queries are crazy, but people definitely create/use them.
Andres