Thread: Query plan for very large number of joins

Query plan for very large number of joins

From: philb@vodafone.ie
Hi,

I am using PostgreSQL (7.4) with a schema that was generated automatically (using hibernate).
The schema consists of about 650 relations. One particular query (also generated
automatically) consists of left joining approximately 350 tables. At this
stage, most tables are empty and those with values have less than 50 entries.
The query takes about 90 seconds to execute (on a P4, 2.6GHz).

All of the relations have a primary key which is indexed and all of the joins
are on foreign keys which are explicitly declared. I've checked the obvious
tunables (effective_cache_size, shared_buffers and sort_mem) but changing
these has had no effect. The system has a total of 750MB RAM; I've varied
shared_buffers up to 256MB and sort_mem up to 128MB without affecting
the performance.
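
For reference, the kind of session-level changes I've been trying look
like this (values illustrative only; shared_buffers itself has to be set
in postgresql.conf and needs a restart):

    SET effective_cache_size = 65536;  -- in 8kB disk pages, i.e. ~512MB
    SET sort_mem = 131072;             -- in kB, i.e. ~128MB
    SHOW shared_buffers;               -- check the current setting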

Running the query as a JDBC prepared statement indicates that the query optimiser
is spending a negligible amount of time on the task (~ 36 ms) compared to
the executor (~ 90 seconds). The output of EXPLAIN indicates (AFAICT) that
all of the joins are of type "Nested Loop Left Join" and all of the scans
are of type "Seq Scan". I have refrained from posting the query and the
query plan since these are 80K and 100K apiece but if anyone wants to see
them I can certainly forward them on.

My (uninformed) suspicion is that the optimiser has failed over to the default
plan on the basis of the number of tables in the join. My question is, is
there anyone out there using PostgreSQL with this size of schema? Is there
anything that can be done to bring about the order of magnitude increase
in speed that I need?

Thanks for your help,
 -phil



Re: Query plan for very large number of joins

From: Richard Huxton
philb@vodafone.ie wrote:
> Hi,
>
> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left joining approximately 350 tables.

May I be the first to offer an "ouch"!

> At this stage, most tables are empty and those with values have less
> than 50 entries. The query takes about 90 seconds to execute (on a
> P4, 2.6GHz).
>
> All of the relations have a primary key which is indexed and all of
> the joins are on foreign keys which are explicitly declared. I've
> checked the obvious tunables (effective_cache_size, shared_buffers and
> sort_mem) but changing these has had no effect. The system has a
> total of 750MB RAM; I've varied shared_buffers up to 256MB and
> sort_mem up to 128MB without affecting the performance.

sort_mem is the only one of those I can see helping with a single query.

> Running the query as a JDBC prepared statement indicates that the
> query optimiser is spending a negligible amount of time on the task
> (~ 36 ms) compared to the executor (~ 90 seconds). The output of
> EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
> Loop Left Join" and all of the scans are of type "Seq Scan". I have
> refrained from posting the query and the query plan since these are
> 80K and 100K apiece but if anyone wants to see them I can certainly
> forward them on.

Well, if most tables are small then a seq-scan makes sense. Does it look
like it's estimating the number of rows badly anywhere? I'm not sure the
list will accept attachments that large - is it possible to upload them
somewhere accessible?
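
(If you run EXPLAIN ANALYZE, compare the planner's rows estimate with the
actual count on each node; the line below is just the generic shape, with
made-up numbers:)

    Nested Loop Left Join  (cost=0.00..4.83 rows=50 width=8) (actual time=0.04..0.12 rows=3 loops=1)

Here the planner expected 50 rows but got 3, which is the sort of
mismatch to look for.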

> My (uninformed) suspicion is that the optimiser has failed over to
> the default plan on the basis of the number of tables in the join. My
> question is, is there anyone out there using PostgreSQL with this
> size of schema? Is there anything that can be done to bring about the
> order of magnitude increase in speed that I need?

Well - the genetic planner must surely be kicking in here (see the
run-time configuration chapter of the manuals, query-planning,
geqo_threshold). However, I'm not sure how much leeway there is in
planning a largely left-joined query.
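
A quick way to check, and to rule GEQO in or out for comparison (350
joins is far beyond the default threshold):

    SHOW geqo_threshold;  -- FROM items at/above this use the genetic planner
    SET geqo = off;       -- fall back to the exhaustive search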

It could be there's some overhead in the executor that's only noticeable
with hundreds of tables involved; you're running at about 0.25 seconds
per join.

I take it you have no control over the schema or query, so there's not
much fiddling you can do. You've tried sort_mem, so there are only two
things I can think of:
1. Try the various enable_xxx config settings and see if disabling
seq-scan or the relevant join-type does anything (I'm not sure it will;
a sketch follows below)
2. Try against 8.0 - there may be some improvement there.
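
Something along these lines per session, re-running EXPLAIN ANALYZE
after each change:

    SET enable_seqscan = off;   -- push the planner towards index scans
    SET enable_nestloop = off;  -- or discourage the nested loops
    RESET ALL;                  -- put everything back afterwards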

Other people on this list have experience on larger systems than me, so
they may be able to help more.
--
   Richard Huxton
   Archonet Ltd

Re: Query plan for very large number of joins

From: Tom Lane
Richard Huxton <dev@archonet.com> writes:
> philb@vodafone.ie wrote:
>> I am using PostgreSQL (7.4) with a schema that was generated
>> automatically (using hibernate). The schema consists of about 650
>> relations. One particular query (also generated automatically)
>> consists of left joining approximately 350 tables.

> May I be the first to offer an "ouch"!

Seconded.

> However, I'm not sure how much leeway there is in
> planning a largely left-joined query.

Not much.  The best hope for a better result is to order the LEFT JOIN
clauses in a way that will produce a good plan.

One thought is that I am not sure I believe the conclusion that planning
is taking only 36 ms; even realizing that the exclusive use of left
joins eliminates options for join order, there are still quite a lot of
plans to consider.  You should try both EXPLAIN and EXPLAIN ANALYZE
from psql and see how long each takes.  It'd also be interesting to keep
an eye on how large the backend process grows while doing this --- maybe
it's being driven into swap.
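
In psql that would be something like the following, where t1/t2 stand in
for the real 80K query (watch the backend with top or ps in another
window while it runs):

    \timing
    EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t2.t1_id = t1.id;
    EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t2.t1_id = t1.id;

Plain EXPLAIN never executes the query, so its \timing number is close
to pure planning time; EXPLAIN ANALYZE adds the execution time on top.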

Also: I'm not sure there *is* such a thing as a good plan for a 350-way
join.  It may be time to reconsider your data representation.  If
Hibernate really forces this on you, it may be time to reconsider your
choice of tool.

            regards, tom lane

Re: Query plan for very large number of joins

From: PFC

> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left joining approximately 350 tables.

    Just out of curiosity, what application is this?
    And what are the reasons for so many tables... and especially such a
query?
    Not criticizing, just curious.

Re: Query plan for very large number of joins

From: Sebastian Hennebrueder

Tom Lane wrote:

>Richard Huxton <dev@archonet.com> writes:
>>philb@vodafone.ie wrote:
>>>I am using PostgreSQL (7.4) with a schema that was generated
>>>automatically (using hibernate). The schema consists of about 650
>>>relations. One particular query (also generated automatically)
>>>consists of left joining approximately 350 tables.
>
>>May I be the first to offer an "ouch"!
>
>Seconded.
>
>>However, I'm not sure how much leeway there is in
>>planning a largely left-joined query.
>
>Not much.  The best hope for a better result is to order the LEFT JOIN
>clauses in a way that will produce a good plan.

If this is the best way, you should consider using a native SQL query
rather than the Hibernate query language in this case; that is possible
with Hibernate. You could also create a view in PostgreSQL and let
Hibernate read from that view, which Hibernate supports as well.
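
A minimal sketch of the view idea (the table and column names are
invented stand-ins for the generated schema):

    CREATE VIEW order_flat AS
    SELECT o.id, o.issue_date, p.name AS party_name
    FROM order_hdr o
    LEFT JOIN party p ON p.id = o.party_id;

Hibernate can then be mapped against order_flat, read-only, instead of
generating the join itself.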

>One thought is that I am not sure I believe the conclusion that planning
>is taking only 36 ms; even realizing that the exclusive use of left
>joins eliminates options for join order, there are still quite a lot of
>plans to consider.  You should try both EXPLAIN and EXPLAIN ANALYZE
>from psql and see how long each takes.  It'd also be interesting to keep
>an eye on how large the backend process grows while doing this --- maybe
>it's being driven into swap.
>
>Also: I'm not sure there *is* such a thing as a good plan for a 350-way
>join.  It may be time to reconsider your data representation.  If
>Hibernate really forces this on you, it may be time to reconsider your
>choice of tool.
>
>            regards, tom lane

--
Kind Regards / Viele Grüße

Sebastian Hennebrueder

-----
http://www.laliluna.de/tutorials.html
Tutorials for Java, Struts, JavaServer Faces, JSP, Hibernate, EJB and more.


Re: Query plan for very large number of joins

From: philb@vodafone.ie

>>> I am using PostgreSQL (7.4) with a schema that was generated
>>> automatically (using hibernate). The schema consists of about 650
>>> relations. One particular query (also generated automatically)
>>> consists of left joining approximately 350 tables.

[snip]

>One thought is that I am not sure I believe the conclusion that planning
>is taking only 36 ms; even realizing that the exclusive use of left
>joins eliminates options for join order, there are still quite a lot of
>plans to consider.  You should try both EXPLAIN and EXPLAIN ANALYZE
>from psql and see how long each takes.  It'd also be interesting to keep
>an eye on how large the backend process grows while doing this --- maybe
>it's being driven into swap.


Thanks for the suggestion. I've timed both the EXPLAIN and the EXPLAIN ANALYZE operations.
Both operations took 1m 37s. The analyze output indicates that the query
execution time was 950ms. This doesn't square with the JDBC prepareStatement
executing in 36ms. My guess is that the prepare was actually a no-op but
I haven't found anything about this yet.

So, is it correct to interpret this as the query planner taking an
awfully long time? Is it possible to force the query planner to adopt a
specific strategy and not search for alternatives? (I'm aware of the
enable_xxx options; it's the reverse logic that I'm thinking of here.)
Alternatively, is there some way to check if the query planner is
bottlenecking on a specific resource?

Finally, PFC was asking about the nature of the application: it's not a
specific application, just a generic bit of infrastructure consisting of
a transformation of the UBL schema. Despite being fairly restricted in
scope, the schema is highly denormalized, hence the large number of
tables.

Thanks for all your help.
 -phil

Re: Query plan for very large number of joins

From: Tom Lane
<philb@vodafone.ie> writes:
> Thanks for the suggestion. I've timed both the EXPLAIN and the EXPLAIN ANALYZE operations.
> Both operations took 1m 37s. The analyze output indicates that the query
> execution time was 950ms. This doesn't square with the JDBC prepareStatement
> executing in 36ms. My guess is that the prepare was actually a no-op but
> I haven't found anything about this yet.

Only in very recent JDBCs does prepareStatement do much of anything.
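
For comparison, a server-side PREPARE (available since 7.3) really does
plan up front, so a fast client-side prepareStatement says nothing about
planning cost. A sketch, with hypothetical tables:

    PREPARE q(int) AS
      SELECT * FROM t1 LEFT JOIN t2 ON t2.t1_id = t1.id
      WHERE t1.id = $1;  -- planning happens here, at PREPARE time
    EXECUTE q(1);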

> So, is it correct to interpret this as the query planner taking an
> awfully long time?

Looks that way.

> Is it possible to force the query planner to adopt a specific strategy
> and not search for alternatives? (I'm aware of the enable_xxx options;
> it's the reverse logic that I'm thinking of here.)

There's no positive forcing method.  But you could probably save some
time by disabling both mergejoin and hashjoin, now that you know it's
going to end up picking nestloop for each join anyway.  Even more
important: are you sure that *every* one of the joins is a LEFT JOIN?
Even a couple of regular joins will let it fool around choosing
different join orders.
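
Concretely, that is just:

    SET enable_mergejoin = off;
    SET enable_hashjoin = off;
    -- nestloop stays enabled, so each join still gets the plan it was
    -- going to get anyway, minus the cost of the rejected alternatives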

> Alternatively, is there some way to check if the query planner is
> bottlenecking on a specific resource?

I think it would be interesting to try profiling it.  I'm not really
expecting to find anything easily-fixable, but you never know.  From
what you said before, the database is not all that large --- would
you be willing to send me a database dump and the text of the query
off-list?

            regards, tom lane

Re: Query plan for very large number of joins

From: philb@vodafone.ie
Anyone following this thread might be interested to know that disabling
the merge and hash joins (as suggested below) resulted in the execution
time dropping from ~90 seconds to ~35 seconds. Disabling GEQO brought
about a marginal further reduction (~1 second, pretty much within the
margin of error).

Tom, a quick grep indicates that all of the joins are left joins, so
there's no scope for tweaking there. I'll send you the schema + query
off-list; anyone else curious about it, let me know.

Thanks again,
 -phil



><philb@vodafone.ie> writes:
>> Thanks for the suggestion. I've timed both the EXPLAIN and the EXPLAIN
>> ANALYZE operations. Both operations took 1m 37s. The analyze output
>> indicates that the query execution time was 950ms. This doesn't square
>> with the JDBC prepareStatement executing in 36ms. My guess is that the
>> prepare was actually a no-op but I haven't found anything about this yet.
>
>Only in very recent JDBCs does prepareStatement do much of anything.
>
>> So, is it correct to interpret this as the query planner taking an
>> awfully long time?
>
>Looks that way.
>
>> Is it possible to force the query planner to adopt a specific strategy
>> and not search for alternatives? (I'm aware of the enable_xxx options;
>> it's the reverse logic that I'm thinking of here.)
>
>There's no positive forcing method.  But you could probably save some
>time by disabling both mergejoin and hashjoin, now that you know it's
>going to end up picking nestloop for each join anyway.  Even more
>important: are you sure that *every* one of the joins is a LEFT JOIN?
>Even a couple of regular joins will let it fool around choosing
>different join orders.
>
>> Alternatively, is there some way to check if the query planner is
>> bottlenecking on a specific resource?
>
>I think it would be interesting to try profiling it.  I'm not really
>expecting to find anything easily-fixable, but you never know.  From
>what you said before, the database is not all that large --- would
>you be willing to send me a database dump and the text of the query
>off-list?


Re: Query plan for very large number of joins

From: Tom Lane
<philb@vodafone.ie> writes:
> I've attached the schema and query text, hopefully it will be of some use
> to you. Note that both are taken from the HyperUBL project
> (https://hyperubl.dev.java.net/).  Sadly, at this stage I think it's
> time for me to try alternatives to either Hibernate or Postgresql.

Thanks.  Profiling on 7.4 I get this for an EXPLAIN (after vacuum
analyzing the database):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ks/call  Ks/call  name
 61.66    618.81   618.81 2244505819     0.00     0.00  compare_path_costs
 15.01    769.44   150.63  1204882     0.00     0.00  add_path
  8.08    850.57    81.13   772077     0.00     0.00  nth
  3.76    888.27    37.70  1113598     0.00     0.00  nconc
  2.59    914.30    26.03   233051     0.00     0.00  find_joininfo_node
  2.23    936.70    22.40 30659124     0.00     0.00  bms_equal
  1.14    948.14    11.44 39823463     0.00     0.00  equal
  0.77    955.84     7.70    83300     0.00     0.00  find_base_rel

This is with no special planner settings.  Obviously the problem is that
it's considering way too many different paths.  We did do something
about that in 8.0 (basically, throw away paths with "nearly the same"
cost) ... but the bottom line didn't improve a whole lot.  CVS tip
profile for the same case is

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 38.37    176.41   176.41 53344348     0.00     0.00  list_nth_cell
 35.26    338.52   162.11   196481     0.00     0.00  get_rte_attribute_is_dropped
  5.42    363.44    24.92   233051     0.00     0.00  find_joininfo_node
  4.72    385.14    21.70 30659416     0.00     0.00  bms_equal
  4.09    403.95    18.81 53344348     0.00     0.00  list_nth
  2.31    414.58    10.63 37347920     0.00     0.00  equal
  1.40    421.03     6.45    83299     0.00     0.00  find_base_rel
  1.08    426.01     4.98   617917     0.00     0.00  SearchCatCache
  0.90    430.13     4.12  5771640     0.00     0.00  AllocSetAlloc

The get_rte_attribute_is_dropped calls (and list_nth/list_nth_cell,
which are mostly being called from there) arise from a rather hastily
added patch that prevents failure when a JOIN list in a stored view
refers to a since-dropped column of an underlying relation.  I had not
realized that that check could have O(N^2) behavior in deeply nested
joins, but it does.  Obviously we'll have to rethink that.

After that it looks like the next hotspot is find_joininfo_node
(and bms_equal which is mostly getting called from there).  We could
maybe fix that by rethinking the joininfo data structure --- right now
it's a collection of simple Lists, which betrays the planner's Lispy
heritage ;-).  Again, that's not something I've ever seen at the top
of a profile before --- there may be some O(N^2) behavior involved
here too, but I've not analyzed it in detail.

It does look like 8.0 would be about a factor of 2 faster for you
than 7.4, but the real fix will probably have to wait for 8.1.

Also: the 8.0 problem is definitely an O(N^2) type of deal, which means
if you could reduce the depth of nesting by a factor of 2 the cost would
go down 4x.  You said this was an automatically generated query, so
there may not be much you can do about it, but if you could parenthesize
the FROM list a bit more intelligently the problem would virtually go
away.  What you have is effectively

    FROM ((((a left join b) left join c) left join d) ....

so the nesting goes all the way down.  With something like

    FROM ((a left join b) left join c ...)
             left join
             ((d left join e) left join f ...)

the max nesting depth would be halved.  I don't understand your schema
at all so I'm not sure what an appropriate nesting might look like, but
maybe there is a short-term workaround to be found there.  (This will
*not* help on 7.4, as the bottleneck there is completely different.)
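
Spelled out with six hypothetical tables (the join conditions are
invented just to make it syntactically complete):

    SELECT *
    FROM ((a LEFT JOIN b ON b.a_id = a.id)
           LEFT JOIN c ON c.a_id = a.id)
         LEFT JOIN
         ((d LEFT JOIN e ON e.d_id = d.id)
           LEFT JOIN f ON f.d_id = d.id)
         ON d.a_id = a.id;

Splitting once halves the maximum depth; done recursively it comes down
to about log2(N).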

            regards, tom lane

Re: Query plan for very large number of joins

From: Simon Riggs
On Fri, 2005-06-03 at 13:22 +0100, philb@vodafone.ie wrote:
>
> >>> I am using PostgreSQL (7.4) with a schema that was generated
> >>> automatically (using hibernate). The schema consists of about 650
> >>> relations. One particular query (also generated automatically)
> >>> consists of left joining approximately 350 tables.

> Despite being fairly restricted in scope,
> the schema is highly denormalized hence the large number of tables.

Do you mean normalized? Or do you mean you've pushed the superclass
details down onto each of the leaf classes?

I guess I'm interested in what type of modelling led you to have so many
tables in the first place?

Gotta say, I've never seen a 350-table join in a real app before.

Wouldn't it be possible to smooth out the model and end up with fewer
tables? Or simply break things up somewhere slightly down from the root
of the class hierarchy?

Best Regards, Simon Riggs




Re: Query plan for very large number of joins

From: philb@vodafone.ie
>> Despite being fairly restricted in scope,
>> the schema is highly denormalized hence the large number of tables.
>
>Do you mean normalized? Or do you mean you've pushed the superclass
>details down onto each of the leaf classes?

Sorry, I meant normalized, typing faster than I'm thinking here :) The
schema was generated by hyperjaxb, a combination of Hibernate and JAXB.
This allows one to go from XSD -> object model -> persistence in a
single step. I'm just getting the hang of Hibernate so I don't know how
flexible its strategy is. Obviously though, the emphasis is on
correctness first, so while the same result could possibly be achieved
more quickly with many smaller queries, it probably considers that it's
up to the DBMS to handle optimisation (not unreasonably either, I guess).

Since the entire process from the XSD onwards is automated, there's no
scope for tweaking either the OR mapping code or the DB schema itself,
except for isolated troubleshooting purposes. The XSD set in question is
the UBL schema published by OASIS, which yields about 650 relations; I
thought it would be nice to have this as a standard component in future
development.

Regards,
 -phil


>
>I guess I'm interested in what type of modelling led you to have so many
>tables in the first place?
>
>Gotta say, I've never seen a 350-table join in a real app before.
>
>Wouldn't it be possible to smooth out the model and end up with fewer
>tables? Or simply break things up somewhere slightly down from the root
>of the class hierarchy?
>
>Best Regards, Simon Riggs

