Thread: Query plan for very large number of joins
Hi,

I am using PostgreSQL (7.4) with a schema that was generated automatically (using Hibernate). The schema consists of about 650 relations. One particular query (also generated automatically) consists of left joining approximately 350 tables. At this stage, most tables are empty and those with values have fewer than 50 entries. The query takes about 90 seconds to execute (on a P4, 2.6GHz).

All of the relations have a primary key which is indexed, and all of the joins are on foreign keys which are explicitly declared. I've checked the obvious tunables (effective_cache_size, shared_memory and sort_buffer) but changing these has had no effect. The system has a total of 750MB RAM; I've varied the shared memory up to 256MB and the sort buffer up to 128MB without affecting the performance.

Running the query as a JDBC prepared statement indicates that the query optimiser is spending a negligible amount of time on the task (~36ms) compared to the executor (~90 seconds). The output of EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested Loop Left Join" and all of the scans are of type "Seq Scan". I have refrained from posting the query and the query plan since these are 80K and 100K apiece, but if anyone wants to see them I can certainly forward them on.

My (uninformed) suspicion is that the optimiser has fallen back to the default plan on the basis of the number of tables in the join. My question is: is there anyone out there using PostgreSQL with this size of schema? Is there anything that can be done to bring about the order of magnitude increase in speed that I need?

Thanks for your help,

-phil
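In 7.4 terms, the tunables being described are sort_mem and effective_cache_size (plus shared_buffers, which has no session-level equivalent and must be changed in postgresql.conf followed by a restart). A minimal sketch of the session-level experiments, with illustrative figures and a placeholder standing in for the generated query:

    -- Sketch of the tuning experiments described above, in 7.4 units
    -- (sort_mem is in KB; effective_cache_size is in 8KB disk pages).
    -- The figures are illustrative, not recommendations.
    SET sort_mem = 131072;             -- ~128MB available to each sort step
    SET effective_cache_size = 65536;  -- tell the planner ~512MB is cached
    EXPLAIN ANALYZE SELECT ...;        -- re-time the real query after each change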
philb@vodafone.ie wrote:
> Hi,
>
> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using Hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left joining approximately 350 tables.

May I be the first to offer an "ouch"!

> At this stage, most tables are empty and those with values have fewer
> than 50 entries. The query takes about 90 seconds to execute (on a
> P4, 2.6GHz).
>
> All of the relations have a primary key which is indexed, and all of
> the joins are on foreign keys which are explicitly declared. I've
> checked the obvious tunables (effective_cache_size, shared_memory and
> sort_buffer) but changing these has had no effect. The system has a
> total of 750MB RAM; I've varied the shared memory up to 256MB and the
> sort buffer up to 128MB without affecting the performance.

The sort_mem is the only thing I can see helping with a single query.

> Running the query as a JDBC prepared statement indicates that the
> query optimiser is spending a negligible amount of time on the task
> (~36ms) compared to the executor (~90 seconds). The output of
> EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
> Loop Left Join" and all of the scans are of type "Seq Scan". I have
> refrained from posting the query and the query plan since these are
> 80K and 100K apiece, but if anyone wants to see them I can certainly
> forward them on.

Well, if most tables are small then a seq scan makes sense. Does it look like it's estimating the number of rows badly anywhere? I'm not sure the list will accept attachments that large - is it possible to upload them somewhere accessible?

> My (uninformed) suspicion is that the optimiser has fallen back to
> the default plan on the basis of the number of tables in the join. My
> question is: is there anyone out there using PostgreSQL with this
> size of schema? Is there anything that can be done to bring about the
> order of magnitude increase in speed that I need?

Well - the genetic query planner must surely be kicking in here (see the run-time configuration chapter of the manuals, under query planning, geqo_threshold). However, I'm not sure how much leeway there is in planning a largely left-joined query.

It could be there's some overhead in the executor that's only noticeable with hundreds of tables involved; you're running at about 0.25 secs per join.

I take it you have no control over the schema or query, so there's not much fiddling you can do. You've tried sort_mem, so there are only two things I can think of:

1. Try the various enable_xxx config settings and see if disabling seq scans or the relevant join type does anything (I'm not sure it will) - see the sketch below.

2. Try against 8.0 - there may be some improvement there.

Other people on this list have experience of larger systems than me, so they may be able to help more.

--
  Richard Huxton
  Archonet Ltd
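A minimal sketch of suggestion 1, using 7.4's enable_* settings. Note these don't forbid a plan type outright; they just make it look prohibitively expensive to the planner. The SELECT is a placeholder for the real 350-table query:

    -- Suggestion 1 above: discourage the plan types currently chosen,
    -- then re-time. Reset everything afterwards.
    SET enable_seqscan = off;   -- prefer index scans where one exists
    SET enable_nestloop = off;  -- discourage nested-loop left joins
    EXPLAIN ANALYZE SELECT ...; -- the generated query goes here
    RESET enable_seqscan;
    RESET enable_nestloop;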
Richard Huxton <dev@archonet.com> writes:
> philb@vodafone.ie wrote:
>> I am using PostgreSQL (7.4) with a schema that was generated
>> automatically (using Hibernate). The schema consists of about 650
>> relations. One particular query (also generated automatically)
>> consists of left joining approximately 350 tables.

> May I be the first to offer an "ouch"!

Seconded.

> However, I'm not sure how much leeway there is in
> planning a largely left-joined query.

Not much. The best hope for a better result is to order the LEFT JOIN clauses in a way that will produce a good plan.

One thought is that I am not sure I believe the conclusion that planning is taking only 36ms; even realizing that the exclusive use of left joins eliminates options for join order, there are still quite a lot of plans to consider. You should try both EXPLAIN and EXPLAIN ANALYZE from psql and see how long each takes. It'd also be interesting to keep an eye on how large the backend process grows while doing this --- maybe it's being driven into swap.

Also: I'm not sure there *is* such a thing as a good plan for a 350-way join. It may be time to reconsider your data representation. If Hibernate really forces this on you, it may be time to reconsider your choice of tool.

			regards, tom lane
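For anyone reproducing Tom's timing suggestion, a psql session along these lines separates the two costs (assuming a psql new enough to have \timing; the SELECT is again a stand-in for the real query):

    \timing
    EXPLAIN SELECT ...;          -- plans but does not execute: elapsed time
                                 -- here is almost entirely planner time
    EXPLAIN ANALYZE SELECT ...;  -- plans and executes: the "Total runtime"
                                 -- line is executor time, the difference
                                 -- from the elapsed time is planning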
> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using Hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left joining approximately 350 tables.

Just out of curiosity, what application is this? And what are the reasons for so many tables... and especially such a query? Not criticizing, but curious.
Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> philb@vodafone.ie wrote:
>>> I am using PostgreSQL (7.4) with a schema that was generated
>>> automatically (using Hibernate). The schema consists of about 650
>>> relations. One particular query (also generated automatically)
>>> consists of left joining approximately 350 tables.
>
>> May I be the first to offer an "ouch"!
>
> Seconded.
>
>> However, I'm not sure how much leeway there is in
>> planning a largely left-joined query.
>
> Not much. The best hope for a better result is to order the LEFT JOIN
> clauses in a way that will produce a good plan.

If this is the best way, you should consider using a native SQL query instead of the Hibernate query language in this case. This is possible with Hibernate! I suppose you could also define a view in PostgreSQL and let Hibernate read from this view (see the sketch below). This is also possible.

> One thought is that I am not sure I believe the conclusion that planning
> is taking only 36ms; even realizing that the exclusive use of left
> joins eliminates options for join order, there are still quite a lot of
> plans to consider. You should try both EXPLAIN and EXPLAIN ANALYZE
> from psql and see how long each takes. It'd also be interesting to keep
> an eye on how large the backend process grows while doing this --- maybe
> it's being driven into swap.
>
> Also: I'm not sure there *is* such a thing as a good plan for a 350-way
> join. It may be time to reconsider your data representation. If
> Hibernate really forces this on you, it may be time to reconsider your
> choice of tool.
>
> 			regards, tom lane

--
Kind Regards / Viele Grüße

Sebastian Hennebrueder

-----
http://www.laliluna.de/tutorials.html
Tutorials for Java, Struts, JavaServer Faces, JSP, Hibernate, EJB and more.
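A minimal sketch of the view idea: the table and column names below are invented stand-ins for the generated UBL schema. The point is that a hand-chosen LEFT JOIN order is frozen inside the view, and Hibernate just selects from it:

    -- Hypothetical example: fix the join order by hand inside a view,
    -- then map Hibernate onto the view instead of letting it generate
    -- the join itself. All names here are invented.
    CREATE VIEW invoice_with_party AS
    SELECT i.id, i.total, p.name AS party_name, a.city AS party_city
    FROM invoice i
    LEFT JOIN party   p ON p.id = i.party_id
    LEFT JOIN address a ON a.id = p.address_id;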
>>> I am using PostgreSQL (7.4) with a schema that was generated
>>> automatically (using Hibernate). The schema consists of about 650
>>> relations. One particular query (also generated automatically)
>>> consists of left joining approximately 350 tables.

[snip]

> One thought is that I am not sure I believe the conclusion that planning
> is taking only 36ms; even realizing that the exclusive use of left
> joins eliminates options for join order, there are still quite a lot of
> plans to consider. You should try both EXPLAIN and EXPLAIN ANALYZE
> from psql and see how long each takes. It'd also be interesting to keep
> an eye on how large the backend process grows while doing this --- maybe
> it's being driven into swap.

Thanks for the suggestion. I've timed both the EXPLAIN and the EXPLAIN ANALYZE operations. Both operations took 1m 37s. The ANALYZE output indicates that the query execution time was 950ms. This doesn't square with the JDBC prepareStatement executing in 36ms. My guess is that the prepare was actually a no-op, but I haven't found anything confirming this yet.

So, is it correct to interpret this as the query planner taking an awfully long time? Is it possible to force the query planner to adopt a specific strategy and not search for alternatives? (I'm aware of the enable_* options; it's the reverse logic that I'm thinking of here.) Alternatively, is there some way to check whether the query planner is bottlenecking on a specific resource?

Finally, PFC was asking about the nature of the application. It's not a specific application, just a generic bit of infrastructure consisting of a transformation of the UBL schema. Despite being fairly restricted in scope, the schema is highly denormalized, hence the large number of tables.

Thanks for all your help,

-phil
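On the "forcing a strategy" question, one cheap experiment (following Richard's geqo_threshold pointer earlier in the thread) is to compare the genetic and exhaustive planners directly. Whether GEQO even gets much traction on an all-explicit-LEFT-JOIN query in 7.4 is unclear, so treat this as a sketch, with the usual placeholder SELECT:

    SHOW geqo_threshold;   -- GEQO takes over above this many FROM items
    SET geqo = off;        -- exhaustive standard planner
    EXPLAIN SELECT ...;    -- time this (planning only)
    SET geqo = on;         -- genetic planner
    EXPLAIN SELECT ...;    -- and compare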
<philb@vodafone.ie> writes:
> Thanks for the suggestion. I've timed both the EXPLAIN and the EXPLAIN
> ANALYZE operations. Both operations took 1m 37s. The ANALYZE output
> indicates that the query execution time was 950ms. This doesn't square
> with the JDBC prepareStatement executing in 36ms. My guess is that the
> prepare was actually a no-op, but I haven't found anything confirming
> this yet.

Only in very recent JDBC drivers does prepareStatement do much of anything.

> So, is it correct to interpret this as the query planner taking an
> awfully long time?

Looks that way.

> Is it possible to force the query planner to adopt a specific strategy
> and not search for alternatives? (I'm aware of the enable_* options;
> it's the reverse logic that I'm thinking of here.)

There's no positive forcing method. But you could probably save some time by disabling both mergejoin and hashjoin, now that you know it's going to end up picking a nested loop for each join anyway. Even more important: are you sure that *every* one of the joins is a LEFT JOIN? Even a couple of regular joins will let it fool around choosing different join orders.

> Alternatively, is there some way to check whether the query planner is
> bottlenecking on a specific resource?

I think it would be interesting to try profiling it. I'm not really expecting to find anything easily fixable, but you never know. From what you said before, the database is not all that large --- would you be willing to send me a database dump and the text of the query off-list?

			regards, tom lane
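In concrete terms, Tom's suggestion looks like this (a sketch; the SELECT stands in for the generated query):

    -- Stop the planner costing out merge and hash variants of each of
    -- the ~350 joins, since it ends up choosing nested loops anyway:
    -- this trims the search space rather than changing the final plan.
    SET enable_mergejoin = off;
    SET enable_hashjoin  = off;
    EXPLAIN ANALYZE SELECT ...;  -- compare against the ~90 second baseline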
Anyone following this thread might be interested to know that disabling the merge and hash joins (as suggested below) resulted in the execution time dropping from ~90 seconds to ~35 seconds. Disabling GEQO has brought about a marginal reduction (~1 second, pretty much within the margin of error).

Tom, a quick grep indicates that all of the joins are left joins, so there's no scope for tweaking there. I'll send you the schema + query off-list; anyone else curious about it, let me know.

Thanks again,

-phil

> <philb@vodafone.ie> writes:
>> Thanks for the suggestion. I've timed both the EXPLAIN and the EXPLAIN
>> ANALYZE operations. Both operations took 1m 37s. The ANALYZE output
>> indicates that the query execution time was 950ms. This doesn't square
>> with the JDBC prepareStatement executing in 36ms. My guess is that the
>> prepare was actually a no-op, but I haven't found anything confirming
>> this yet.
>
> Only in very recent JDBC drivers does prepareStatement do much of anything.
>
>> So, is it correct to interpret this as the query planner taking an
>> awfully long time?
>
> Looks that way.
>
>> Is it possible to force the query planner to adopt a specific strategy
>> and not search for alternatives? (I'm aware of the enable_* options;
>> it's the reverse logic that I'm thinking of here.)
>
> There's no positive forcing method. But you could probably save some
> time by disabling both mergejoin and hashjoin, now that you know it's
> going to end up picking a nestloop for each join anyway. Even more
> important: are you sure that *every* one of the joins is a LEFT JOIN?
> Even a couple of regular joins will let it fool around choosing
> different join orders.
>
>> Alternatively, is there some way to check whether the query planner is
>> bottlenecking on a specific resource?
>
> I think it would be interesting to try profiling it. I'm not really
> expecting to find anything easily fixable, but you never know. From
> what you said before, the database is not all that large --- would
> you be willing to send me a database dump and the text of the query
> off-list?
<philb@vodafone.ie> writes:
> I've attached the schema and query text, hopefully it will be of some use
> to you. Note that both are taken from the HyperUBL project
> (https://hyperubl.dev.java.net/). Sadly, at this stage I think it's
> time for me to try alternatives to either Hibernate or PostgreSQL.

Thanks. Profiling on 7.4 I get this for an EXPLAIN (after vacuum analyzing the database):

      %   cumulative   self               self     total
     time    seconds   seconds     calls  Ks/call  Ks/call  name
    61.66     618.81    618.81 2244505819    0.00     0.00  compare_path_costs
    15.01     769.44    150.63    1204882    0.00     0.00  add_path
     8.08     850.57     81.13     772077    0.00     0.00  nth
     3.76     888.27     37.70    1113598    0.00     0.00  nconc
     2.59     914.30     26.03     233051    0.00     0.00  find_joininfo_node
     2.23     936.70     22.40   30659124    0.00     0.00  bms_equal
     1.14     948.14     11.44   39823463    0.00     0.00  equal
     0.77     955.84      7.70      83300    0.00     0.00  find_base_rel

This is with no special planner settings. Obviously the problem is that it's considering way too many different paths. We did do something about that in 8.0 (basically, throw away paths with "nearly the same" cost) ... but the bottom line didn't improve a whole lot. CVS tip profile for the same case is

      %   cumulative   self               self     total
     time    seconds   seconds     calls   s/call   s/call  name
    38.37     176.41    176.41   53344348    0.00     0.00  list_nth_cell
    35.26     338.52    162.11     196481    0.00     0.00  get_rte_attribute_is_dropped
     5.42     363.44     24.92     233051    0.00     0.00  find_joininfo_node
     4.72     385.14     21.70   30659416    0.00     0.00  bms_equal
     4.09     403.95     18.81   53344348    0.00     0.00  list_nth
     2.31     414.58     10.63   37347920    0.00     0.00  equal
     1.40     421.03      6.45      83299    0.00     0.00  find_base_rel
     1.08     426.01      4.98     617917    0.00     0.00  SearchCatCache
     0.90     430.13      4.12    5771640    0.00     0.00  AllocSetAlloc

The get_rte_attribute_is_dropped calls (and list_nth/list_nth_cell, which are mostly being called from there) arise from a rather hastily added patch that prevents failure when a JOIN list in a stored view refers to a since-dropped column of an underlying relation. I had not realized that that check could have O(N^2) behavior in deeply nested joins, but it does. Obviously we'll have to rethink that.

After that it looks like the next hotspot is find_joininfo_node (and bms_equal, which is mostly getting called from there). We could maybe fix that by rethinking the joininfo data structure --- right now it's a collection of simple Lists, which betrays the planner's Lispy heritage ;-). Again, that's not something I've ever seen at the top of a profile before --- there may be some O(N^2) behavior involved here too, but I've not analyzed it in detail.

It does look like 8.0 would be about a factor of 2 faster for you than 7.4, but the real fix will probably have to wait for 8.1.

Also: the 8.0 problem is definitely an O(N^2) type of deal, which means if you could reduce the depth of nesting by a factor of 2 the cost would go down 4x. You said this was an automatically generated query, so there may not be much you can do about it, but if you could parenthesize the FROM list a bit more intelligently the problem would virtually go away. What you have is effectively

	FROM ((((a left join b) left join c) left join d) ...

so the nesting goes all the way down. With something like

	FROM ((a left join b) left join c ...)
	     left join
	     ((d left join e) left join f ...)

the max nesting depth would be halved. I don't understand your schema at all, so I'm not sure what an appropriate nesting might look like, but maybe there is a short-term workaround to be found there. (This will *not* help on 7.4, as the bottleneck there is completely different.)

			regards, tom lane
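To make the rebalancing concrete, here is a toy four-table version with invented names (a, b, c, d are hypothetical; the real UBL tables would substitute in). For join conditions like these, which test the inner table's key directly, both forms return the same rows, but the maximum nesting depth is halved:

    -- Left-deep, as the generated query does it: depth grows by one
    -- for every table added.
    SELECT *
    FROM ((a
      LEFT JOIN b ON b.a_id = a.id)
      LEFT JOIN c ON c.b_id = b.id)
      LEFT JOIN d ON d.c_id = c.id;

    -- Balanced: join b to a and d to c first, then join the two halves.
    -- Same rows, half the maximum nesting depth.
    SELECT *
    FROM (a LEFT JOIN b ON b.a_id = a.id)
         LEFT JOIN
         (c LEFT JOIN d ON d.c_id = c.id)
         ON c.b_id = b.id;

Applied recursively, the same trick yields a roughly balanced join tree whose depth grows with the logarithm of the table count rather than linearly, though whether the generated join conditions permit that regrouping depends on the schema.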
On Fri, 2005-06-03 at 13:22 +0100, philb@vodafone.ie wrote:
>>>> I am using PostgreSQL (7.4) with a schema that was generated
>>>> automatically (using Hibernate). The schema consists of about 650
>>>> relations. One particular query (also generated automatically)
>>>> consists of left joining approximately 350 tables.

> Despite being fairly restricted in scope,
> the schema is highly denormalized, hence the large number of tables.

Do you mean normalized? Or do you mean you've pushed the superclass details down onto each of the leaf classes?

I guess I'm interested in what type of modelling led you to have so many tables in the first place? Gotta say, never seen a 350-table join before in a real app.

Wouldn't it be possible to smooth out the model and end up with fewer tables? Or simply break things up somewhere slightly down from the root of the class hierarchy?

Best Regards, Simon Riggs
>> Despite being fairly restricted in scope,
>> the schema is highly denormalized, hence the large number of tables.
>
> Do you mean normalized? Or do you mean you've pushed the superclass
> details down onto each of the leaf classes?

Sorry, I meant normalized; typing faster than I'm thinking here :)

The schema was generated by hyperjaxb, a combination of Hibernate and JAXB. This allows one to go from XSD -> object model -> persistence in a single step. I'm just getting the hang of Hibernate, so I don't know how flexible its strategy is. Obviously though, the emphasis is on correctness first, so while the same result could possibly be achieved more quickly with many smaller queries, it probably considers that it's up to the DBMS to handle optimisation (not unreasonably either, I guess).

Since the entire process from the XSD onwards is automated, there's no scope for tweaking either the OR mapping code or the DB schema itself except for isolated troubleshooting purposes. The XSD set in question is the UBL schema published by OASIS, which maps to about 650 relations. I thought it would be nice to have this as a standard component in future development.

Regards,

-phil

> I guess I'm interested in what type of modelling led you to have so many
> tables in the first place?
>
> Gotta say, never seen a 350-table join before in a real app.
>
> Wouldn't it be possible to smooth out the model and end up with fewer
> tables? Or simply break things up somewhere slightly down from the root
> of the class hierarchy?
>
> Best Regards, Simon Riggs