Re: Query plan for very large number of joins - Mailing list pgsql-performance
From | Richard Huxton |
---|---|
Subject | Re: Query plan for very large number of joins |
Date | |
Msg-id | 429F2D94.3030805@archonet.com |
In response to | Query plan for very large number of joins (<philb@vodafone.ie>) |
Responses | Re: Query plan for very large number of joins |
List | pgsql-performance |
philb@vodafone.ie wrote:
> Hi,
>
> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left joining approximately 350 tables.

May I be the first to offer an "ouch"!

> At this stage, most tables are empty and those with values have less
> than 50 entries. The query takes about 90 seconds to execute (on a
> P4, 2.6GHz).
>
> All of the relations have a primary key which is indexed and all of
> the joins are on foreign keys which are explicitly declared. I've
> checked the obvious tunables (effective_cache_size, shared_memory and
> sort_buffer) but changing these has had no effect. The system has a
> total of 750MB RAM; I've varied the shared memory up to 256MB and the
> sort buffer up to 128MB without affecting the performance.

The sort-mem is the only thing I can see helping with a single query.

> Running the query as a JDBC prepared statement indicates that the
> query optimiser is spending a negligible amount of time on the task
> (~36 ms) compared to the executor (~90 seconds). The output of
> EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
> Loop Left Join" and all of the scans are of type "Seq Scan". I have
> refrained from posting the query and the query plan since these are
> 80K and 100K apiece, but if anyone wants to see them I can certainly
> forward them on.

Well, if most tables are small then a seq-scan makes sense. Does it
look like it's estimating the number of rows badly anywhere?

I'm not sure the list will accept attachments that large - is it
possible to upload them somewhere accessible?

> My (uninformed) suspicion is that the optimiser has failed over to
> the default plan on the basis of the number of tables in the join. My
> question is: is there anyone out there using PostgreSQL with this
> size of schema? Is there anything that can be done to bring about the
> order-of-magnitude increase in speed that I need?

Well - the genetic planner must surely be kicking in here (see the
run-time configuration chapter of the manuals, under query planning,
geqo_threshold). However, I'm not sure how much leeway there is in
planning a largely left-joined query.

It could be that there's some overhead in the executor that's only
noticeable with hundreds of tables involved; you're running at about
0.25 secs per join.

I take it you have no control over the schema or query, so there's not
much fiddling you can do. You've tried sort_mem, so there are only two
things I can think of:

1. Try the various enable_xxx config settings and see if disabling
   seq-scan or the relevant join-type does anything (I'm not sure it
   will).
2. Try against 8.0 - there may be some improvement there.

Other people on this list have experience on larger systems than me, so
they may be able to help more.

--
Richard Huxton
Archonet Ltd
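For readers following the thread, a minimal psql sketch of how to inspect the planner settings mentioned above. It assumes a stock 7.4 installation (sort_mem was renamed work_mem in 8.0), and the final SELECT is only a placeholder for the generated 350-table query:

```sql
-- See whether the genetic optimizer (GEQO) is in play for this query;
-- with ~350 relations the query is far larger than geqo_threshold.
SHOW geqo;
SHOW geqo_threshold;

-- Explicit JOIN syntax is only reordered up to join_collapse_limit
-- (and subqueries flattened up to from_collapse_limit), which also
-- limits how much freedom the planner has with a query this wide.
SHOW join_collapse_limit;
SHOW from_collapse_limit;

-- Per-session sort memory, in KB - the one knob Richard expects might
-- help a single query.
SET sort_mem = 65536;

-- Re-check the plan, comparing estimated vs actual row counts.
EXPLAIN ANALYZE SELECT ...;  -- placeholder for the generated query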
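```

And a similar sketch of suggestion 1: the enable_xxx settings are per-session, so discouraging the node types the planner is currently choosing is a cheap experiment that can be reset afterwards. The SELECT is again only a placeholder:

```sql
-- Discourage the node types showing up in the current plan and see
-- whether the planner finds anything faster.
SET enable_seqscan = off;    -- nudge it toward the primary-key indexes
SET enable_nestloop = off;   -- nudge it toward merge/hash left joins

EXPLAIN ANALYZE SELECT ...;  -- placeholder for the generated query

-- Put the settings back when done.
RESET enable_seqscan;
RESET enable_nestloop;
```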