From: Richard Huxton
Subject: Re: Query plan for very large number of joins
Msg-id: 429F2D94.3030805@archonet.com
In response to: Query plan for very large number of joins (<philb@vodafone.ie>)
List: pgsql-performance

philb@vodafone.ie wrote:
> Hi,
>
> I am using PostgreSQL (7.4) with a schema that was generated
> automatically (using Hibernate). The schema consists of about 650
> relations. One particular query (also generated automatically)
> consists of left-joining approximately 350 tables.

May I be the first to offer an "ouch"!

> At this stage, most tables are empty and those with values have fewer
> than 50 entries. The query takes about 90 seconds to execute (on a
> P4, 2.6GHz).
>
> All of the relations have a primary key which is indexed, and all of
> the joins are on foreign keys which are explicitly declared. I've
> checked the obvious tunables (effective_cache_size, shared_buffers
> and sort_mem) but changing these has had no effect. The system has a
> total of 750MB RAM; I've varied the shared buffers up to 256MB and
> the sort memory up to 128MB without affecting the performance.

sort_mem is the only one of those I can see helping a single query.
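If you do want to experiment with it further, you can bump it per-session rather than editing postgresql.conf. A sketch only - the value is just an example, and in 7.4 sort_mem is measured in KB:

  SET sort_mem = 131072;        -- 128MB, for this session only
  EXPLAIN ANALYZE <your query>; -- re-test with the larger sort memory
  RESET sort_mem;               -- back to the configured default

Bear in mind it's allocated per sort, so a global setting that large can eat RAM quickly.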

> Running the query as a JDBC prepared statement indicates that the
> query optimiser is spending a negligible amount of time on the task
> (~36ms) compared to the executor (~90 seconds). The output of
> EXPLAIN indicates (AFAICT) that all of the joins are of type "Nested
> Loop Left Join" and all of the scans are of type "Seq Scan". I have
> refrained from posting the query and the query plan since they are
> 80K and 100K apiece, but if anyone wants to see them I can certainly
> forward them on.

Well, if most tables are small, then a seq-scan makes sense. Does it
look like it's estimating the number of rows badly anywhere? I'm not
sure the list will accept attachments that large - is it possible to
upload them somewhere accessible?
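One cheap thing to check first: make sure the planner's statistics are up to date, then compare estimated and actual row counts in the plan yourself. Something like this, assuming you can run the query from psql:

  ANALYZE;                      -- refresh planner statistics for all tables
  EXPLAIN ANALYZE <your query>; -- compare "rows=" estimates with actual rows

Anywhere the two differ wildly is where to look.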

> My (uninformed) suspicion is that the optimiser has failed over to
> the default plan on the basis of the number of tables in the join. My
> question is: is there anyone out there using PostgreSQL with a schema
> of this size? Is there anything that can be done to bring about the
> order-of-magnitude increase in speed that I need?

Well - the genetic query optimiser (GEQO) must surely be kicking in
here (see geqo_threshold in the query-planning section of the run-time
configuration chapter of the manuals). However, I'm not sure how much
leeway the planner has with a largely left-joined query, since outer
joins constrain the join order.
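You can at least confirm whether GEQO is involved, and see what the exhaustive planner does instead - something along these lines (untested, and exhaustively planning 350 joins could itself take a very long time):

  SHOW geqo;             -- is the genetic optimiser enabled?
  SHOW geqo_threshold;   -- GEQO takes over at this many FROM items
  SET geqo = off;        -- force the standard exhaustive planner
  EXPLAIN <your query>;  -- watch how long planning alone takes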

It could be that there's some overhead in the executor that only
becomes noticeable with hundreds of tables involved; at 90 seconds for
~350 joins, you're running at about 0.25 secs per join.

I take it you have no control over the schema or query, so there's not
much fiddling you can do. You've tried sort_mem, so there are only two
things I can think of:
1. Try the various enable_xxx config settings and see if disabling
seq-scans or the relevant join type does anything (see the sketch
after this list - I'm not sure it will)
2. Try against 8.0 - there may be some improvement there.
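For 1, the settings are per-session, so it's easy to experiment. A sketch (with mostly-empty tables the alternative plans may well turn out slower):

  SET enable_seqscan = off;     -- nudge the planner towards index scans
  SET enable_nestloop = off;    -- nudge it towards merge/hash joins
  EXPLAIN ANALYZE <your query>; -- see whether the plan or timing changes
  RESET enable_seqscan;
  RESET enable_nestloop;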

Other people on this list have experience with larger systems than
mine, so they may be able to help more.
--
   Richard Huxton
   Archonet Ltd
