Re: Tuning/performance issue... - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: Tuning/performance issue... |
Date | |
Msg-id | 25573.1064982530@sss.pgh.pa.us Whole thread Raw |
In response to | Tuning/performance issue... (David Griffiths <dgriffiths@boats.com>) |
List | pgsql-performance |
David Griffiths <dgriffiths@boats.com> writes: > ... FROM commercial_entity, country, user_account, > address_list LEFT JOIN state_province ON address_list.state_province_id > = state_province.state_province_id > LEFT JOIN contact_info ON address_list.contact_info_id = > contact_info.contact_info_id > WHERE ... I believe what you're getting burnt by is that PG's planner interprets this as forcing the address_list * state_province * contact_info join to be done before it joins those tables to commercial_entity, country, and user_account --- for discussion see http://www.postgresql.org/docs/7.3/static/explicit-joins.html Unfortunately your WHERE-clause restriction conditions are on address_list, commercial_entity, and user_account; and it seems the address_list constraint is very weak. So the plan ends up forming a large fraction of the address_list * state_province * contact_info join, only to throw it away again when there's no matching rows selected from commercial_entity and user_account. The actual runtime and actual row counts from the EXPLAIN ANALYZE output show that this is what's happening. The most efficient way to handle this query would probably be to join the three tables with restrictions first, and then join the other tables to those. You could force this with not too much rewriting using something like (untested, but I think it's right) ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN address_list LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id CROSS JOIN country WHERE ... The explicit JOINs associate left-to-right, so this gives the intended join order. (In your original query, explicit JOIN binds more tightly than commas do.) The reason PG's planner doesn't discover this join order for itself is that it's written to not attempt to re-order outer joins from the syntactically defined ordering. In general, such reordering would change the results. It is possible to analyze the query and prove that certain reorderings are valid (don't change the results), but we don't currently have code to do that. > As a reference, our production Oracle database (exactly the same > hardware, but RAID-mirroring) with way more load can handle the query in > 1-2 seconds. I have MySQL 4.0.14 with InnoDB on the same machine > (shutdown when I am testing Postgres, and visa versa) and it does the > query in 0.20 seconds. I'm prepared to believe that Oracle contains code that actually does the analysis about which outer-join reorderings are valid, and is then able to find the right join order by deduction. The last I heard about MySQL, they have no join-order analysis at all; they unconditionally interpret this type of query left-to-right, ie as ... FROM ((((commercial_entity CROSS JOIN country) CROSS JOIN user_account) CROSS JOIN address_list) LEFT JOIN state_province ON ...) LEFT JOIN contact_info ON ... WHERE ... This is clearly at odds with the SQL spec's syntactically defined join order semantics. It's possible that it always yields the same results as the spec requires, but I'm not at all sure about that. In any case this strategy is certainly not "better" than ours, it just performs poorly on a different set of queries. Would I be out of line to speculate that your query was previously tuned to work well in MySQL? regards, tom lane
pgsql-performance by date: