Re: Tuning/performance issue... - Mailing list pgsql-performance
From | David Griffiths |
---|---|
Subject | Re: Tuning/performance issue... |
Date | |
Msg-id | 074601c387df$b5939270$6501a8c0@griffiths2 Whole thread Raw |
In response to | Tuning/performance issue... (David Griffiths <dgriffiths@boats.com>) |
Responses |
Re: Tuning/performance issue...
Re: Tuning/performance issue... |
List | pgsql-performance |
> The most efficient way to handle this query would probably be to join > the three tables with restrictions first, and then join the other tables > to those. You could force this with not too much rewriting using > something like (untested, but I think it's right) > > ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN > address_list LEFT JOIN state_province ON address_list.state_province_id > = state_province.state_province_id > LEFT JOIN contact_info ON address_list.contact_info_id = > contact_info.contact_info_id > CROSS JOIN country > WHERE ... > > The explicit JOINs associate left-to-right, so this gives the intended > join order. (In your original query, explicit JOIN binds more tightly > than commas do.) Ok - that's interesting - I'll have to do some reading and more testing. > The reason PG's planner doesn't discover this join order for itself > is that it's written to not attempt to re-order outer joins from the > syntactically defined ordering. In general, such reordering would > change the results. It is possible to analyze the query and prove that > certain reorderings are valid (don't change the results), but we don't > currently have code to do that. Not sure I follow. Are you saying that, depending on when the outer-join is applied to the rows found at the time, you may end up with a different set of rows? I would have expected the optimizer to do the outer-joins last, as the extra data received by the outer-joins is not mandatory, and won't affect the rows that were retreived by joining user_account, address_list, and commercial_entity. An outer join would *never* be the most restrictive join in a query. I thought (from my readings on Oracle query tuning) that finding the most restrictive table/index was the first task of an optimizer. Reduce the result set as quickly as possible. That query has the line, "AND commercial_entity.commercial_entity_id=225528", which uses an index (primary key) and uses an "=". I would have expected that to be done first, then joined with the other inner-join tables, and finally have the outer-joins applied to the final result set to fill in the "might be there" data. Anyway, if the optimizer does the outer-joins first (address_list with state_province and contact_info), then it's picking the table with the most rows (address_list has 200K+ rows, where the other 3 big tables have 70K-90K). Would re-ordering the FROM clause (and LEFT JOIN portions) help? Could you give an example where applying an outer-join at a different time could result in different results? I think I can see at situation where you use part of the results in the outer-join in the where clause, but I am not sure. > I'm prepared to believe that Oracle contains code that actually does the > analysis about which outer-join reorderings are valid, and is then able > to find the right join order by deduction. I'm not sure about Oracle (other than what I stated above). In fact, about half the time, updating table stats to try to get the Oracle optimizer to do a better job on a query results in even worse performance. > ... FROM ((((commercial_entity CROSS JOIN country) CROSS JOIN > user_account) CROSS JOIN address_list) > LEFT JOIN state_province ON ...) > LEFT JOIN contact_info ON ... > WHERE ... > > This is clearly at odds with the SQL spec's syntactically defined join > order semantics. It's possible that it always yields the same results > as the spec requires, but I'm not at all sure about that. Again, I don't know. On the 3 queries based on these tables, Postgres and MySQL return the exact same data (they use the same data set). Do you have a link to the SQL spec's join-order requirements? > In any case > this strategy is certainly not "better" than ours, it just performs > poorly on a different set of queries. Would I be out of line to > speculate that your query was previously tuned to work well in MySQL? The query was pulled from our codebase (written for Oracle). I added a bit to it to make it slower, and then ported to MySQL and tested there first (just re-wrote the outer-join syntax). I found that re-ordering the tables in the from-clause on MySQL changed the time by 45-ish% (0.36 seconds to .20 seconds), but that's because I had forgotten to re-analyze the tables after refreshing the dataset. Now, table order doesn't make a difference in speed (or results). If anything, I've done more tuning for Postgres - added some extra indexes to try to help (country.country_id had a composite index with another column, but not an index for just it), etc. The dataset and schema is pure-Oracle. I extracted it out of the database, removed all Oracle-specific extensions, changed the column types, and migrated the indexes and foreign keys to MySQL and Postgres. Nothing more (other than an extra index or two for Postgres - nada for MySQL). This is all part of a "migrate away from Oracle" project. We are looking at 3 databases - MySQL (InnoDB), Postgres and Matisse (object oriented). We have alot of queries like this or worse, and I'm worried that many of them would need to be re-written. The developers know SQL, but nothing about tuning, etc. Thanks for the quick response - I will try explicit joining, and I'm looking forward to your comments on outer-joins and the optmizer (and anything else I've written). David.
pgsql-performance by date: