Re: Tuning/performance issue... - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: Tuning/performance issue... |
Date | |
Msg-id | 28832.1065017666@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Tuning/performance issue... (David Griffiths <dgriffiths@boats.com>) |
List | pgsql-performance |
David Griffiths <dgriffiths@boats.com> writes: >> The reason PG's planner doesn't discover this join order for itself >> is that it's written to not attempt to re-order outer joins from the >> syntactically defined ordering. In general, such reordering would >> change the results. It is possible to analyze the query and prove that >> certain reorderings are valid (don't change the results), but we don't >> currently have code to do that. > Not sure I follow. Are you saying that, depending on when the outer-join is > applied to the rows found at the time, you may end up with a different set > of rows? Here's an example showing that it's not always safe to rearrange join order in the presence of outer joins: jtest=# create table a (f1 int); CREATE TABLE jtest=# create table b (f1 int, f2 int); CREATE TABLE jtest=# create table c(f1 int, f2 int); CREATE TABLE jtest=# insert into a values (1); INSERT 431307 1 jtest=# insert into b values (10,10); INSERT 431308 1 jtest=# insert into b values (11,11); INSERT 431309 1 jtest=# insert into c values (1,10); INSERT 431310 1 jtest=# insert into c values (2,11); INSERT 431311 1 jtest=# SELECT * FROM a, b LEFT JOIN c ON b.f2 = c.f2 WHERE a.f1 = c.f1; f1 | f1 | f2 | f1 | f2 ----+----+----+----+---- 1 | 10 | 10 | 1 | 10 (1 row) Per spec the JOIN operator binds more tightly than comma, so this is equivalent to: jtest=# SELECT * FROM a JOIN (b LEFT JOIN c ON b.f2 = c.f2) ON a.f1 = c.f1; f1 | f1 | f2 | f1 | f2 ----+----+----+----+---- 1 | 10 | 10 | 1 | 10 (1 row) Now suppose we try to join A and C before joining to B: jtest=# SELECT * FROM b LEFT JOIN (a join c ON a.f1 = c.f1) ON b.f2 = c.f2; f1 | f2 | f1 | f1 | f2 ----+----+----+----+---- 10 | 10 | 1 | 1 | 10 11 | 11 | | | (2 rows) We get a different answer, because some C rows are eliminated before reaching the left join, causing null-extended B rows to be added. (I don't have a MySQL installation here to try, but if they still work the way they used to, they get the wrong answer on the first query.) The point of this example is just that there are cases where it'd be incorrect for the planner to change the ordering of joins from what is implied by the query syntax. It is always safe to change the join order when only inner joins are involved. There are cases where outer join order is safe to change too, but you need analysis code that checks the query conditions to prove that a particular rearrangement is safe. Right now, we don't have such code, and so we just follow the simple rule "never rearrange any outer joins". > I would have expected the optimizer to do the outer-joins last, as the > extra data received by the outer-joins is not mandatory, and won't > affect the rows that were retreived by joining user_account, > address_list, and commercial_entity. I think your example falls into the category of provably-safe rearrangements ... but as I said, the planner doesn't know that. > An outer join would *never* be the most restrictive > join in a query. Sure it can, if the restriction conditions are mainly on the outer join's tables. But that's not really the issue here. As best I can tell without seeing your data statistics, the most restrictive conditions in your query are the ones on commercial_entity.commercial_entity_id and user_account.user_role_id. The trick is to apply those before joining any other tables. regards, tom lane
pgsql-performance by date: