Re: Tuning/performance issue... - Mailing list pgsql-performance

From David Griffiths
Subject Re: Tuning/performance issue...
Date
Msg-id 074601c387df$b5939270$6501a8c0@griffiths2
Whole thread Raw
In response to Tuning/performance issue...  (David Griffiths <dgriffiths@boats.com>)
Responses Re: Tuning/performance issue...
Re: Tuning/performance issue...
List pgsql-performance
> The most efficient way to handle this query would probably be to join
> the three tables with restrictions first, and then join the other tables
> to those.  You could force this with not too much rewriting using
> something like (untested, but I think it's right)
>
> ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
> address_list LEFT JOIN state_province ON address_list.state_province_id
> = state_province.state_province_id
> LEFT JOIN contact_info ON address_list.contact_info_id =
> contact_info.contact_info_id
> CROSS JOIN country
> WHERE ...
>
> The explicit JOINs associate left-to-right, so this gives the intended
> join order.  (In your original query, explicit JOIN binds more tightly
> than commas do.)

Ok - that's interesting - I'll have to do some reading and more testing.

> The reason PG's planner doesn't discover this join order for itself
> is that it's written to not attempt to re-order outer joins from the
> syntactically defined ordering.  In general, such reordering would
> change the results.  It is possible to analyze the query and prove that
> certain reorderings are valid (don't change the results), but we don't
> currently have code to do that.

Not sure I follow. Are you saying that, depending on when the outer-join is
applied to the rows found at the time, you may end up with a different set
of rows? I would have expected the optimizer to do the outer-joins last, as
the extra data received by the outer-joins is not mandatory, and won't
affect
the rows that were retreived by joining user_account, address_list, and
commercial_entity.

An outer join would *never* be the most restrictive
join in a query. I thought (from my readings on Oracle query tuning) that
finding the most restrictive table/index was the first task of an optimizer.
Reduce the result set as quickly as possible. That query has the line,

"AND commercial_entity.commercial_entity_id=225528",

which uses an index (primary key) and uses an "=". I would have expected
that to be done first, then joined with the other inner-join tables, and
finally
have the outer-joins applied to the final result set to fill in the "might
be there" data.

Anyway, if the optimizer does the outer-joins first (address_list with
state_province
and contact_info), then it's picking the table with the most rows
(address_list has
200K+ rows, where the other 3 big tables have 70K-90K). Would re-ordering
the FROM clause (and LEFT JOIN portions) help?

Could you give an example where applying an outer-join at a different time
could
result in different results? I think I can see at situation where you use
part of the results
in the outer-join in the where clause, but I am not sure.

> I'm prepared to believe that Oracle contains code that actually does the
> analysis about which outer-join reorderings are valid, and is then able
> to find the right join order by deduction.

I'm not sure about Oracle (other than what I stated above). In fact, about
half
the time, updating table stats to try to get the Oracle optimizer to do a
better
job on a query results in even worse performance.

> ... FROM ((((commercial_entity CROSS JOIN country) CROSS JOIN
>          user_account) CROSS JOIN address_list)
>          LEFT JOIN state_province ON ...)
>          LEFT JOIN contact_info ON ...
> WHERE ...
>
> This is clearly at odds with the SQL spec's syntactically defined join
> order semantics.  It's possible that it always yields the same results
> as the spec requires, but I'm not at all sure about that.

Again, I don't know. On the 3 queries based on these tables, Postgres
and MySQL return the exact same data (they use the same data set).

Do you have a link to the SQL spec's join-order requirements?

> In any case
> this strategy is certainly not "better" than ours, it just performs
> poorly on a different set of queries.  Would I be out of line to
> speculate that your query was previously tuned to work well in MySQL?

The query was pulled from our codebase (written for Oracle). I added a bit
to it
to make it slower, and then ported to MySQL and tested there first (just
re-wrote
the outer-join syntax). I found that  re-ordering the tables in the
from-clause on
MySQL changed the time by 45-ish% (0.36 seconds to .20 seconds), but that's
because I had forgotten to re-analyze the tables after refreshing the
dataset.
Now, table order doesn't make a difference in speed (or results).

If anything, I've done more tuning for Postgres - added some extra indexes
to try to help
(country.country_id had a composite index with another column, but not an
index for
just it), etc.

The dataset and schema is pure-Oracle. I extracted it out of the database,
removed all
Oracle-specific extensions, changed the column types, and migrated the
indexes and
foreign keys to MySQL and Postgres. Nothing more (other than an extra index
or two for Postgres - nada for MySQL).

This is all part of a "migrate away from Oracle" project. We are looking at
3 databases -
MySQL (InnoDB), Postgres and Matisse (object oriented). We have alot of
queries like this
or worse, and I'm worried that many of them would need to be re-written. The
developers
know SQL, but nothing about tuning, etc.

Thanks for the quick response - I will try explicit joining, and I'm looking
forward to
your comments on outer-joins and the optmizer (and anything else I've
written).

David.

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Tuning/performance issue...
Next
From: Shridhar Daithankar
Date:
Subject: Re: Tuning/performance issue....