Re: Tuning/performance issue... - Mailing list pgsql-performance

From Tom Lane
Subject Re: Tuning/performance issue...
Date
Msg-id 25573.1064982530@sss.pgh.pa.us
Whole thread Raw
In response to Tuning/performance issue...  (David Griffiths <dgriffiths@boats.com>)
List pgsql-performance
David Griffiths <dgriffiths@boats.com> writes:
> ... FROM commercial_entity, country, user_account,
> address_list LEFT JOIN state_province ON address_list.state_province_id
> = state_province.state_province_id
> LEFT JOIN contact_info ON address_list.contact_info_id =
> contact_info.contact_info_id
> WHERE ...

I believe what you're getting burnt by is that PG's planner interprets
this as forcing the address_list * state_province * contact_info join
to be done before it joins those tables to commercial_entity, country,
and user_account --- for discussion see
http://www.postgresql.org/docs/7.3/static/explicit-joins.html

Unfortunately your WHERE-clause restriction conditions are on
address_list, commercial_entity, and user_account; and it seems the
address_list constraint is very weak.  So the plan ends up forming a
large fraction of the address_list * state_province * contact_info join,
only to throw it away again when there's no matching rows selected from
commercial_entity and user_account.  The actual runtime and actual row
counts from the EXPLAIN ANALYZE output show that this is what's
happening.

The most efficient way to handle this query would probably be to join
the three tables with restrictions first, and then join the other tables
to those.  You could force this with not too much rewriting using
something like (untested, but I think it's right)

... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
address_list LEFT JOIN state_province ON address_list.state_province_id
= state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id =
contact_info.contact_info_id
CROSS JOIN country
WHERE ...

The explicit JOINs associate left-to-right, so this gives the intended
join order.  (In your original query, explicit JOIN binds more tightly
than commas do.)

The reason PG's planner doesn't discover this join order for itself
is that it's written to not attempt to re-order outer joins from the
syntactically defined ordering.  In general, such reordering would
change the results.  It is possible to analyze the query and prove that
certain reorderings are valid (don't change the results), but we don't
currently have code to do that.

> As a reference, our production Oracle database (exactly the same
> hardware, but RAID-mirroring) with way more load can handle the query in
> 1-2 seconds. I have MySQL 4.0.14 with InnoDB on the same machine
> (shutdown when I am testing Postgres, and visa versa) and it does the
> query in 0.20 seconds.

I'm prepared to believe that Oracle contains code that actually does the
analysis about which outer-join reorderings are valid, and is then able
to find the right join order by deduction.  The last I heard about
MySQL, they have no join-order analysis at all; they unconditionally
interpret this type of query left-to-right, ie as

... FROM ((((commercial_entity CROSS JOIN country) CROSS JOIN
         user_account) CROSS JOIN address_list)
         LEFT JOIN state_province ON ...)
         LEFT JOIN contact_info ON ...
WHERE ...

This is clearly at odds with the SQL spec's syntactically defined join
order semantics.  It's possible that it always yields the same results
as the spec requires, but I'm not at all sure about that.  In any case
this strategy is certainly not "better" than ours, it just performs
poorly on a different set of queries.  Would I be out of line to
speculate that your query was previously tuned to work well in MySQL?

            regards, tom lane

pgsql-performance by date:

Previous
From: Oleg Lebedev
Date:
Subject: Re: TPC-R benchmarks
Next
From: David Griffiths
Date:
Subject: Re: Tuning/performance issue...