Thread: Seemingly identical queries run at different speeds

Seemingly identical queries run at different speeds

From
VanL
Date:
Hello,

I have three queries that are essentially identical.  Two of them run in
fractions of a second; one of them takes longer than 15 minutes to
return. (Not sure how long it totally takes, that has been the limit of
my patience.)

The only difference between these queries is the use of table aliases in
the sql query.  What is happening in postgres that this makes such a
difference?

SQL queries below.

Thanks,

VanL

FAST (0.017 second):
select
     mm_batch.name as batch_name,
     mm_domain.name as domain_name,
     mm_management_unit.name as management_unit_name,
     mm_customer.firstname as customer_name,
     mm_legacy_account.username as old_username,
     mm_target_account.username as new_username
from
     mm_batch,
     mm_domain,
     mm_management_unit,
     mm_customer,
     mm_legacy_account,
     mm_target_account
where
         mm_domain.bid = mm_batch.id
     and mm_domain.mid = mm_management_unit.id
     and mm_domain.cid = mm_customer.id
     and mm_domain.lid = mm_legacy_account.id
     and mm_domain.tid = mm_target_account.id
     and mm_domain.name = 'example.com';
========================================================
FAST (0.016 second):
select
     B.name as batch_name,
     D.name as domain_name,
     M.name as management_unit_name,
     C.firstname as customer_name,
     LA.username as old_username,
     TA.username as new_username
from
     mm_batch B,
     mm_domain D,
     mm_management_unit M,
     mm_customer C,
     mm_legacy_account LA,
     mm_target_account TA
where
         D.bid = B.id
     and D.mid = M.id
     and D.cid = C.id
     and D.lid = LA.id
     and D.tid = TA.id
     and D.name = 'example.com';
=========================================================
SLOW ( > 15 minutes):
select
     mm_batch.name as batch_name,
     mm_domain.name as domain_name,
     mm_management_unit.name as management_unit_name,
     mm_customer.firstname as customer_name,
     mm_legacy_account.username as old_username,
     mm_target_account.username as new_username
from
     mm_batch B,
     mm_domain D,
     mm_management_unit M,
     mm_customer C,
     mm_legacy_account LA,
     mm_target_account TA
where
         mm_domain.bid = mm_batch.id
     and mm_domain.mid = mm_management_unit.id
     and mm_domain.cid = mm_customer.id
     and mm_domain.lid = mm_legacy_account.id
     and mm_domain.tid = mm_target_account.id
     and mm_domain.name = 'example.com';


Re: Seemingly identical queries run at different speeds

From
Arguile
Date:
On Tue, 2003-07-22 at 19:27, VanL wrote:
> I have three queries that are essentially identical.  Two of them run in
> fractions of a second; one of them takes longer than 15 minutes to
> return. (Not sure how long it totally takes, that has been the limit of
> my patience.)
>
> The only difference between these queries is the use of table aliases in
> the sql query.  What is happening in postgres that this makes such a
> difference?

You're getting bit by a 'feature' that is supposed to make life easier.
PostgreSQL automatically adds tables you reference to the FROM clause if
they're not listed there already, it then sends a NOTICE saying it did
so. This is supposed to be helpful, here it certainly is not.

What's happening in the below is you reference mm_batch, mm_domain, etc.
which are _different_ than B, D, etc. because aliasing changes what
query processor refers to them as (allowing stuff like self-joins).

So Pg is appending them to the FROM clause, which now looks like this:

    FROM mm_batch B, mm_domain D, mm_management_unit M, mm_customer C,
    mm_legacy_account LA, mm_target_account TA, mm_batch, mm_domain,
    mm_management_unit, mm_customer, mm_legacy_account,
    mm_target_account

Yikes. It's now generating an absolutely huge cartesian product between
the result set you want and all those extra tables.

I believe in the upcoming 7.4.x release this won't happen unless you
enable ADD_MISSING_FROM in your conf. Something I certainly won't be
enabling due to this type of situation.

> SLOW ( > 15 minutes):
> select
>      mm_batch.name as batch_name,
>      mm_domain.name as domain_name,
>      mm_management_unit.name as management_unit_name,
>      mm_customer.firstname as customer_name,
>      mm_legacy_account.username as old_username,
>      mm_target_account.username as new_username
> from
>      mm_batch B,
>      mm_domain D,
>      mm_management_unit M,
>      mm_customer C,
>      mm_legacy_account LA,
>      mm_target_account TA
> where
>          mm_domain.bid = mm_batch.id
>      and mm_domain.mid = mm_management_unit.id
>      and mm_domain.cid = mm_customer.id
>      and mm_domain.lid = mm_legacy_account.id
>      and mm_domain.tid = mm_target_account.id
>      and mm_domain.name = 'example.com';



Re: Seemingly identical queries run at different speeds

From
Alvaro Herrera
Date:
On Tue, Jul 22, 2003 at 05:27:37PM -0600, VanL wrote:

> I have three queries that are essentially identical.  Two of them run in
> fractions of a second; one of them takes longer than 15 minutes to
> return. (Not sure how long it totally takes, that has been the limit of
> my patience.)
>
> The only difference between these queries is the use of table aliases in
> the sql query.  What is happening in postgres that this makes such a
> difference?

I think the third query is adding items to the FROM clause that you
probably don't want.  The result is a cartesian product, which is going
to take eons to materialize (i.e. give results back).

See:
http://www14.us.postgresql.org/docs/7.3/static/sql-select.html#R2-SQL-SELECT-4

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)