Re: Subselect performance question - Mailing list pgsql-sql

From Manfred Koizar
Subject Re: Subselect performance question
Date
Msg-id n083du4g2eue067lrc3eqqdq01aiod5dqu@4ax.com
Whole thread Raw
In response to Subselect performance question  ("Josh Berkus" <josh@agliodbs.com>)
Responses Re: Subselect performance question
List pgsql-sql
On Thu, 02 May 2002 09:37:56 -0700, Josh Berkus wrote:

>Folks,
>
>I have a view which can be designed with eithher a sub-select in the
>SELECT clause or the FROM clause.    The main query data set is small
>(1000 records) but the table being filtered in the sub-select is large
>(110,000 records).

Josh,

I don't know, if it matters, but your queries are not equivalent.  If
for a client_id there is no row with case_status = 'ACT', active_count
is ...

>
>Under those circumstances, is there a hard-and-fast rule for which
>query I should use?  Most of the time, users will only look at one
>record at a time from the main data set.
>
>SELECT client_id, client_name,
>    (SELECT count(*) FROM case_clients WHERE client_id = clients.client_id
>AND case_status = 'ACT') as active_count,
>    (SELECT count(*) FROM case_clients WHERE client_id = clients.client_id
>AND case_status = 'STL') as settled_count,
>FROM clients;

... 0 (zero), because the sub-select counts 0 rows

>
>OR:
>
>
>SELECT client_id, client_name,  active_count, settled_count,
>FROM clients
>    LEFT OUTER JOIN (SELECT client_id, count(*) as active_count FROM
>case_clients WHERE case_status = 'ACT' GROUP BY client_id) ac ON
>ac.client_id = clients.client_id
>   LEFT OUTER JOIN (SELECT client_id, count(*) as settled_count FROM
>case_clients WHERE case_status = 'STL' GROUP BY client_id) sc ON
>sc.client_id = clients.client_id;
>

NULL, because you do a LEFT JOIN to an empty sub-select.

SELECT c.client_id,      c.client_name,      count(ac.client_id) active_count,      count(sc.client_id) settled_count
FROM clients c   LEFT JOIN case_clients ac ON ac.client_id=c.client_id AND
ac.case_status= 'ACT'   LEFT JOIN case_clients sc ON sc.client_id=c.client_id AND
sc.case_status= 'STL' 
GROUP BY c.client_id, c.client_name;

should match your first query and it doesn't use sub-selects at all.
I haven't tested it though, so watch out for typos ...
My personal hard-and-fast rule is "avoid sub-selects, use joins",  but
YMMV.

>-Josh Berkus

ServusManfred


pgsql-sql by date:

Previous
From: "wit"
Date:
Subject: get transaction no.
Next
From: Tom Lane
Date:
Subject: Re: Subselect performance question