Re: JOIN vs. WHERE ... IN (subselect) - Mailing list pgsql-sql

From Chris Linstruth
Subject Re: JOIN vs. WHERE ... IN (subselect)
Date
Msg-id Pine.BSI.4.33.0305170741200.6053-100000@cello.qnet.com
Whole thread Raw
In response to JOIN vs. WHERE ... IN (subselect)  ("Dean Gibson (DB Administrator)" <dba-sql@ultimeth.net>)
Responses Re: JOIN vs. WHERE ... IN (subselect)
List pgsql-sql
I was just given the following.  Not your exact query but replacing
WHERE IN with JOIN ... AS SUBSELECT fixed my performance problems.
This query wnet from 15-30 seconds to under a second.

I have attempted to research exactly what Mr. Arnold did, and found
nothing regarding JOIN ... AS SUBSELECT in the docs.  Any pointers
appreciated.

On Sun, 11 May 2003, Mathieu Arnold wrote:
>
>
> --Le 11/05/2003 09:48 -0700, Chris Linstruth \xe9crivait :
>
> |
> | But this can take 15-30 seconds:
> |
> | SELECT count(radacctid) AS sessions,
> |         sum(acctsessiontime) AS connecttime
> |         FROM radacct
> |         WHERE radacctid IN
> |         (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct
> |           WHERE username='cjl'
> |           AND acctstoptime IS NOT NULL
> |           AND date_trunc('month', now())=date_trunc('month',
> | acctstoptime));
> |
> | There are probably many different ways to perform this query.  My
> | main problem is trying to overcome the fact that try as I might,
> | I can't stop the occasional duplicate accounting record from being
> | inserted so I have to weed them out, hence the "DISTINCT ON
> | (acctsessionid)".
>
> try this :
> SELECT count(radacctid) AS sessions,
>         sum(acctsessiontime) AS connecttime
>         FROM radacct
>              JOIN (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct
>                    WHERE username='cjl'
>                    AND acctstoptime IS NOT NULL
>                    AND date_trunc('month', now())=date_trunc('month',
> acctstoptime)) AS subselect USING (radacctid);
>

-- 
Chris Linstruth <cjl@qnet.com>
QNET
1529 East Palmdale Blvd Suite 200
Palmdale, CA 93550
(661) 538-2028


On Fri, 16 May 2003, Dean Gibson (DB Administrator) wrote:

> Using PostgreSQL 7.3.2 on Linux.
>
> One of the types of queries that I like to do on our database is:
>
> SELECT * FROM table1
>     WHERE indexed_column1 IN
>     (SELECT column2 FROM table2 WHERE <condition>);
>
> However, in our database table1 is quite large (~1M rows), and the above query takes "forever", EVEN IF table2
CONSISTSOF ONLY ONE ROW!
 
>
> However, my third-party SQL book says that the above is equivalent to:
>
> SELECT table1.* FROM table1, table2
>     WHERE indexed_column1 = column2 AND <condition>;
>
> And indeed, the above query executes virtually instantaneously if "<condition>" results in a small number of rows
fromtable2.
 
>
> I'm fairly new to SQL;  are they really equivalent?  If so, is there some reason that the first form is not
optimized/transformedinto the second form, or is this a planned enhancement?
 
>
> -- Dean
>
> ps: If indexed_column1 has the same name as column2, then the query:
>
> SELECT table1.* from table1
>     JOIN table2 USING( column )
>     WHERE <condition>;
>
> Also executes quickly.  I just like the readability of the first query above, but as they say, you gotta do what
works...
 
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>



pgsql-sql by date:

Previous
From: "Dean Gibson (DB Administrator)"
Date:
Subject: JOIN vs. WHERE ... IN (subselect)
Next
From: "Dean Gibson (DB Administrator)"
Date:
Subject: Re: JOIN vs. WHERE ... IN (subselect)