Re: JOIN vs. WHERE ... IN (subselect) - Mailing list pgsql-sql
From | Chris Linstruth |
---|---|
Subject | Re: JOIN vs. WHERE ... IN (subselect) |
Date | |
Msg-id | Pine.BSI.4.33.0305170741200.6053-100000@cello.qnet.com Whole thread Raw |
In response to | JOIN vs. WHERE ... IN (subselect) ("Dean Gibson (DB Administrator)" <dba-sql@ultimeth.net>) |
Responses |
Re: JOIN vs. WHERE ... IN (subselect)
|
List | pgsql-sql |
I was just given the following. Not your exact query but replacing WHERE IN with JOIN ... AS SUBSELECT fixed my performance problems. This query wnet from 15-30 seconds to under a second. I have attempted to research exactly what Mr. Arnold did, and found nothing regarding JOIN ... AS SUBSELECT in the docs. Any pointers appreciated. On Sun, 11 May 2003, Mathieu Arnold wrote: > > > --Le 11/05/2003 09:48 -0700, Chris Linstruth \xe9crivait : > > | > | But this can take 15-30 seconds: > | > | SELECT count(radacctid) AS sessions, > | sum(acctsessiontime) AS connecttime > | FROM radacct > | WHERE radacctid IN > | (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct > | WHERE username='cjl' > | AND acctstoptime IS NOT NULL > | AND date_trunc('month', now())=date_trunc('month', > | acctstoptime)); > | > | There are probably many different ways to perform this query. My > | main problem is trying to overcome the fact that try as I might, > | I can't stop the occasional duplicate accounting record from being > | inserted so I have to weed them out, hence the "DISTINCT ON > | (acctsessionid)". > > try this : > SELECT count(radacctid) AS sessions, > sum(acctsessiontime) AS connecttime > FROM radacct > JOIN (SELECT DISTINCT ON (acctsessionid) radacctid FROM radacct > WHERE username='cjl' > AND acctstoptime IS NOT NULL > AND date_trunc('month', now())=date_trunc('month', > acctstoptime)) AS subselect USING (radacctid); > -- Chris Linstruth <cjl@qnet.com> QNET 1529 East Palmdale Blvd Suite 200 Palmdale, CA 93550 (661) 538-2028 On Fri, 16 May 2003, Dean Gibson (DB Administrator) wrote: > Using PostgreSQL 7.3.2 on Linux. > > One of the types of queries that I like to do on our database is: > > SELECT * FROM table1 > WHERE indexed_column1 IN > (SELECT column2 FROM table2 WHERE <condition>); > > However, in our database table1 is quite large (~1M rows), and the above query takes "forever", EVEN IF table2 CONSISTSOF ONLY ONE ROW! > > However, my third-party SQL book says that the above is equivalent to: > > SELECT table1.* FROM table1, table2 > WHERE indexed_column1 = column2 AND <condition>; > > And indeed, the above query executes virtually instantaneously if "<condition>" results in a small number of rows fromtable2. > > I'm fairly new to SQL; are they really equivalent? If so, is there some reason that the first form is not optimized/transformedinto the second form, or is this a planned enhancement? > > -- Dean > > ps: If indexed_column1 has the same name as column2, then the query: > > SELECT table1.* from table1 > JOIN table2 USING( column ) > WHERE <condition>; > > Also executes quickly. I just like the readability of the first query above, but as they say, you gotta do what works... > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >