Re: Using the IN clauise - Mailing list pgsql-general

From terry@greatgulfhomes.com
Subject Re: Using the IN clauise
Date
Msg-id 001801c27c5d$4707ee20$2766f30a@development.greatgulfhomes.com
Whole thread Raw
In response to Re: Using the IN clauise  (Robert Treat <xzilla@users.sourceforge.net>)
List pgsql-general
I could not find any doc on this, so:

In case anyone is interested, I did a test, and here are my results:
I created the tables like this:
<cfloop from="1" to="1000000" index="variables.idx">
    <cfquery name="test_insert" datasource="PostgreSQL">
        INSERT INTO test_in_1 (f1, v1) VALUES ('#variables.idx#',
'#Chr(RandRange(65, 90))#')
    </cfquery>
    <cfquery name="test_insert" datasource="PostgreSQL">
        INSERT INTO test_in_2 (f1, v2) VALUES ('#variables.idx#',
'#Chr(RandRange(65, 90))#')
    </cfquery>
</cfloop>

And when I ran the queries (twice to ensure that the first one wasn't worse
because of disk being cached for the second query to run on)

test_in (Records=501, Time=6279ms)
SQL =
SELECT test_in_1.v1
    FROM   test_in_1
    WHERE  test_in_1.f1 IN (SELECT test_in_2.f1 FROM test_in_2 WHERE
test_in_2.v2 = 'A');

test_in (Records=501, Time=113ms)
SQL =
SELECT  test_in_1.v1
    FROM    test_in_1, test_in_2
    WHERE   test_in_1.f1 = test_in_2.f1
        AND test_in_2.v2 = 'A'

test_in (Records=501, Time=5739ms)
SQL =
SELECT test_in_1.v1
    FROM   test_in_1
    WHERE  test_in_1.f1 IN (SELECT test_in_2.f1 FROM test_in_2 WHERE
test_in_2.v2 = 'A');

test_in (Records=501, Time=70ms)
SQL =
SELECT  test_in_1.v1
    FROM    test_in_1, test_in_2
    WHERE   test_in_1.f1 = test_in_2.f1
        AND test_in_2.v2 = 'A'


Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com



> -----Original Message-----
> From: Robert Treat [mailto:xzilla@users.sourceforge.net]
> Sent: Friday, October 25, 2002 2:52 PM
> To: terry@ashtonwoodshomes.com
> Cc: Postgres (E-mail)
> Subject: Re: [GENERAL] Using the IN clauise
>
>
> On Fri, 2002-10-25 at 14:40, terry@ashtonwoodshomes.com wrote:
> > My post didn't go through the first time, so here goes again:
> > >
> > >
> > > I thought I read somewhere that it was better to avoid using
> > > the IN clause (at least when that means doing a subselect)
> > > for efficiency reasons, but I cannot find it on the website now.
> > >
> > > Does anyone know where that is?
> > >
>
> Do a search on the interactive docs for IN or EXISTS, I think it's
> mentioned there.  Either way it is generally recommended to use the
> EXISTS grammer rather than the IN for performance reasons. When in
> doubt, explain analyze is your friend.
>
>
> > > And can someone confirm the following:
> > >
> > > This:
> > > SELECT t1.f1
> > > FROM t1
> > > WHERE t1.f1 IN (select f1 from t2 AS t2 where t2.f1 = t1.f1
> > > AND t2.f2 = 'v1')
> > >
> > > is generally slower to run then:
> > > SELECT t1.f1
> > > FROM t1, t2
> > > WHERE t1.f1 = t2.f1
> > >   AND t2.f3 = 'v1'
> > >
>
> I would believe it to be true, though depending on your table / data
> structure I can't say 100% percent true.  Try running explain
> analyze on
> both queries and see what you return.
>
> > > Thanks
> > >
> > > Terry Fielder
> > > Network Engineer
> > > Great Gulf Homes / Ashton Woods Homes
> > > terry@greatgulfhomes.com
> > >
>
> Robert Treat
>


pgsql-general by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Simultaneous vacuum-ing
Next
From: markus.cl@gmx.de (Markus Dehmann)
Date:
Subject: Why does postgres seach in ALL lines (not optimal!)