Home > mailing lists

Subquery WHERE IN or WHERE EXISTS faster? - Mailing list pgsql-performance

From	Ulrich
Subject	Subquery WHERE IN or WHERE EXISTS faster?
Date	June 28, 2008 15:22:49
Msg-id	48665741.2060900@gmx.net Whole thread Raw
Responses	Re: Subquery WHERE IN or WHERE EXISTS faster?
List	pgsql-performance

Tree view

Hi,
I am new to SQL and have two tables..., "processor" and
"users_processors". The first table contains Processors:

CREATE TABLE processor (
id SERIAL,
speed varchar(50) NOT NULL,
type int2 NOT NULL,
PRIMARY KEY (id)
);
CREATE UNIQUE INDEX processor_speed_index ON processors(lower(speed));

Example:
1 "100MHz" 0
2 "36GHz" 7
...


The second Table defines which processor one user has got:

CREATE TABLE users_processors (
userid int REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
processorid int REFERENCES processors ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY(userid, processorid)
);
CREATE INDEX users_processors_processorid_index ON
users_processors(processorid);
CREATE INDEX users_processors_processorid_index ON
users_processors(processorid);

Example:
1 2
1 3
1 4
...
2 1
2 2
...
(The user "1" own processors 2,3,4 and the user 2 owns processors 1,2)


__________________________________________________________

Now, I would like to list all processors user "1" has got. The following
query does that:
SELECT speed FROM processors WHERE id IN (SELECT processorid FROM
users_processors WHERE userid=1) ORDER BY speed ASC LIMIT 10 OFFSET 2;

This would return 10 processors beginning with number 3. I have read,
that this query is slow and can be faster. I analyzed it:
Limit  (cost=22.90..22.90 rows=1 width=118) (actual time=0.344..0.349
rows=9 loops=1)
   ->  Sort  (cost=22.90..22.90 rows=2 width=118) (actual
time=0.341..0.341 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=15.03..22.89 rows=2 width=118) (actual
time=0.225..0.289 rows=11 loops=1)
               ->  HashAggregate  (cost=15.03..15.05 rows=2 width=4)
(actual time=0.207..0.214 rows=11 loops=1)
                     ->  Bitmap Heap Scan on users_processors
(cost=4.34..15.01 rows=11 width=4) (actual time=0.175..0.179 rows=11
loops=1)
                           Recheck Cond: (userid = 1)
                           ->  Bitmap Index Scan on
users_processors_userid_index  (cost=0.00..4.33 rows=11 width=0) (actual
time=0.159..0.159 rows=12 loops=1)
                                 Index Cond: (userid = 1)
               ->  Index Scan using processors_pkey on processors
(cost=0.00..3.90 rows=1 width=122) (actual time=0.004..0.004 rows=1
loops=11)
                     Index Cond: (processors.id =
users_processors.processorid)
 Total runtime: 0.478 ms
(13 rows)


__________________________________________________________


People say that this query is faster:
SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
users_processors WHERE userid=1 AND processorid=processors.id) ORDER BY
speed ASC LIMIT 10 OFFSET 2;

Analyze returns:
 Limit  (cost=4404.52..4404.55 rows=10 width=118) (actual
time=0.179..0.184 rows=9 loops=1)
   ->  Sort  (cost=4404.52..4405.18 rows=265 width=118) (actual
time=0.176..0.177 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Seq Scan on processors (cost=0.00..4398.44 rows=265
width=118) (actual time=0.056..0.118 rows=11 loops=1)
               Filter: (subplan)
               SubPlan
                 ->  Index Scan using users_processors_pkey on
users_processors  (cost=0.00..8.27 rows=1 width=0) (actual
time=0.006..0.006 rows=1 loops=11)
                       Index Cond: ((userid = 1) AND (processorid = $0))
 Total runtime: 0.267 ms
(10 rows)




The second query is faster, but I have only used a very small table with
less than 20 items. In real-world I will have tables with thousands of
entries. I wonder if the second query is also faster in cases where I
have big tables, because it does a "Seq Scan", for me this looks like a
complete table scan. This seams reasonable if we look at the query I do
not expect that it is possible to use an INDEX for the second query. So,
is it slower?

Which query would you use, the first or the second one?


I would also like to know the total number of processors one user has
got. I would use one of those queries and replace the "SELECT speed"
with "SELECT count(*)" and remove the LIMIT and OFFSET. Is this good? I
have read that count(*) is slow.

Kind regards
Ulrich

pgsql-performance by date:

From: Chris Browne
Date: 27 June 2008, 19:08:36
Subject: Re: Federated Postgresql architecture ?

From: Tom Lane
Date: 28 June 2008, 15:53:51
Subject: Re: Subquery WHERE IN or WHERE EXISTS faster?

Subquery WHERE IN or WHERE EXISTS faster? - Mailing list pgsql-performance

Previous

Next