Subquery WHERE IN or WHERE EXISTS faster? - Mailing list pgsql-performance
From | Ulrich |
---|---|
Subject | Subquery WHERE IN or WHERE EXISTS faster? |
Date | |
Msg-id | 48665741.2060900@gmx.net Whole thread Raw |
Responses |
Re: Subquery WHERE IN or WHERE EXISTS faster?
|
List | pgsql-performance |
Hi, I am new to SQL and have two tables..., "processor" and "users_processors". The first table contains Processors: CREATE TABLE processor ( id SERIAL, speed varchar(50) NOT NULL, type int2 NOT NULL, PRIMARY KEY (id) ); CREATE UNIQUE INDEX processor_speed_index ON processors(lower(speed)); Example: 1 "100MHz" 0 2 "36GHz" 7 ... The second Table defines which processor one user has got: CREATE TABLE users_processors ( userid int REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE, processorid int REFERENCES processors ON UPDATE CASCADE ON DELETE CASCADE, PRIMARY KEY(userid, processorid) ); CREATE INDEX users_processors_processorid_index ON users_processors(processorid); CREATE INDEX users_processors_processorid_index ON users_processors(processorid); Example: 1 2 1 3 1 4 ... 2 1 2 2 ... (The user "1" own processors 2,3,4 and the user 2 owns processors 1,2) __________________________________________________________ Now, I would like to list all processors user "1" has got. The following query does that: SELECT speed FROM processors WHERE id IN (SELECT processorid FROM users_processors WHERE userid=1) ORDER BY speed ASC LIMIT 10 OFFSET 2; This would return 10 processors beginning with number 3. I have read, that this query is slow and can be faster. I analyzed it: Limit (cost=22.90..22.90 rows=1 width=118) (actual time=0.344..0.349 rows=9 loops=1) -> Sort (cost=22.90..22.90 rows=2 width=118) (actual time=0.341..0.341 rows=11 loops=1) Sort Key: processors.speed Sort Method: quicksort Memory: 17kB -> Nested Loop (cost=15.03..22.89 rows=2 width=118) (actual time=0.225..0.289 rows=11 loops=1) -> HashAggregate (cost=15.03..15.05 rows=2 width=4) (actual time=0.207..0.214 rows=11 loops=1) -> Bitmap Heap Scan on users_processors (cost=4.34..15.01 rows=11 width=4) (actual time=0.175..0.179 rows=11 loops=1) Recheck Cond: (userid = 1) -> Bitmap Index Scan on users_processors_userid_index (cost=0.00..4.33 rows=11 width=0) (actual time=0.159..0.159 rows=12 loops=1) Index Cond: (userid = 1) -> Index Scan using processors_pkey on processors (cost=0.00..3.90 rows=1 width=122) (actual time=0.004..0.004 rows=1 loops=11) Index Cond: (processors.id = users_processors.processorid) Total runtime: 0.478 ms (13 rows) __________________________________________________________ People say that this query is faster: SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM users_processors WHERE userid=1 AND processorid=processors.id) ORDER BY speed ASC LIMIT 10 OFFSET 2; Analyze returns: Limit (cost=4404.52..4404.55 rows=10 width=118) (actual time=0.179..0.184 rows=9 loops=1) -> Sort (cost=4404.52..4405.18 rows=265 width=118) (actual time=0.176..0.177 rows=11 loops=1) Sort Key: processors.speed Sort Method: quicksort Memory: 17kB -> Seq Scan on processors (cost=0.00..4398.44 rows=265 width=118) (actual time=0.056..0.118 rows=11 loops=1) Filter: (subplan) SubPlan -> Index Scan using users_processors_pkey on users_processors (cost=0.00..8.27 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=11) Index Cond: ((userid = 1) AND (processorid = $0)) Total runtime: 0.267 ms (10 rows) The second query is faster, but I have only used a very small table with less than 20 items. In real-world I will have tables with thousands of entries. I wonder if the second query is also faster in cases where I have big tables, because it does a "Seq Scan", for me this looks like a complete table scan. This seams reasonable if we look at the query I do not expect that it is possible to use an INDEX for the second query. So, is it slower? Which query would you use, the first or the second one? I would also like to know the total number of processors one user has got. I would use one of those queries and replace the "SELECT speed" with "SELECT count(*)" and remove the LIMIT and OFFSET. Is this good? I have read that count(*) is slow. Kind regards Ulrich
pgsql-performance by date: