Re: Subquery WHERE IN or WHERE EXISTS faster? - Mailing list pgsql-performance
From | Ulrich |
---|---|
Subject | Re: Subquery WHERE IN or WHERE EXISTS faster? |
Date | |
Msg-id | 4866B624.1090906@gmx.net Whole thread Raw |
In response to | Re: Subquery WHERE IN or WHERE EXISTS faster? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Subquery WHERE IN or WHERE EXISTS faster?
Re: Subquery WHERE IN or WHERE EXISTS faster? Re: Subquery WHERE IN or WHERE EXISTS faster? |
List | pgsql-performance |
Hi, I have added a bit of dummy Data, 100000 processors, 10000 users, each user got around 12 processors. I have tested both queries. First of all, I was surprised that it is that fast :) Here are the results: EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET 1; Limit (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340 rows=10 loops=1) -> Sort (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333 rows=11 loops=1) Sort Key: processors.speed Sort Method: quicksort Memory: 17kB -> Nested Loop (cost=47.22..113.61 rows=8 width=5) (actual time=0.171..0.271 rows=13 loops=1) -> HashAggregate (cost=47.22..47.30 rows=8 width=4) (actual time=0.148..0.154 rows=13 loops=1) -> Bitmap Heap Scan on users_processors (cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13 loops=1) Recheck Cond: (userid = 4040) -> Bitmap Index Scan on users_processors_userid_index (cost=0.00..4.35 rows=12 width=0) (actual time=0.056..0.056 rows=13 loops=1) Index Cond: (userid = 4040) -> Index Scan using processors_pkey on processors (cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13) Index Cond: (processors.id = users_processors.processorid) Total runtime: 0.471 ms (13 rows) ___________ EXPLAIN ANALYZE SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM users_processors WHERE userid=4040 AND processorid=processors.id) ORDER BY speed ASC LIMIT 10 OFFSET 1; Limit (cost=831413.86..831413.89 rows=10 width=5) (actual time=762.475..762.482 rows=10 loops=1) -> Sort (cost=831413.86..831538.86 rows=50000 width=5) (actual time=762.471..762.473 rows=11 loops=1) Sort Key: processors.speed Sort Method: quicksort Memory: 17kB -> Seq Scan on processors (cost=0.00..830299.00 rows=50000 width=5) (actual time=313.591..762.411 rows=13 loops=1) Filter: (subplan) SubPlan -> Index Scan using users_processors_pkey on users_processors (cost=0.00..8.29 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=100000) Index Cond: ((userid = 4040) AND (processorid = $0)) Total runtime: 762.579 ms (10 rows) As you can see the second query is much slower. First I thought "Just a difference of 0.3ms?", but then I realized that it was 762ms not 0.762 ;-). Both queries return the same result, so I will use #1 and count(*) takes just 0.478ms if I use query #1. Kind Regards, Ulrich Tom Lane wrote: > Ulrich <ulrich.mierendorff@gmx.net> writes: > >> People say that [EXISTS is faster] >> > > People who say that are not reliable authorities, at least as far as > Postgres is concerned. But it is always a bad idea to extrapolate > results on toy tables to large tables --- quite aside from measurement > noise and caching issues, the planner might pick a different plan when > faced with large tables. Load up a realistic amount of data and then > see what you get. > > regards, tom lane > >
pgsql-performance by date: