Re: Subquery WHERE IN or WHERE EXISTS faster? - Mailing list pgsql-performance

From Ulrich
Subject Re: Subquery WHERE IN or WHERE EXISTS faster?
Date
Msg-id 4866B624.1090906@gmx.net
Whole thread Raw
In response to Re: Subquery WHERE IN or WHERE EXISTS faster?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Subquery WHERE IN or WHERE EXISTS faster?
Re: Subquery WHERE IN or WHERE EXISTS faster?
Re: Subquery WHERE IN or WHERE EXISTS faster?
List pgsql-performance
Hi,
I have added a bit of dummy Data, 100000 processors, 10000 users, each
user got around 12 processors.

I have tested both queries. First of all, I was surprised that it is
that fast :) Here are the results:


EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT
processorid FROM users_processors WHERE userid=4040) ORDER BY speed ASC
LIMIT 10 OFFSET 1;

Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340
rows=10 loops=1)
   ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual
time=0.332..0.333 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual
time=0.171..0.271 rows=13 loops=1)
               ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4)
(actual time=0.148..0.154 rows=13 loops=1)
                     ->  Bitmap Heap Scan on users_processors
(cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13
loops=1)
                           Recheck Cond: (userid = 4040)
                           ->  Bitmap Index Scan on
users_processors_userid_index  (cost=0.00..4.35 rows=12 width=0) (actual
time=0.056..0.056 rows=13 loops=1)
                                 Index Cond: (userid = 4040)
               ->  Index Scan using processors_pkey on processors
(cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13)
                     Index Cond: (processors.id =
users_processors.processorid)
 Total runtime: 0.471 ms
(13 rows)

___________

EXPLAIN ANALYZE SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
users_processors WHERE userid=4040 AND processorid=processors.id) ORDER
BY speed ASC LIMIT 10 OFFSET 1;

 Limit  (cost=831413.86..831413.89 rows=10 width=5) (actual
time=762.475..762.482 rows=10 loops=1)
   ->  Sort  (cost=831413.86..831538.86 rows=50000 width=5) (actual
time=762.471..762.473 rows=11 loops=1)
         Sort Key: processors.speed
         Sort Method:  quicksort  Memory: 17kB
         ->  Seq Scan on processors  (cost=0.00..830299.00 rows=50000
width=5) (actual time=313.591..762.411 rows=13 loops=1)
               Filter: (subplan)
               SubPlan
                 ->  Index Scan using users_processors_pkey on
users_processors  (cost=0.00..8.29 rows=1 width=0) (actual
time=0.006..0.006 rows=0 loops=100000)
                       Index Cond: ((userid = 4040) AND (processorid = $0))
 Total runtime: 762.579 ms
(10 rows)




As you can see the second query is much slower. First I thought "Just a
difference of 0.3ms?", but then I realized that it was 762ms not 0.762 ;-).
Both queries return the same result, so I will use #1 and count(*) takes
just 0.478ms if I use query #1.

Kind Regards,
Ulrich

Tom Lane wrote:
> Ulrich <ulrich.mierendorff@gmx.net> writes:
>
>> People say that [EXISTS is faster]
>>
>
> People who say that are not reliable authorities, at least as far as
> Postgres is concerned.  But it is always a bad idea to extrapolate
> results on toy tables to large tables --- quite aside from measurement
> noise and caching issues, the planner might pick a different plan when
> faced with large tables.  Load up a realistic amount of data and then
> see what you get.
>
>             regards, tom lane
>
>


pgsql-performance by date:

Previous
From: "Steinar H. Gunderson"
Date:
Subject: Re: Planner should use index on a LIKE 'foo%' query
Next
From: Gregory Stark
Date:
Subject: Re: Subquery WHERE IN or WHERE EXISTS faster?