Re: Query plan for NOT IN - Mailing list pgsql-performance

From Grzegorz Jaśkiewicz
Subject Re: Query plan for NOT IN
Date
Msg-id 2f4958ff0910050656w1028016dib925f38a7a45d96e@mail.gmail.com
Whole thread Raw
In response to Query plan for NOT IN  (Matthew Wakeling <matthew@flymine.org>)
Responses Re: Query plan for NOT IN
List pgsql-performance


On Mon, Oct 5, 2009 at 2:52 PM, Matthew Wakeling <matthew@flymine.org> wrote:

mnw21-modmine-r13features-copy=# select count(*) from project;
 count
-------
   10
(1 row)

mnw21-modmine-r13features-copy=# select count(*) from intermineobject;
 count
----------
 26344616
(1 row)

mnw21-modmine-r13features-copy=# \d intermineobject;
Table "public.intermineobject"
 Column |  Type   | Modifiers
--------+---------+-----------
 object | text    |
 id     | integer | not null
 class  | text    |
Indexes:
   "intermineobject_pkey" UNIQUE, btree (id)

mnw21-modmine-r13features-copy=# explain select * from project where id NOT IN (SELECT id FROM intermineobject);
                                    QUERY PLAN
------------------------------------------------------------------------------------
 Seq Scan on project  (cost=1476573.93..1476575.05 rows=5 width=183)
  Filter: (NOT (hashed SubPlan 1))
  SubPlan 1
    ->  Seq Scan on intermineobject  (cost=0.00..1410720.74 rows=26341274 width=4)
(4 rows)

This query plan seems to me to be a little slow. Surely it could iterate through the ten project rows and perform ten index lookups in the big table?

 
try using join instead of 'not in'..


select p.* from project p left join intermineobject i on i.id=p.id where i.id is null;


--
GJ

pgsql-performance by date:

Previous
From: Matthew Wakeling
Date:
Subject: Query plan for NOT IN
Next
From: Matthew Wakeling
Date:
Subject: Re: Query plan for NOT IN