Re: BUG #7556: "select not in sub query" plan very poor vs "not exists" - Mailing list pgsql-bugs

From	Craig Ringer
Subject	Re: BUG #7556: "select not in sub query" plan very poor vs "not exists"
Date	September 20, 2012 04:39:52
Msg-id	505A9E0B.3000600@ringerc.id.au
In response to	BUG #7556: "select not in sub query" plan very poor vs "not exists" (l1t@tom.com)
List	pgsql-bugs

On 09/19/2012 01:48 PM, l1t@tom.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7556
> Logged by:          lt
> Email address:      l1t@tom.com
> PostgreSQL version: 9.2.0
> Operating system:   windows xp
> Description:
>

create table sli_test (id int primary key,info varchar(20));
insert into sli_test select
generate_series(1,1000000),'digoal'||generate_series(1,1000000);
analyze verbose sli_test;
create table sli_test2 (id int not null,info varchar(20));
insert into sli_test2 select
generate_series(1,1000000),'dbase'||generate_series(1,1000000);
analyze verbose sli_test2;

explain select max(a.info)from sli_test a where a.id not in(select
b.id from sli_test2 b where b.id<50000);

>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------
>   Aggregate  (cost=9241443774.00..9241443774.01 rows=1 width=12)

Here's what I get on 9.1:

regress=# explain select max(a.info)from sli_test a where a.id not in(select
regress(# b.id from sli_test2 b where b.id<50000);
                                    QUERY PLAN
---------------------------------------------------------------------------------
  Aggregate  (cost=38050.82..38050.83 rows=1 width=12)
    ->  Seq Scan on sli_test a  (cost=18026.82..36800.82 rows=500000 width=12)
          Filter: (NOT (hashed SubPlan 1))
          SubPlan 1
            ->  Seq Scan on sli_test2 b  (cost=0.00..17906.00 rows=48329 width=4)
                  Filter: (id < 50000)
(6 rows)


It runs in about 500ms here.

You don't appear to have posted the full query plan, so it's hard to
compare.

In general, `NOT IN` is a poor formulation for a query; you're better
off with a JOIN or with `NOT EXISTS`. See e.g.:

http://stackoverflow.com/questions/12444142/postgresql-how-to-figure-out-missing-numbers-in-a-column-using-generate-series/12444165#12444165
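
For illustration, the query from the report could be rewritten roughly as
follows. This is only a sketch using the sli_test and sli_test2 tables defined
above; it is not taken from the original report and I have not included plans
for it. It assumes the two forms are interchangeable here, which holds only
because sli_test.id is a primary key and sli_test2.id is declared NOT NULL.
If either column could contain NULLs, `NOT IN` and `NOT EXISTS` would no
longer return the same rows.

```sql
-- NOT EXISTS form: the subquery is correlated on id, which the planner
-- can usually treat as an anti-join rather than a per-row subplan filter.
EXPLAIN
SELECT max(a.info)
FROM sli_test a
WHERE NOT EXISTS (
    SELECT 1
    FROM sli_test2 b
    WHERE b.id = a.id
      AND b.id < 50000
);

-- Equivalent outer-join form: keep only the rows of sli_test that found
-- no matching row in sli_test2 (b.id is NULL only when there was no match,
-- since sli_test2.id itself is NOT NULL).
EXPLAIN
SELECT max(a.info)
FROM sli_test a
LEFT JOIN sli_test2 b
       ON b.id = a.id
      AND b.id < 50000
WHERE b.id IS NULL;
```

Comparing the EXPLAIN output of these against the `NOT IN` version on the
same server should show whether the formulation is what makes the difference.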

--
Craig Ringer
