On 09/19/2012 01:48 PM, l1t@tom.com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 7556
> Logged by: lt
> Email address: l1t@tom.com
> PostgreSQL version: 9.2.0
> Operating system: windows xp
> Description:
>
> create table sli_test (id int primary key,info varchar(20));
> insert into sli_test select
> generate_series(1,1000000),'digoal'||generate_series(1,1000000);
> analyze verbose sli_test;
> create table sli_test2 (id int not null,info varchar(20));
> insert into sli_test2 select
> generate_series(1,1000000),'dbase'||generate_series(1,1000000);
> analyze verbose sli_test2;
> explain select max(a.info)from sli_test a where a.id not in(select
> b.id from sli_test2 b where b.id<50000);
> QUERY PLAN
> ---------------------------------------------------------------------------------------
> Aggregate (cost=9241443774.00..9241443774.01 rows=1 width=12)
Here's what I get on 9.1:

regress=# explain select max(a.info)from sli_test a where a.id not in(select
regress(# b.id from sli_test2 b where b.id<50000);
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Aggregate  (cost=38050.82..38050.83 rows=1 width=12)
   ->  Seq Scan on sli_test a  (cost=18026.82..36800.82 rows=500000 width=12)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Seq Scan on sli_test2 b  (cost=0.00..17906.00 rows=48329 width=4)
                 Filter: (id < 50000)
(6 rows)

It runs in about 500ms here.
You don't appear to have posted the full query plan, so it's hard to
compare.

In general, `NOT IN` is a poor formulation for a query; you're better
off with a JOIN or with `NOT EXISTS`. See, for example:
http://stackoverflow.com/questions/12444142/postgresql-how-to-figure-out-missing-numbers-in-a-column-using-generate-series/12444165#12444165
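
As a rough, untested sketch of what I mean, using the table names from
your report, the query could be written with `NOT EXISTS` or as a
LEFT JOIN anti-join. Both should return the same result as the `NOT IN`
form here only because sli_test.id is a primary key and sli_test2.id is
declared NOT NULL; with nullable columns `NOT IN` behaves differently.

```sql
-- NOT EXISTS form: correlate the subquery on id instead of using NOT IN.
EXPLAIN
SELECT max(a.info)
FROM sli_test a
WHERE NOT EXISTS (
    SELECT 1
    FROM sli_test2 b
    WHERE b.id = a.id
      AND b.id < 50000
);

-- LEFT JOIN form: keep only the rows of sli_test with no match in sli_test2.
EXPLAIN
SELECT max(a.info)
FROM sli_test a
LEFT JOIN sli_test2 b
       ON b.id = a.id
      AND b.id < 50000
WHERE b.id IS NULL;
```

I'd expect the planner to be able to treat either of these as an
anti-join, but please compare the EXPLAIN output on your 9.2 install
rather than taking my word for it.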
--
Craig Ringer