Thread: RE:Re: BUG #7556 addition info

RE:Re: BUG #7556 addition info

From
l1t@tom.com
Date:

I downloaded the latest version,and the cost of "not in" is much higher than that of "not exist". please see attachment for detail.
As the time of query is very long,I didn't get the explain analyze result.
I think the id columns of table a and b are not null, so the query of "not in" and "not exists" are  equal,they should use similar plans.
I notice there is a "Materialize" step in my bad plans of "not in",but there isn't in your following plan. I wonder how to get the plan you posted here, are there any configure argument or command?
egress=# explain select max(a.info)from sli_test a where a.id not in(select
regress(# b.id from sli_test2 b where b.id<50000);
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate  (cost=38050.82..38050.83 rows=1 width=12)
-> Seq Scan on sli_test a (cost=18026.82..36800.82 rows=500000 width=12)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
-> Seq Scan on sli_test2 b (cost=0.00..17906.00 rows=48329 width=4)
                 Filter: (id < 50000)
(6 rows)

It runs in about 500ms here.
and I got following plan on kingbase 7.1,a modified version of postgres product.
TEST=# explain select max(a.name)from a where a.id not in(select b.id from b where b.id<50000);
                                              QUERY PLAN
--------------------------------------------------------------------------------
-----------------------
Aggregate  (cost=72541.71..72541.72 rows=1 width=12)
   ->  Merge Anti Join  (cost=0.00..70041.71 rows=1000000 width=12)
         Merge Cond: (A.ID = B.ID)
         ->  Index Scan using A_PKEY on A  (cost=0.00..32257.36 rows=1000000 width=16)
         ->  Index Scan using B_ID on  B  (cost=0.00..31117.69 rows=333333 width=4)
               Index Cond: (ID < 50000)
(6 行)
时间: 20.095 ms
 




Re: RE:Re: BUG #7556 addition info

From
Tom Lane
Date:
l1t@tom.com writes:
> I think the id columns of table a and b are not null, so the query of "not in" and "not exists" are  equal,they
shoulduse similar plans. 

NOT IN and NOT EXISTS are *not* equivalent.  Per SQL standard, NOT IN
has different (and usually not very desirable) behavior with NULL
values.  The spec-mandated behavior makes it a lot harder to optimize
nicely --- and since you usually don't want what it does anyway, we
haven't tried hard.

            regards, tom lane