Thread: Anti join confusion
Hi Richard Guo
I found this path https://commitfest.postgresql.org/patch/3235/ already supports anti join , But I've found that in many cases it doesn't work.It always uses SubPlan Here's my testing process.
###########
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join2 values ( generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
insert into join2 values ( generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
insert into join2 values ( generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
insert into join2 values ( generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join2 values ( generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
insert into join2 values ( generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
insert into join2 values ( generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
insert into join2 values ( generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 WHERE T2.ID>10000)
test-# ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..9016319078.86 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..9016255063.86 rows=262562 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..32181.54 rows=863294 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..24492.07 rows=863294 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 );
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8633476697.61 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8633412682.61 rows=262562 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..30676.42 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..22819.92 rows=882100 width=4)
(7 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 1000);
QUERY PLAN
----------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=61.73..38730.47 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..57.06 rows=1865 width=4)
Index Cond: (id < 1000)
(5 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000);
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=4.45..38673.19 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..4.44 rows=1 width=4)
Index Cond: (id = 1000)
(5 rows)
test-# ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..9016319078.86 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..9016255063.86 rows=262562 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..32181.54 rows=863294 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..24492.07 rows=863294 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 );
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8633476697.61 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8633412682.61 rows=262562 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..30676.42 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..22819.92 rows=882100 width=4)
(7 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 1000);
QUERY PLAN
----------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=61.73..38730.47 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..57.06 rows=1865 width=4)
Index Cond: (id < 1000)
(5 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000);
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=4.45..38673.19 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..4.44 rows=1 width=4)
Index Cond: (id = 1000)
(5 rows)
Thanks
wenhui qiu <qiuwenhuifx@gmail.com> 于2025年2月24日周一 09:48写道:
Hi Richard GuoI found this path https://commitfest.postgresql.org/patch/3235/ already supports anti join , But I've found that in many cases it doesn't work.It always uses SubPlan Here's my testing process.###########
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(1,20000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(50201,50300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join1 values ( generate_series(150201,1350300),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAASSSSSAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',10);
insert into join2 values ( generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',1);
insert into join2 values ( generate_series(1,40000),'aaaaaaaaaaaaaaaaAAAAAAABBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',2);
insert into join2 values ( generate_series(20001,22000),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
insert into join2 values ( generate_series(150201,950300),'aaaaaaaaaaaaaaaaAACCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaaaaaaaAAAAAAAAAAAAAAAAAAAAA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 WHERE T2.ID>10000)
test-# ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..9016319078.86 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..9016255063.86 rows=262562 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..32181.54 rows=863294 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..24492.07 rows=863294 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 );
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8633476697.61 rows=630150 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8633412682.61 rows=262562 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..30676.42 rows=882100 width=4)
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..22819.92 rows=882100 width=4)
(7 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 1000);
QUERY PLAN
----------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=61.73..38730.47 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..57.06 rows=1865 width=4)
Index Cond: (id < 1000)
(5 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000);
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=4.45..38673.19 rows=630150 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using idx_j2 on join2 t2 (cost=0.42..4.44 rows=1 width=4)
Index Cond: (id = 1000)
(5 rows)
Planner now doesn't support pulling up the "NOT IN" sublink. The "NOT IN" sublink will be transformed into SubPlan.
Thanks,
Tender Wang
wenhui qiu <qiuwenhuifx@gmail.com> writes: > I found this path https://commitfest.postgresql.org/patch/3235/ already > supports anti join , But I've found that in many cases it doesn't work.It > always uses SubPlan Here's my testing process. NOT IN is not convertible to an anti-join: the semantics are wrong because of the way it treats nulls in the sub-select's output. In principle you could do it if you could prove that the sub-select's output is never null, but we need to do this transformation long before we have enough information to make such a deduction. regards, tom lane
HI Tom , Tender Wang 
But I saw the path test case are support ,

and my test case set the id is primary key also SubPlan
test=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000 );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=8.45..37446.94 rows=610100 width=8) (actual time=0.071..209.875 rows=1220199 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
Rows Removed by Filter: 1
Buffers: shared hit=22190
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..8.44 rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 1000)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.245 ms
Execution Time: 247.906 ms
(11 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8414945643.30 rows=610100 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8414883633.30 rows=254208 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..31028.21 rows=829768 width=4)
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..23637.37 rows=829768 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=327.05..37765.54 rows=610100 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..301.22 rows=10331 width=4)
Index Cond: (id < 10000)
(5 rows)
test=#
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=8.45..37446.94 rows=610100 width=8) (actual time=0.071..209.875 rows=1220199 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
Rows Removed by Filter: 1
Buffers: shared hit=22190
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..8.44 rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 1000)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.245 ms
Execution Time: 247.906 ms
(11 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8414945643.30 rows=610100 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8414883633.30 rows=254208 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..31028.21 rows=829768 width=4)
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..23637.37 rows=829768 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=327.05..37765.54 rows=610100 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..301.22 rows=10331 width=4)
Index Cond: (id < 10000)
(5 rows)
test=#
On Mon, Feb 24, 2025 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
wenhui qiu <qiuwenhuifx@gmail.com> writes:
> I found this path https://commitfest.postgresql.org/patch/3235/ already
> supports anti join , But I've found that in many cases it doesn't work.It
> always uses SubPlan Here's my testing process.
NOT IN is not convertible to an anti-join: the semantics are wrong
because of the way it treats nulls in the sub-select's output.
In principle you could do it if you could prove that the sub-select's
output is never null, but we need to do this transformation long
before we have enough information to make such a deduction.
regards, tom lane
Attachment
wenhui qiu <qiuwenhuifx@gmail.com> 于2025年2月24日周一 15:10写道:
HI Tom , Tender WangBut I saw the path test case are support ,
Hi wenhui,
In the above picture, it is NOT EXISTS sub-query, which can be pulled up, but NOT IN sub-query, which can not now.
As Tom said:
"NOT IN is not convertible to an anti-join: the semantics are wrong
because of the way it treats nulls in the sub-select's output."and my test case set the id is primary key also SubPlantest=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000 );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=8.45..37446.94 rows=610100 width=8) (actual time=0.071..209.875 rows=1220199 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
Rows Removed by Filter: 1
Buffers: shared hit=22190
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..8.44 rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 1000)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.245 ms
Execution Time: 247.906 ms
(11 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8414945643.30 rows=610100 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8414883633.30 rows=254208 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..31028.21 rows=829768 width=4)
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..23637.37 rows=829768 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=327.05..37765.54 rows=610100 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..301.22 rows=10331 width=4)
Index Cond: (id < 10000)
(5 rows)
test=#
Thanks,
Tender Wang
Attachment
Hi Tender Wang
Thank you very much for the analysis.Do we have plans for NOT IN subquery pull up?
Thanks
On Mon, Feb 24, 2025 at 4:00 PM Tender Wang <tndrwang@gmail.com> wrote:
wenhui qiu <qiuwenhuifx@gmail.com> 于2025年2月24日周一 15:10写道:HI Tom , Tender WangBut I saw the path test case are support ,Hi wenhui,In the above picture, it is NOT EXISTS sub-query, which can be pulled up, but NOT IN sub-query, which can not now.As Tom said:"NOT IN is not convertible to an anti-join: the semantics are wrongbecause of the way it treats nulls in the sub-select's output."and my test case set the id is primary key also SubPlantest=# explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000 );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=8.45..37446.94 rows=610100 width=8) (actual time=0.071..209.875 rows=1220199 loops=1)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
Rows Removed by Filter: 1
Buffers: shared hit=22190
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..8.44 rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
Index Cond: (id = 1000)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.245 ms
Execution Time: 247.906 ms
(11 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Gather (cost=1000.42..8414945643.30 rows=610100 width=8)
Workers Planned: 2
-> Parallel Seq Scan on join1 t1 (cost=0.42..8414883633.30 rows=254208 width=8)
Filter: (NOT (ANY (id = (SubPlan 1).col1)))
SubPlan 1
-> Materialize (cost=0.42..31028.21 rows=829768 width=4)
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..23637.37 rows=829768 width=4)
Index Cond: (id > 10000)
(8 rows)
test=# explain SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on join1 t1 (cost=327.05..37765.54 rows=610100 width=8)
Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
SubPlan 1
-> Index Only Scan using join2_pkey on join2 t2 (cost=0.42..301.22 rows=10331 width=4)
Index Cond: (id < 10000)
(5 rows)
test=#--Thanks,Tender Wang
Attachment
On Mon, 2025-02-24 at 17:12 +0800, wenhui qiu wrote: > Do we have plans for NOT IN subquery pull up? As mentioned before, that is not possible. Best practice is to avoid NOT IN with subqueries altogether. The result is usually not what people expect it to be. Yours, Laurenz Albe -- *E-Mail Disclaimer* Der Inhalt dieser E-Mail ist ausschliesslich fuer den bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte, dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen. *CONFIDENTIALITY NOTICE & DISCLAIMER *This message and any attachment are confidential and may be privileged or otherwise protected from disclosure and solely for the use of the person(s) or entity to whom it is intended. If you have received this message in error and are not the intended recipient, please notify the sender immediately and delete this message and any attachment from your system. If you are not the intended recipient, be advised that any use of this message is prohibited and may be unlawful, and you must not copy this message or attachment or disclose the contents to any other person.
Hi Laurenz
Actually ,Many fork postgresql databases have already implemented ,For example, if the relevant field has a non-null constraint ,Many databases can do the same thing as not exist ( MySQL ,SQL Server,Oracle)
Thanks
Actually ,Many fork postgresql databases have already implemented ,For example, if the relevant field has a non-null constraint ,Many databases can do the same thing as not exist ( MySQL ,SQL Server,Oracle)
Thanks
On Mon, Feb 24, 2025 at 7:55 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2025-02-24 at 17:12 +0800, wenhui qiu wrote:
> Do we have plans for NOT IN subquery pull up?
As mentioned before, that is not possible.
Best practice is to avoid NOT IN with subqueries altogether.
The result is usually not what people expect it to be.
Yours,
Laurenz Albe
--
*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.
*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are
confidential and may be privileged or otherwise protected from disclosure
and solely for the use of the person(s) or entity to whom it is intended.
If you have received this message in error and are not the intended
recipient, please notify the sender immediately and delete this message and
any attachment from your system. If you are not the intended recipient, be
advised that any use of this message is prohibited and may be unlawful, and
you must not copy this message or attachment or disclose the contents to
any other person.
On Mon, Feb 24, 2025 at 8:08 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote: > Actually ,Many fork postgresql databases have already implemented ,For example, if the relevant field has a non-nullconstraint ,Many databases can do the same thing as not exist ( MySQL ,SQL Server,Oracle) I'm not surprised to hear it. Long-time PostgreSQL users just don't use NOT IN, so it's fine, but anyone coming from another database gets hosed. I think it would be good to put some effort into improving this area, but I do not have time to work on it myself. -- Robert Haas EDB: http://www.enterprisedb.com