Re: Anti join confusion - Mailing list pgsql-hackers

From wenhui qiu
Subject Re: Anti join confusion
Date
Msg-id CAGjGUAKJw6GQA-x=H3fKKyMUkqEcxy5EnwovqyOp7kh8fgy8Tw@mail.gmail.com
Whole thread Raw
In response to Re: Anti join confusion  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Anti join confusion
List pgsql-hackers
HI Tom , Tender Wang 
     But I saw the path test case are support , 
image.png
and my test case set the id is primary key also SubPlan
test=#  explain analyze SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID = 1000 );
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on join1 t1  (cost=8.45..37446.94 rows=610100 width=8) (actual time=0.071..209.875 rows=1220199 loops=1)
   Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
   Rows Removed by Filter: 1
   Buffers: shared hit=22190
   SubPlan 1
     ->  Index Only Scan using join2_pkey on join2 t2  (cost=0.42..8.44 rows=1 width=4) (actual time=0.027..0.029 rows=1 loops=1)
           Index Cond: (id = 1000)
           Heap Fetches: 0
           Buffers: shared hit=4
 Planning Time: 0.245 ms
 Execution Time: 247.906 ms
(11 rows)

test=#  explain  SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID > 10000 );
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.42..8414945643.30 rows=610100 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on join1 t1  (cost=0.42..8414883633.30 rows=254208 width=8)
         Filter: (NOT (ANY (id = (SubPlan 1).col1)))
         SubPlan 1
           ->  Materialize  (cost=0.42..31028.21 rows=829768 width=4)
                 ->  Index Only Scan using join2_pkey on join2 t2  (cost=0.42..23637.37 rows=829768 width=4)
                       Index Cond: (id > 10000)
(8 rows)

test=#  explain  SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN (SELECT T2.id FROM join2 t2 where T2.ID < 10000 );
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on join1 t1  (cost=327.05..37765.54 rows=610100 width=8)
   Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
   SubPlan 1
     ->  Index Only Scan using join2_pkey on join2 t2  (cost=0.42..301.22 rows=10331 width=4)
           Index Cond: (id < 10000)
(5 rows)

test=#

On Mon, Feb 24, 2025 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
wenhui qiu <qiuwenhuifx@gmail.com> writes:
>    I found this path https://commitfest.postgresql.org/patch/3235/ already
> supports anti join , But I've found that in many cases it doesn't work.It
> always uses SubPlan Here's my testing process.

NOT IN is not convertible to an anti-join: the semantics are wrong
because of the way it treats nulls in the sub-select's output.

In principle you could do it if you could prove that the sub-select's
output is never null, but we need to do this transformation long
before we have enough information to make such a deduction.

                        regards, tom lane
Attachment

pgsql-hackers by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: Update docs for UUID data type
Next
From: Tender Wang
Date:
Subject: Re: Anti join confusion