Re: Convert NOT IN sublinks to anti-joins when safe - Mailing list pgsql-hackers

From Zhang Mingli
Subject Re: Convert NOT IN sublinks to anti-joins when safe
Date
Msg-id 3b45ceea-016e-475a-a2c0-321d52f4d7a7@Spark
Whole thread Raw
In response to Convert NOT IN sublinks to anti-joins when safe  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Convert NOT IN sublinks to anti-joins when safe
List pgsql-hackers
Hi,

On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux@gmail.com>, wrote:

This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.


Thanks for working on this!
I've reviewed the patch and it looks good overall. 

I noticed several minor issues in the test case comments:

1. The comment doesn't match the SQL:

+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable
+-- by qual clause
+SELECT * FROM not_null_tab
+WHERE id NOT IN (
+    SELECT t1.id
+    FROM null_tab t1
+    INNER JOIN null_tab t2 ON t1.id = t2.id
+    LEFT JOIN null_tab t3 ON TRUE
+);

The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here. 

2. Another test case that could use a more precise comment:

+-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join,
+-- inner side is defined NOT NULL
+SELECT * FROM not_null_tab t1
+LEFT JOIN not_null_tab t2
+ON t2.id NOT IN (SELECT id FROM not_null_tab);

Correct me if I’m wrong.
This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding. 
The current comment is technically correct but might be clearer as:

-- ANTI JOIN: outer side(t2) is defined NOT NULL.
— ON clause is evaluated on actual t2 rows before LEFT JOIN's
-- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL


3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:

-- ANTI JOIN: inner side comes from nullable side of outer join
-- but is forced non-nullable by WHERE clause
EXPLAIN (COSTS OFF)
SELECT * FROM not_null_tab
WHERE id NOT IN (
    SELECT t2.id
    FROM not_null_tab t1
    LEFT JOIN not_null_tab t2 ON t1.id = t2.id
    WHERE t2.id IS NOT NULL
);

The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side). 

If I read the code correctly, this should work via find_subquery_safe_quals + find_nonnullable_vars, but explicit coverage would be good.

And I test it:
                     QUERY PLAN
----------------------------------------------------
 Hash Anti Join
   Hash Cond: (not_null_tab.id = t2.id)
   ->  Seq Scan on not_null_tab
   ->  Hash
         ->  Merge Join
               Merge Cond: (t2.id = t1.id)
               ->  Sort
                     Sort Key: t2.id
                     ->  Seq Scan on null_tab t2
                           Filter: (id IS NOT NULL)
               ->  Sort
                     Sort Key: t1.id
                     ->  Seq Scan on null_tab t1
(13 rows)




--
Zhang Mingli
HashData

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: [Proposal] Adding Log File Capability to pg_createsubscriber
Next
From: Masahiko Sawada
Date:
Subject: Re: [19] CREATE SUBSCRIPTION ... SERVER