Re: Convert NOT IN sublinks to anti-joins when safe - Mailing list pgsql-hackers

From wenhui qiu
Subject Re: Convert NOT IN sublinks to anti-joins when safe
Date
Msg-id CAGjGUAJNHuj3X8VB1yaRV5qaCWU1WgzMVt2yXeoEAXi-LdPZcA@mail.gmail.com
Whole thread Raw
In response to Re: Convert NOT IN sublinks to anti-joins when safe  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
HI Richard 
> As mentioned in my initial email, the goal of this patch is not to
> handle every possible case, but rather only to handle the basic form
> where both sides of NOT IN are provably non-nullable.  This keeps the
> code complexity to a minimum, and I believe this would cover the most
> common use cases in real world.
Agree +1  ,The current path already covers common scenarios and is no less comprehensive than other databases.I'm already quite pleased that it can be merged.
Having tested a certain widely used open-source database, I found it unable to process the following query: `SELECT * FROM join1 WHERE id NOT IN (SELECT id FROM join2 WHERE id IS NOT NULL);` Note that join2 allows null values for id.


Thanks


On Thu, Feb 5, 2026 at 2:09 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav.pg@gmail.com> wrote:
> If the sub-select can yield NULLs, the rewrite can be fixed by adding an
> OR t2.c1 IS NULL clause, such as:
>
> SELECT t1.c1 FROM t1 WHERE
>   NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)

I'm not sure if this rewrite results in a better plan.  The OR clause
would force a nested loop join, which could be much slower than a
hashed-subplan plan.

> If the outer expression can yield NULLs, the rewrite can be fixed by
> adding a t1.c1 IS NOT NULL clause, such as:
>
> SELECT t1.c1 FROM T1 WHERE
>   t1.c1 IS NOT NULL AND
>   NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)

This rewrite doesn't seem correct to me.  If t2 is empty, you would
incorrectly lose the NULL rows from t1 in the final result.

> What's our today's take on doing more involved transformations inside
> the planner to support such cases? It would greatly open up the scope of
> the optimization.

As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable.  This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in real world.

- Richard


pgsql-hackers by date:

Previous
From: lakshmi
Date:
Subject: Re: parallel data loading for pgbench -i
Next
From: Peter Smith
Date:
Subject: Re: Warn when creating or enabling a subscription with max_logical_replication_workers = 0