Re: Convert NOT IN sublinks to anti-joins when safe - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Convert NOT IN sublinks to anti-joins when safe
Date
Msg-id CAMbWs49nvNcBaUXTw5_euodb7ONADwDULJ4Cxw5qurDXdurc+Q@mail.gmail.com
In response to Re: Convert NOT IN sublinks to anti-joins when safe  (David Geier <geidav.pg@gmail.com>)
Responses Re: Convert NOT IN sublinks to anti-joins when safe
List pgsql-hackers
On Wed, Feb 4, 2026 at 11:59 PM David Geier <geidav.pg@gmail.com> wrote:
> If the sub-select can yield NULLs, the rewrite can be fixed by adding an
> OR t2.c1 IS NULL clause, such as:
>
> SELECT t1.c1 FROM t1 WHERE
>   NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL)

I'm not sure if this rewrite results in a better plan.  The OR clause
would force a nested loop join, which could be much slower than a
hashed-subplan plan.
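
One way to check this on a scratch database is to compare the two plans
directly.  This is only a sketch: t1, t2 and c1 are the names used in the
thread, while the row counts and the temp-table setup are made up for
illustration.  No expected plans are shown, because the actual plan shapes
depend on the server version, settings, and statistics.

```sql
-- Hypothetical setup; everything is rolled back at the end.
BEGIN;
CREATE TEMP TABLE t1 (c1 int);
CREATE TEMP TABLE t2 (c1 int);
INSERT INTO t1 SELECT g FROM generate_series(1, 10000) g;
INSERT INTO t2 SELECT g FROM generate_series(1, 10000) g;
ANALYZE t1;
ANALYZE t2;

-- Original form: a candidate for a hashed SubPlan when the
-- subquery result is small enough to hash in memory.
EXPLAIN
SELECT t1.c1 FROM t1
WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);

-- Proposed rewrite: the only join qual is an OR expression, which
-- cannot serve as a hash or merge clause.
EXPLAIN
SELECT t1.c1 FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2
                  WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL);
ROLLBACK;
```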

> If the outer expression can yield NULLs, the rewrite can be fixed by
> adding a t1.c1 IS NOT NULL clause, such as:
>
> SELECT t1.c1 FROM t1 WHERE
>   t1.c1 IS NOT NULL AND
>   NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1)

This rewrite doesn't seem correct to me.  If t2 is empty, you would
incorrectly lose the NULL rows from t1 in the final result.
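
A minimal sketch of the empty-t2 case, assuming nullable int columns and
throwaway temp tables (the setup is hypothetical; only the two queries come
from the thread):

```sql
-- Hypothetical setup: t1 holds one NULL row, t2 is empty.
BEGIN;
CREATE TEMP TABLE t1 (c1 int);
CREATE TEMP TABLE t2 (c1 int);
INSERT INTO t1 VALUES (1), (NULL);

-- Original query: NOT IN over an empty subquery result is true for
-- every outer row, so both rows are returned, including the NULL one.
SELECT t1.c1 FROM t1
WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);

-- Proposed rewrite: the IS NOT NULL filter removes the NULL row
-- before NOT EXISTS is evaluated, so only the row with c1 = 1 is returned.
SELECT t1.c1 FROM t1
WHERE t1.c1 IS NOT NULL AND
      NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1);
ROLLBACK;
```

The two queries return different row sets here, so they are not equivalent
when t2 can be empty.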

> What's our current take on doing more involved transformations inside
> the planner to support such cases? It would greatly open up the scope of
> the optimization.

As mentioned in my initial email, the goal of this patch is not to
handle every possible case, but rather only to handle the basic form
where both sides of NOT IN are provably non-nullable.  This keeps the
code complexity to a minimum, and I believe this would cover the most
common use cases in the real world.
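
For illustration, a sketch of what the basic form could look like.  It
assumes NOT NULL column constraints as one way of making both sides
non-nullable; which proofs the patch actually accepts is not spelled out in
this message, and the table definitions are hypothetical.

```sql
-- Hypothetical setup: both columns are declared NOT NULL.
BEGIN;
CREATE TEMP TABLE t1 (c1 int NOT NULL);
CREATE TEMP TABLE t2 (c1 int NOT NULL);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES (2);

-- With no NULLs possible on either side, NOT IN can never evaluate
-- to NULL, so it behaves like a plain anti-join.
SELECT t1.c1 FROM t1
WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2);

-- Equivalent anti-join formulation under the same assumption;
-- both queries return the rows with c1 = 1 and c1 = 3.
SELECT t1.c1 FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1);
ROLLBACK;
```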

- Richard


