Re: Convert NOT IN sublinks to anti-joins when safe - Mailing list pgsql-hackers
| From | Zhang Mingli |
|---|---|
| Subject | Re: Convert NOT IN sublinks to anti-joins when safe |
| Date | |
| Msg-id | 3b45ceea-016e-475a-a2c0-321d52f4d7a7@Spark Whole thread Raw |
| In response to | Convert NOT IN sublinks to anti-joins when safe (Richard Guo <guofenglinux@gmail.com>) |
| Responses |
Re: Convert NOT IN sublinks to anti-joins when safe
|
| List | pgsql-hackers |
Hi,
On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux@gmail.com>, wrote:
On Feb 3, 2026 at 15:13 +0800, Richard Guo <guofenglinux@gmail.com>, wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
Thanks for working on this!
I've reviewed the patch and it looks good overall.
I noticed several minor issues in the test case comments:
1. The comment doesn't match the SQL:
+-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable
+-- by qual clause
+SELECT * FROM not_null_tab
+WHERE id NOT IN (
+ SELECT t1.id
+ FROM null_tab t1
+ INNER JOIN null_tab t2 ON t1.id = t2.id
+ LEFT JOIN null_tab t3 ON TRUE
+);
The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here.
2. Another test case that could use a more precise comment:
+-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join,
+-- inner side is defined NOT NULL
+SELECT * FROM not_null_tab t1
+LEFT JOIN not_null_tab t2
+ON t2.id NOT IN (SELECT id FROM not_null_tab);
Correct me if I’m wrong.
This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding.
The current comment is technically correct but might be clearer as:
-- ANTI JOIN: outer side(t2) is defined NOT NULL.
— ON clause is evaluated on actual t2 rows before LEFT JOIN's
-- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL
3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual:
-- ANTI JOIN: inner side comes from nullable side of outer join
-- but is forced non-nullable by WHERE clause
EXPLAIN (COSTS OFF)
SELECT * FROM not_null_tab
WHERE id NOT IN (
SELECT t2.id
FROM not_null_tab t1
LEFT JOIN not_null_tab t2 ON t1.id = t2.id
WHERE t2.id IS NOT NULL
);
The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side).
If I read the code correctly, this should work via find_subquery_safe_quals + find_nonnullable_vars, but explicit coverage would be good.
And I test it:
QUERY PLAN
----------------------------------------------------
Hash Anti Join
Hash Cond: (not_null_tab.id = t2.id)
-> Seq Scan on not_null_tab
-> Hash
-> Merge Join
Merge Cond: (t2.id = t1.id)
-> Sort
Sort Key: t2.id
-> Seq Scan on null_tab t2
Filter: (id IS NOT NULL)
-> Sort
Sort Key: t1.id
-> Seq Scan on null_tab t1
(13 rows)
--
Zhang Mingli
HashData
pgsql-hackers by date: