Re: Convert NOT IN sublinks to anti-joins when safe - Mailing list pgsql-hackers
| From | wenhui qiu |
|---|---|
| Subject | Re: Convert NOT IN sublinks to anti-joins when safe |
| Date | |
| Msg-id | CAGjGUA+J+vkW1pM5JZdWMTwTFAiCE5ORrvORs-mGtXVPk5HRgg@mail.gmail.com Whole thread Raw |
| In response to | Convert NOT IN sublinks to anti-joins when safe (Richard Guo <guofenglinux@gmail.com>) |
| Responses |
Re: Convert NOT IN sublinks to anti-joins when safe
|
| List | pgsql-hackers |
Hi Richard
> I believe we are now in a much better position to attempt this again.
> The planner has accumulated significant infrastructure that makes this
> proof straightforward and reliable. Specifically, we can now leverage
> the outer-join-aware-Var infrastructure to tell whether a Var comes
> from the nullable side of an outer join, and the not-null-attnums hash
> The planner has accumulated significant infrastructure that makes this
> proof straightforward and reliable. Specifically, we can now leverage
> the outer-join-aware-Var infrastructure to tell whether a Var comes
> from the nullable side of an outer join, and the not-null-attnums hash
> table to efficiently check whether a Var is defined NOT NULL. We also
> have the expr_is_nonnullable() function that is smart enough to deduce
> non-nullability for expressions more complex than simple Vars/Consts.
> have the expr_is_nonnullable() function that is smart enough to deduce
> non-nullability for expressions more complex than simple Vars/Consts.
Thank you for working on this.Indeed, the benefits are substantial and highly necessary, as Oracle, SQL Server, and MySQL have all implemented varying degrees of support.I shall test this path in my spare time.
Thanks ,
On Tue, Feb 3, 2026 at 3:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
This topic has been discussed several times in the past. Due to the
semantic mismatch regarding NULL handling, NOT IN is not ordinarily
equivalent to an anti-join. However, if we can prove that neither the
outer expressions nor the subquery outputs can yield NULL values, it
should be safe to convert NOT IN to an anti-join.
I believe we are now in a much better position to attempt this again.
The planner has accumulated significant infrastructure that makes this
proof straightforward and reliable. Specifically, we can now leverage
the outer-join-aware-Var infrastructure to tell whether a Var comes
from the nullable side of an outer join, and the not-null-attnums hash
table to efficiently check whether a Var is defined NOT NULL. We also
have the expr_is_nonnullable() function that is smart enough to deduce
non-nullability for expressions more complex than simple Vars/Consts.
Attached is a draft patch for this attempt (part of the code is
adapted from an old patch [1] by David and Tom). This patch aims for
a conservative implementation: the goal is not to handle every
theoretical case, but to handle canonical query patterns with minimal
code complexity.
The patch primarily targets patterns like:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);
... and
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM
banned_users WHERE user_id IS NOT NULL);
This is a very typical syntax for exclusion. In well-modeled
databases, join keys like id and user_id are very likely to be defined
as NOT NULL.
It seems to me that the ROI here is quite positive: the added code
complexity is very low (thanks to the existing infrastructure), while
the benefit is that users writing this typical pattern will finally
get efficient anti-join plans without needing manual rewrites.
(For the outer expressions, we could potentially also use outer query
quals to prove non-nullability. This patch does not attempt to do so.
Implementing this would require passing state down during the
pull_up_sublinks recursion; and given that find_nonnullable_vars can
fail to prove non-nullability in many cases due to the lack of
const-simplification at this stage, I'm not sure whether it is worth
the code complexity. Besides, I haven't fully convinced myself that
doing this does not introduce correctness issues.)
Any thoughts?
[1] https://postgr.es/m/13766.1405037879@sss.pgh.pa.us
- Richard
pgsql-hackers by date: