On Wed, Feb 26, 2025 at 4:46 AM Richard Guo <guofenglinux@gmail.com> wrote:
> I agree that it'd be beneficial to make some improvements to NOT IN
> subqueries.  From what I can see, we may have two potential options:
>
> * As Tom mentioned, we can prove that the subquery's output never
> contains NULL values and then convert the NOT IN into an anti-join.
> (It seems to me that we would also need to prove that the outer side
> never contains NULL values either, because whether the NULL values
> from the outer side should be included in the output depends on
> whether the inner side is empty.)
>
> * We can add support in the executor to handle the NULL semantics of
> NOT IN.  This may require inventing a new join type.
>
> I'm not quite sure which option is more promising at the moment, or if
> there are other options to consider.
I'm not quite sure, either. I think that the first option (proving
that there can be no NULL values) is probably more similar to
thingswe've done elsewhere in the planner, so I think I have been
assuming that if we did something about this, it would be that.
However, I have also had the idea of extending the executor to handle
this situation, and it's possible that option is more promising. I
don't really know. I think it's pretty common to have NOT IN clauses
where one can be certain that there definitely isn't a NULL present,
but unfortunately I think it's also pretty common to have cases where
a NULL could be present, or at least, where one cannot provide that no
NULL can be present.
--
Robert Haas
EDB: http://www.enterprisedb.com