Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators - Mailing list pgsql-hackers

From Chengpeng Yan
Subject Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
Date
Msg-id 5CDAD4B4-6DC6-4C31-8AC1-1F45EA6ED480@outlook.com
Whole thread
In response to Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
> On Apr 20, 2026, at 11:46, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Of course, it is possible to make the strict function do that, and
> non-hashed IN / NOT IN handles it, so the hashed version shouldn't
> have an excuse to not do the right thing.
>
> I've attached a version that "probes" the equality function for its
> NULL = NULL behaviour and its NULL = non-NULL behaviour and returns
> whatever the result of the probe was at the appropriate time.
>
> What I came up with does feel quite elaborate, so I'd quite like a 2nd opinion.
>
> The patch does assume that the non-strict function will return the
> same thing for NULL = non-NULL as it will for non-NULL = NULL.
> Technically, if you coded the function to do something different
> there, the hashed vs non-hashed could differ in their result. My
> thoughts there, if someone is expecting anything sane out of such an
> equality function, then they're probably going to be disappointed due
> to various other optimisations we have.

Hi,

Thanks for the discussion.

I agree with Tom's concern that it does not seem safe to generalize from
NULL = first-non-NULL to all non-NULL values. Unless I am missing one, I
do not know of a planner/executor-visible contract that would justify
that assumption.

For the original NULL-LHS bug, a linear fallback still seems like the
safest baseline fix to me. It is conservative, but it matches
ExecEvalScalarArrayOp() without adding extra assumptions. The obvious
downside is performance, although this path only triggers when the
runtime LHS is NULL and the comparator is non-strict. It may also be
possible to cache the NULL-LHS outcome once per expression, since the
RHS array is constant in the hashed SAOP case, which might help reduce
the cost of that fallback.

ChangAo's example also seems to expose a separate correctness issue. If
the comparator can return NULL even for non-NULL inputs, then a lookup
hit seems sufficient, but a miss is no longer enough to distinguish
FALSE for IN / TRUE for NOT IN from NULL.

A conservative fix there would again be a linear fallback after miss,
which should recover the right semantics, but that case does seem much
more performance-sensitive.

So I would be interested to hear what people think about both cases,
especially if there is a better way to preserve correctness without
paying the full linear-fallback cost.

--
Best regards,
Chengpeng Yan





pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: Typos in the code and README
Next
From: Ayush Tiwari
Date:
Subject: Re: [PATCH] postmaster: fix stale PM_STARTUP comment