Re: Reduce planning time for large NOT IN lists containing NULL - Mailing list pgsql-hackers

From Ilia Evdokimov
Subject Re: Reduce planning time for large NOT IN lists containing NULL
Date
Msg-id 4c761b02-5a60-4076-aa0c-9c6fef06e2c1@tantorlabs.com
In response to Re: Reduce planning time for large NOT IN lists containing NULL  (David Geier <geidav.pg@gmail.com>)
Responses Re: Reduce planning time for large NOT IN lists containing NULL
List pgsql-hackers

On 2/24/26 11:29, David Geier wrote:

Using array_contains_nulls() seems fine. In case the IN list doesn't
contain NULL, the function can immediately bail thanks to the
!ARR_HASNULL() check in the beginning.

It only needs to iterate over the NULL-bitmap, if it exists. This is the
case if there's actually a NULL element in the array, or if the array
initially contained NULL and all NULLs got removed subsequently.

If we ever find the latter case to matter we could remove the
NULL-bitmap in array_set_element() / array_set_element_expanded(), when
the last NULL element got removed.

Could you clarify what exactly this additional test is meant to verify?
Zsolt's test case creates an array that initially contains NULL. The
NULL element is subsequently replaced by a non-NULL value but
array_set_element_expanded() keeps the NULL-bitmap around. With that,
your ARR_ISNULL() check bails and causes the selectivity estimation to
incorrectly return 0.
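
To make the distinction concrete, here is a minimal sketch of the two checks on a toy array. ToyArray, toy_has_null_bitmap() and toy_contains_nulls() are hypothetical names for illustration only; they are not PostgreSQL's ArrayType, ARR_HASNULL() or array_contains_nulls(). The sketch only assumes a bitmap in which a set bit marks a non-NULL element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an array header; NOT PostgreSQL's ArrayType. */
typedef struct ToyArray
{
    int            nitems;      /* number of elements */
    const uint8_t *nullbitmap;  /* NULL if absent; a set bit means "non-NULL" */
} ToyArray;

/* Cheap check: a bitmap exists, so NULL elements are merely possible. */
static bool
toy_has_null_bitmap(const ToyArray *a)
{
    return a->nullbitmap != NULL;
}

/* Precise check: scan the bitmap for an element actually marked NULL. */
static bool
toy_contains_nulls(const ToyArray *a)
{
    assert(a->nitems >= 0);

    if (!toy_has_null_bitmap(a))
        return false;           /* fast path: no bitmap, no NULLs */

    for (int i = 0; i < a->nitems; i++)
    {
        /* A cleared bit marks element i as NULL. */
        if ((a->nullbitmap[i / 8] & (1 << (i % 8))) == 0)
            return true;
    }
    return false;               /* bitmap is stale: present, but no NULLs */
}
```

For a three-element array whose bitmap byte is 0x07 (all three bits set), toy_has_null_bitmap() returns true while toy_contains_nulls() returns false. That is the stale-bitmap case described above: bailing out on the cheap check alone would treat the array as if it contained a NULL.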

Ah, right - thanks for the clarification. I agree.

Regarding the regression test: the suggested test case is good, but there is no straightforward way to expose the estimated row count without also showing the costs, and costs are unstable. To avoid that, I reused the parsing approach already present in stats_ext.sql to extract only the estimated row count from EXPLAIN.
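
The idea of keeping only the row estimate can be illustrated with a small sketch. This is not the helper from stats_ext.sql, which is written in SQL; it is a hypothetical C function, toy_extract_rows(), and the plan line used in the usage note is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the first "rows=N" value found in one line of
 * EXPLAIN output, or -1 if the line carries no row estimate.  The cost and
 * width fields are ignored, so unstable costs never reach the test output.
 */
static long
toy_extract_rows(const char *plan_line)
{
    const char *p;
    char       *end;
    long        rows;

    assert(plan_line != NULL);

    p = strstr(plan_line, "rows=");
    if (p == NULL)
        return -1;

    p += strlen("rows=");
    rows = strtol(p, &end, 10);

    /* No digits after "rows=" means there is nothing to report. */
    return (end == p) ? -1 : rows;
}
```

Given the made-up line "Seq Scan on t  (cost=0.00..20.00 rows=997 width=4)", the function returns 997 and drops everything else.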

Since the test table contains exactly 1000 rows and we run ANALYZE, all rows are included in the statistics sample. Therefore the estimate for x <> ALL(array[1, 99, 2]) is deterministically 997 rows, so the test is stable and ensures we detect the incorrect early-zero estimate.
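
The arithmetic behind 997 can be sketched as follows, assuming each of the three list values matches exactly one of the 1000 rows, so each equality has selectivity 0.001. toy_not_in_rows() is a hypothetical helper and only models the disjoint-values case; it is not the planner's actual code path.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimated rows for "x <> ALL (list)" when the list
 * elements match disjoint sets of rows.  eq_sel[i] is the assumed selectivity
 * of "x = list[i]".
 */
static long
toy_not_in_rows(double ntuples, const double *eq_sel, int nelems)
{
    double sel = 1.0;

    assert(nelems >= 0);

    /* Each element removes its own share of the rows. */
    for (int i = 0; i < nelems; i++)
        sel -= eq_sel[i];

    if (sel < 0.0)
        sel = 0.0;              /* clamp to a valid selectivity */

    /* Round to the nearest whole row. */
    return (long) (ntuples * sel + 0.5);
}
```

With ntuples = 1000 and three selectivities of 0.001, the result is 997. An estimate of 0 for the same query would indicate that the NULL shortcut fired even though the array contains no NULL.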

Let me know if you'd prefer a different approach. I've attached the v4 patch.

-- 
Best regards, 
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

