I've recently been investigating improving our plans for queries like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000);
where the table "t" has a partial index on "foo" where "foo IS NOT NULL".
Currently the planner generates an index [only] scan so long as the number of
items in the IN expression is <= 100, but as soon as you add the 101st item
it reverts to seq scan. If we add the explicit null check like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
then we go back to the desired index scan.
This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
constant array arguments into OR trees when the array size is > 100. The rest
of the predicate proving code then becomes unable to infer that foo is not null
and therefore the planner cannot prove that the partial index is correct to
use.
(Please pardon technical details in the below background that may be way off;
I don't have a lot of experience with the Postgres codebase yet, and am still
trying to build a mental model of things.)
At first I was imagining having the parse keep track of whether an array const
expr contained any nulls and perhaps adding generated quals (in an equivalence
class?) to allow the planner to easily prove the index was correct. I'd been
going down this track because in my mind the issue was because the planner
needed to verify whether all of the array elements were not null.
But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
I realized that at least in many normal op cases we can safely infer that foo
is not null when "foo <op> <array>" is true even if the array contains null
elements.
This is such a simple change that it seems like I must be missing a case where
the above doesn't hold true, but I can't immediately think of any, and indeed
with the attached patch all existing tests pass (including some additional
ones I added for predtest to play around with it).
Am I missing something obvious? Is this a valid approach?
Other outstanding questions:
Should I add additional tests for predtest? It already seems to cover some null
test cases with scalar array ops, but I'd be happy to add more if desired.
Should I add a test case for the resulting plan with "foo IN (...)" with an
array with more than 100 elements?
Thanks,
James Coleman