Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's - Mailing list pgsql-hackers

From David Rowley
Subject Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Date
Msg-id CAKJS1f_=AYV3bV-ToqLH06D8sAA7nPUz2Ujb-Uug=qouXVZcxg@mail.gmail.com
Whole thread Raw
In response to Proving IS NOT NULL inference for ScalarArrayOpExpr's  (James Coleman <jtc331@gmail.com>)
Responses Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
List pgsql-hackers
On Sun, 11 Nov 2018 at 10:33, James Coleman <jtc331@gmail.com> wrote:
> At first I was imagining having the parse keep track of whether an array const
> expr contained any nulls and perhaps adding generated quals (in an equivalence
> class?) to allow the planner to easily prove the index was correct. I'd been
> going down this track because in my mind the issue was because the planner
> needed to verify whether all of the array elements were not null.
>
> But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
> I realized that at least in many normal op cases we can safely infer that foo
> is not null when "foo <op> <array>" is true even if the array contains null
> elements.
>
> This is such a simple change that it seems like I must be missing a case where
> the above doesn't hold true, but I can't immediately think of any, and indeed
> with the attached patch all existing tests pass (including some additional
> ones I added for predtest to play around with it).
>
> Am I missing something obvious? Is this a valid approach?

I started looking at this patch and I see that there's no check to
ensure that the IS NOT NULL pred matches the left operand in the IN()
clause.

Here's a demo of why this is bad:

create table t (a int, b int);
create index t_a_is_not_null on t(a) where a is not null;
insert into t values(null,1); -- not indexed
set enable_seqscan=0; -- force index to be used whenever possible
select * from t where b

in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- row missing!
 a | b
---+---
(0 rows)

explain select * from t where b

in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98
,99,100,101); -- should not use t_a_is_not_null index.


                    QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
 Bitmap Heap Scan on t  (cost=4.42..320.84 rows=1141 width=8)
   Recheck Cond: (a IS NOT NULL)
   Filter: (b = ANY

('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101}'::integer[]))
   ->  Bitmap Index Scan on t_a_is_not_null  (cost=0.00..4.13 rows=2249 width=0)
(4 rows)

set enable_seqscan=1;
SET
select * from t where b

in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- Correct!
 a | b
---+---
   | 1
(1 row)

Basically, the planner assumes that the WHERE a IS NOT NULL index
implies WHERE b IN(1,...,101), which is definitely not the case.

Probably your new code needs to be expanded to become:

if (IsA(clause, ScalarArrayOpExpr))
{
    ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
    if (op_strict(saop->opno) &&
        clause_is_strict_for((Node *) linitial(saop->args), subexpr))
        return true;
}


> Other outstanding questions:
>
> Should I add additional tests for predtest? It already seems to cover some null
> test cases with scalar array ops, but I'd be happy to add more if desired.
>
> Should I add a test case for the resulting plan with "foo IN (...)" with an
> array with more than 100 elements?

I've not looked in detail, but perhaps the place to put the tests are
in src/test/modules/test_predtest.  This module adds a function named
test_predtest() that you can likely use more easily than trying to
EXPLAIN plans and hoping the planner's behaviour helps to exhibit the
behaviour of the changed code.

I'll mark this as waiting on author.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Paul Martinez
Date:
Subject: PATCH: Include all columns in default names for foreign key constraints.
Next
From: David Rowley
Date:
Subject: Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's