Thread: constraint exclusion and nulls in IN (..) clause

constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

01 February 2018, 09:56:24

Hi.

When addressing a review comment on the fast partition pruning thread [1],
I noticed that specifying null in the IN-list will cause constraint
exclusion to wrongly fail to refute a table's check predicate.

create table foo (a int check (a = 1));
insert into foo values (null), (1);

-- ExecEvalScalarArrayOp() won't return true for any record in foo
select * from foo where a in (null, 2);
 a
---
(0 rows)

-- The null in the IN-list prevents constraint exclusion to exclude foo
-- from being scanned in the first place
explain (costs off) select * from foo where a in (null, 2);
                 QUERY PLAN
---------------------------------------------
 Seq Scan on foo
   Filter: (a = ANY ('{NULL,2}'::integer[]))
(2 rows)

I propose a patch that teaches predtest.c to disregard any null values in
a SAOP (i.e., the IN (..) expression) when performing constraint exclusion
using that SAOP, because they cause predicate_refuted_by_recurse()'s logic
to fail to conclude the refutation.  There is a rule that all items of an
OR clause (SAOP is treated as one) must refute the predicate.  But the
OpExpr constructed with null as its constant argument won't refute
anything and thus will cause the whole OR clause to fail to refute the
predicate.

-- With the patch
explain (costs off) select * from foo where a in (null, 2);
        QUERY PLAN
--------------------------
 Result
   One-Time Filter: false
(2 rows)

explain (costs off) select * from foo where a in (null, 2, 1);
                  QUERY PLAN
-----------------------------------------------
 Seq Scan on foo
   Filter: (a = ANY ('{NULL,2,1}'::integer[]))
(2 rows)

Thoughts?

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/64241fee-09af-fe4b-a0ab-7cd739965041%40lab.ntt.co.jp

Attachment

v1-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch

Re: constraint exclusion and nulls in IN (..) clause

From

Ashutosh Bapat

Date:

01 February 2018, 10:40:21

On Thu, Feb 1, 2018 at 12:26 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Hi.
>
> When addressing a review comment on the fast partition pruning thread [1],
> I noticed that specifying null in the IN-list will cause constraint
> exclusion to wrongly fail to refute a table's check predicate.
>
> create table foo (a int check (a = 1));
> insert into foo values (null), (1);
>
> -- ExecEvalScalarArrayOp() won't return true for any record in foo
> select * from foo where a in (null, 2);
>  a
> ---
> (0 rows)

AFAIU, that's true only when = operator is strict. For a non-strict
operator which treats two NULL values as equivalent it would return
row with NULL value.

>
> -- The null in the IN-list prevents constraint exclusion to exclude foo
> -- from being scanned in the first place
> explain (costs off) select * from foo where a in (null, 2);
>                  QUERY PLAN
> ---------------------------------------------
>  Seq Scan on foo
>    Filter: (a = ANY ('{NULL,2}'::integer[]))
> (2 rows)
>
> I propose a patch that teaches predtest.c to disregard any null values in
> a SAOP (i.e., the IN (..) expression) when performing constraint exclusion
> using that SAOP, because they cause predicate_refuted_by_recurse()'s logic
> to fail to conclude the refutation.  There is a rule that all items of an
> OR clause (SAOP is treated as one) must refute the predicate.  But the
> OpExpr constructed with null as its constant argument won't refute
> anything and thus will cause the whole OR clause to fail to refute the
> predicate.

A cursory look through constraint exclusion code esp.
operator_predicate_proof() doesn't show any special handling for
strict/non-strict operators. Probably that's why that function refuses
to refute/imply anything when it encounters NULLs.
1593      * We have two identical subexpressions, and two other
subexpressions that
1594      * are not identical but are both Consts; and we have commuted the
1595      * operators if necessary so that the Consts are on the
right.  We'll need
1596      * to compare the Consts' values.  If either is NULL, fail.
1597      */
1598     if (pred_const->constisnull)
1599         return false;
1600     if (clause_const->constisnull)
1601         return false;

>
> -- With the patch
> explain (costs off) select * from foo where a in (null, 2);
>         QUERY PLAN
> --------------------------
>  Result
>    One-Time Filter: false
> (2 rows)
>
> explain (costs off) select * from foo where a in (null, 2, 1);
>                   QUERY PLAN
> -----------------------------------------------
>  Seq Scan on foo
>    Filter: (a = ANY ('{NULL,2,1}'::integer[]))
> (2 rows)
>
> Thoughts?

AFAIU, this doesn't look to be the right way to fix the problem; it
assumes that the operators are strict. Sorry, if I have misunderstood
the patch and your thoughts behind it. May be constraint exclusion
code should be taught to treat strict/non-strict operators separately.
I am not sure.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

01 February 2018, 11:53:31

Thanks for the comments.

On 2018/02/01 16:40, Ashutosh Bapat wrote:
> On Thu, Feb 1, 2018 at 12:26 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Hi.
>>
>> When addressing a review comment on the fast partition pruning thread [1],
>> I noticed that specifying null in the IN-list will cause constraint
>> exclusion to wrongly fail to refute a table's check predicate.
>>
>> create table foo (a int check (a = 1));
>> insert into foo values (null), (1);
>>
>> -- ExecEvalScalarArrayOp() won't return true for any record in foo
>> select * from foo where a in (null, 2);
>>  a
>> ---
>> (0 rows)
> 
> AFAIU, that's true only when = operator is strict. For a non-strict
> operator which treats two NULL values as equivalent it would return
> row with NULL value.

That's true.

>> -- The null in the IN-list prevents constraint exclusion to exclude foo
>> -- from being scanned in the first place
>> explain (costs off) select * from foo where a in (null, 2);
>>                  QUERY PLAN
>> ---------------------------------------------
>>  Seq Scan on foo
>>    Filter: (a = ANY ('{NULL,2}'::integer[]))
>> (2 rows)
>>
>> I propose a patch that teaches predtest.c to disregard any null values in
>> a SAOP (i.e., the IN (..) expression) when performing constraint exclusion
>> using that SAOP, because they cause predicate_refuted_by_recurse()'s logic
>> to fail to conclude the refutation.  There is a rule that all items of an
>> OR clause (SAOP is treated as one) must refute the predicate.  But the
>> OpExpr constructed with null as its constant argument won't refute
>> anything and thus will cause the whole OR clause to fail to refute the
>> predicate.
> 
> A cursory look through constraint exclusion code esp.
> operator_predicate_proof() doesn't show any special handling for
> strict/non-strict operators. Probably that's why that function refuses
> to refute/imply anything when it encounters NULLs.
> 1593      * We have two identical subexpressions, and two other
> subexpressions that
> 1594      * are not identical but are both Consts; and we have commuted the
> 1595      * operators if necessary so that the Consts are on the
> right.  We'll need
> 1596      * to compare the Consts' values.  If either is NULL, fail.
> 1597      */
> 1598     if (pred_const->constisnull)
> 1599         return false;
> 1600     if (clause_const->constisnull)
> 1601         return false;

There does actually exist strictness check in
predicate_refuted_by_simple_clause(), but comes into play only if the
predicate is a IS NULL clause.  In our case, both predicate and clause
expressions would be operator clauses for which the strictness doesn't
seem to be considered.

>> -- With the patch
>> explain (costs off) select * from foo where a in (null, 2);
>>         QUERY PLAN
>> --------------------------
>>  Result
>>    One-Time Filter: false
>> (2 rows)
>>
>> explain (costs off) select * from foo where a in (null, 2, 1);
>>                   QUERY PLAN
>> -----------------------------------------------
>>  Seq Scan on foo
>>    Filter: (a = ANY ('{NULL,2,1}'::integer[]))
>> (2 rows)
>>
>> Thoughts?
> 
> AFAIU, this doesn't look to be the right way to fix the problem; it
> assumes that the operators are strict. Sorry, if I have misunderstood
> the patch and your thoughts behind it. May be constraint exclusion
> code should be taught to treat strict/non-strict operators separately.
> I am not sure.

Yeah, the patch in its current form is wrong, because it will give wrong
answers if the operator being used in a SAOP is non-strict.  I modified
the patch to consider operator strictness before doing anything with nulls.

Thanks,
Amit

Attachment

v2-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch

Re: constraint exclusion and nulls in IN (..) clause

From

Ashutosh Bapat

Date:

05 February 2018, 07:20:51

On Thu, Feb 1, 2018 at 2:23 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> Yeah, the patch in its current form is wrong, because it will give wrong
> answers if the operator being used in a SAOP is non-strict.  I modified
> the patch to consider operator strictness before doing anything with nulls.

That's fine, but I am not sure whether this fits constraint
exclusion's charter. Please add this patch to the next commitfest so
that we can have a better opinion.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

05 February 2018, 07:25:00

On 2018/02/05 13:20, Ashutosh Bapat wrote:
> On Thu, Feb 1, 2018 at 2:23 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>
>> Yeah, the patch in its current form is wrong, because it will give wrong
>> answers if the operator being used in a SAOP is non-strict.  I modified
>> the patch to consider operator strictness before doing anything with nulls.
> 
> That's fine, but I am not sure whether this fits constraint
> exclusion's charter. Please add this patch to the next commitfest so
> that we can have a better opinion.

OK, done.

Thanks,
Amit

Re: constraint exclusion and nulls in IN (..) clause

From

Robert Haas

Date:

07 February 2018, 00:10:03

On Sun, Feb 4, 2018 at 11:20 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Feb 1, 2018 at 2:23 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Yeah, the patch in its current form is wrong, because it will give wrong
>> answers if the operator being used in a SAOP is non-strict.  I modified
>> the patch to consider operator strictness before doing anything with nulls.
>
> That's fine, but I am not sure whether this fits constraint
> exclusion's charter. Please add this patch to the next commitfest so
> that we can have a better opinion.

It seems like a perfectly reasonable extension of the existing
functionality to me.  (I have, at present, no opinion on the patch
itself.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: constraint exclusion and nulls in IN (..) clause

From

Emre Hasegeli

Date:

04 March 2018, 16:12:49

> Yeah, the patch in its current form is wrong, because it will give wrong
> answers if the operator being used in a SAOP is non-strict.  I modified
> the patch to consider operator strictness before doing anything with nulls.

I tried to review this patch without any familiarity to the code.

arrayconst_next_fn():

> +   /* skip nulls if ok to do so */
> +   if (state->opisstrict)
> +   {
> +       while (state->elem_nulls[state->next_elem])
> +           state->next_elem++;
> +   }

Shouldn't we check if we consumed all elements (state->next_elem >=
state->num_elems) inside the while loop?

arrayexpr_next_fn():

> +   /* skip nulls if ok to do so */
> +   if (state->opisstrict)
> +   {
> +       Node *node = (Node *) lfirst(state->next);
> +
> +       while (IsA(node, Const) && ((Const *) node)->constisnull)
> +           state->next = lnext(state->next);
> +   }

I cannot find a way to test this change.  Can you imagine a query to
exercise it on the regression tests?

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

05 March 2018, 04:59:37

Hi.

On 2018/03/04 22:12, Emre Hasegeli wrote:
>> Yeah, the patch in its current form is wrong, because it will give wrong
>> answers if the operator being used in a SAOP is non-strict.  I modified
>> the patch to consider operator strictness before doing anything with nulls.
> 
> I tried to review this patch without any familiarity to the code.

Thanks for the review.

> arrayconst_next_fn():
> 
>> +   /* skip nulls if ok to do so */
>> +   if (state->opisstrict)
>> +   {
>> +       while (state->elem_nulls[state->next_elem])
>> +           state->next_elem++;
>> +   }
> 
> Shouldn't we check if we consumed all elements (state->next_elem >=
> state->num_elems) inside the while loop?

You're right.  Fixed.

> arrayexpr_next_fn():
> 
>> +   /* skip nulls if ok to do so */
>> +   if (state->opisstrict)
>> +   {
>> +       Node *node = (Node *) lfirst(state->next);
>> +
>> +       while (IsA(node, Const) && ((Const *) node)->constisnull)
>> +           state->next = lnext(state->next);
>> +   }
> 
> I cannot find a way to test this change.  Can you imagine a query to
> exercise it on the regression tests?

So far, I hadn't either.  I figured one out and added it to inherit.sql.
Basically, I needed to write the query such that an IN () expression
doesn't get const-simplified to a Const containing array, but rather
remains an ArrayExpr.  So, arrayexpr_*() functions in predtest.c are now
exercised.

Attached updated patch.

Thanks,
Amit

Attachment

v3-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch

Re: constraint exclusion and nulls in IN (..) clause

From

Emre Hasegeli

Date:

05 March 2018, 17:04:01

>> Shouldn't we check if we consumed all elements (state->next_elem >=
>> state->num_elems) inside the while loop?
>
> You're right.  Fixed.

I don't think the fix is correct.  arrayconst_next_fn() can still
execute state->next_elem++ without checking if we consumed all
elements.  I couldn't manage to crash it though.

I am also not sure if it is proper to skip some items inside the
"next_fn", but I don't know the code to suggest a better alternative.

> +   /* skip nulls if ok to do so */
> +   if (state->opisstrict && state->next)

Do we need && state->next in here?  It is checked for (state->next ==
NULL) 2 lines above.

> +   {
> +       Node   *node = (Node *) lfirst(state->next);
> +
> +       while (node && IsA(node, Const) && ((Const *) node)->constisnull)

Should we spell the first check like node != NULL as the rest of the code does.

> +       {
> +           state->next = lnext(state->next);
> +           if (state->next)
> +               node = (Node *) lfirst(state->next);
> +       }
> +   }

I think this function is also not correct as it can call
lnext(state->next) without checking.

> So far, I hadn't either.  I figured one out and added it to inherit.sql.
> Basically, I needed to write the query such that an IN () expression
> doesn't get const-simplified to a Const containing array, but rather
> remains an ArrayExpr.  So, arrayexpr_*() functions in predtest.c are now
> exercised.

Nice.  I noticed that this part of the code was not covered at all by
the tests before.

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

06 March 2018, 07:43:11

Hi.

Thanks for reviewing again.

On 2018/03/05 23:04, Emre Hasegeli wrote:
>>> Shouldn't we check if we consumed all elements (state->next_elem >=
>>> state->num_elems) inside the while loop?
>>
>> You're right.  Fixed.
> 
> I don't think the fix is correct.  arrayconst_next_fn() can still
> execute state->next_elem++ without checking if we consumed all
> elements.  I couldn't manage to crash it though.

I couldn't get it to crash either, but it's wrong anyway.  What happens is
the calling code would perform constraint exclusion with a garbage clause
(that is one containing a garbage value picked from elem_values when
next_elem has exceeded num_elems) and that may alter the result of
partition pruning.  Verified that with the following example.

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);

-- before fixing
explain (costs off) select * from p where a in (2, null);
                    QUERY PLAN
---------------------------------------------------
 Append
   ->  Seq Scan on p1
         Filter: (a = ANY ('{2,NULL}'::integer[]))
   ->  Seq Scan on p2
         Filter: (a = ANY ('{2,NULL}'::integer[]))
(5 rows)

So, it looks as if the proposed patch has no effect at all.

-- after fixing
explain (costs off) select * from p where a in (2, null);
                        QUERY PLAN
----------------------------------------------------------
 Append
   ->  Seq Scan on p2
         Filter: (a = ANY ('{2,NULL}'::integer[]))

Was a bit surprised that the calling code didn't crash while handling a
garbage clause.

> I am also not sure if it is proper to skip some items inside the
> "next_fn", but I don't know the code to suggest a better alternative.

Patch teaches it to ignore nulls when it's known that the operator being
used is strict.  It is harmless and has the benefit that constraint
exclusion gives an answer that is consistent with what actually running
such a qual against a table's rows would do.

Consider this example.

create table foo (a int check (a = 1));

insert into foo values (1), (null);

select * from foo where a in (2, null);
 a
---
(0 rows)

set constraint_exclusion to on;

-- you'd think constraint exclusion would avoid the scan
explain select * from foo where a in (2, null);
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on foo  (cost=0.00..41.88 rows=13 width=4)
   Filter: (a = ANY ('{2,NULL}'::integer[]))
(2 rows)

But it didn't, because the null in that list caused constraint exclusion
to mistakenly decide that the clause as a whole doesn't refute the
constraint (check (a = 1)).

-- with the patch
explain select * from foo where a in (2, null);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=4)
   One-Time Filter: false
(2 rows)

After ignoring null, only (a = 2) is left to consider and that does refute
(a = 1), so pruning works as desired.

BTW, if you compose the qual using an OR directly, it works as desired
even with HEAD:

explain select * from foo where a = 2 or a = null;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=4)
   One-Time Filter: false
(2 rows)

That's because a = null is const-simplified in an earlier planning phase
to false, so (a = 2 or false) reduces to just a = 2, which does refute
foo's constraint (a = 1).  I think the same thing should happen when
constraint exclusion internally turns an IN (..) clause into a set of OR
clauses.  Skipping nulls in these next_fn functions is, in a way, same as
const-simplifying them away.

>> +   /* skip nulls if ok to do so */
>> +   if (state->opisstrict && state->next)
> 
> Do we need && state->next in here?  It is checked for (state->next ==
> NULL) 2 lines above.

Yeah, it wasn't needed.

>> +   {
>> +       Node   *node = (Node *) lfirst(state->next);
>> +
>> +       while (node && IsA(node, Const) && ((Const *) node)->constisnull)
> 
> Should we spell the first check like node != NULL as the rest of the code does.

OK.

>> +       {
>> +           state->next = lnext(state->next);
>> +           if (state->next)
>> +               node = (Node *) lfirst(state->next);
>> +       }
>> +   }
> 
> I think this function is also not correct as it can call
> lnext(state->next) without checking.

Yeah.  Rearranged the code to fix that.

Attached updated patch.  Thanks again.

Regards,
Amit

Attachment

v4-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch

Re: constraint exclusion and nulls in IN (..) clause

From

Emre Hasegeli

Date:

06 March 2018, 12:46:42

> Patch teaches it to ignore nulls when it's known that the operator being
> used is strict.  It is harmless and has the benefit that constraint
> exclusion gives an answer that is consistent with what actually running
> such a qual against a table's rows would do.

Yes, I understood that.  I just meant that I don't know if it is the
best way to skip NULLs inside "next_fn".  Maybe the caller of the
"next_fn"s should skip them.  Anyway, the committer can judge this
better.

> Yeah.  Rearranged the code to fix that.

This version looks correct to me.

> +           state->next = (state->next != NULL) ? lnext(state->next) : NULL;
> +           node = (state->next != NULL) ? lfirst(state->next) : NULL;

I think it is unnecessary to check for (state->next != NULL) two
times.  We can put those in a single if.

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

06 March 2018, 13:10:53

On 2018/03/06 18:46, Emre Hasegeli wrote:
>> Patch teaches it to ignore nulls when it's known that the operator being
>> used is strict.  It is harmless and has the benefit that constraint
>> exclusion gives an answer that is consistent with what actually running
>> such a qual against a table's rows would do.
> 
> Yes, I understood that.  I just meant that I don't know if it is the
> best way to skip NULLs inside "next_fn".  Maybe the caller of the
> "next_fn"s should skip them.  Anyway, the committer can judge this
> better.

I think putting that inside those next_fn functions tends to centralize
the logic and would run only only for the intended cases.

>> Yeah.  Rearranged the code to fix that.
> 
> This version looks correct to me.
> 
>> +           state->next = (state->next != NULL) ? lnext(state->next) : NULL;
>> +           node = (state->next != NULL) ? lfirst(state->next) : NULL;
> 
> I think it is unnecessary to check for (state->next != NULL) two
> times.  We can put those in a single if.

Hmm, state->next refers to two different pointer values on line 1 and line
2.  It may end up being set to NULL on line 1.  Am I missing something?

Thanks,
Amit

Re: constraint exclusion and nulls in IN (..) clause

From

Emre Hasegeli

Date:

06 March 2018, 13:16:11

> Hmm, state->next refers to two different pointer values on line 1 and line
> 2.  It may end up being set to NULL on line 1.  Am I missing something?

True, lnext(state->next) can set it to NULL.  I confused by the below
code on the same function doing the steps in reverse order.  With this
cleared, I have nothing else to say, so I am setting this to ready for
committer.

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

06 March 2018, 13:20:10

On 2018/03/06 19:16, Emre Hasegeli wrote:
>> Hmm, state->next refers to two different pointer values on line 1 and line
>> 2.  It may end up being set to NULL on line 1.  Am I missing something?
> 
> True, lnext(state->next) can set it to NULL.  I confused by the below
> code on the same function doing the steps in reverse order.  With this
> cleared, I have nothing else to say, so I am setting this to ready for
> committer.

Thanks. :)

Regards,
Amit

Re: constraint exclusion and nulls in IN (..) clause

From

Tom Lane

Date:

06 March 2018, 22:37:18

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> [ v4-0001-Disregard-nulls-in-SAOP-rightarg-array-list-durin.patch ]

This patch seems pretty wrong to me.  The proposed proof rule is wrong
for !useOr expressions (that is, "scalar op ALL (array)"); you can't
just ignore null items in that case.  It's also wrong when we need strong
refutation.  I was about to point to the comment for predicate_refuted_by,

 * An important fine point is that truth of the clauses must imply that
 * the predicate returns FALSE, not that it does not return TRUE.  This
 * is normally used to try to refute CHECK constraints, and the only
 * thing we can assume about a CHECK constraint is that it didn't return
 * FALSE --- a NULL result isn't a violation per the SQL spec.  (Someday

and reject the patch as unfixably wrong, when I noticed that in
fact somebody had added support for weak refutation to this code.
Now I'm just desperately unhappy about the lack of comment-updating
work in that commit (b08df9cab, looks like).  It's not acceptable
to falsify comments and not update them.  I have a nasty feeling that
there are outright bugs in the logic, too.

I'm inclined to think that you've attacked the problem at the wrong place
anyway.  The notion that one arm of an OR reducing to NULL might not
prevent proving what we want is by no means specific to ScalarArrayOp.
I think it'd make more sense to see about incorporating that idea in
predicate_implied_by_simple_clause/predicate_refuted_by_simple_clause.
It might be necessary to change the boolean results in the recursive
logic to three-way, not sure.

I'm setting this patch back to Waiting On Author pending fixes
for the above issues.  In the meantime I see a separate work item
here to do code review for b08df9cab.

            regards, tom lane

Re: constraint exclusion and nulls in IN (..) clause

From

Tom Lane

Date:

10 March 2018, 07:40:16

I wrote:
> I think it'd make more sense to see about incorporating that idea in
> predicate_implied_by_simple_clause/predicate_refuted_by_simple_clause.

After further thought, it seems like the place to deal with this is
really operator_predicate_proof(), as in the attached draft patch
against HEAD.  This passes the smell test for me, in the sense that
it's an arguably correct and general extension of the proof rules,
but it could use more testing.

TBH, the change in the existing regression test case in inherit.sql
makes me itch.  We've got

create table list_parted (
    a    varchar
) partition by list (a);
...
create table part_null_xy partition of list_parted for values in (null, 'xy');
...
explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd');

Now, the fact that "null" is included in this query's IN clause is a
complete no-op, because the IN is using a strict equality operator.
So it's nice that the planner can see that and realize that there's
no point in scanning part_null_xy ... but this means that the syntax
that's been chosen for list partitioning is seriously misleading.
"in (null, 'xy')" in the CREATE TABLE command has nothing to do with
the semantics of that identical clause in any other context, and indeed
it seems chosen in a way to confuse even (or especially?) seasoned
experts.

I suppose it's too late to do anything about that now, but it sure
seems like NULL should've been handled some other way.

            regards, tom lane

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index fb963d0..446207d 100644
*** a/src/backend/optimizer/util/predtest.c
--- b/src/backend/optimizer/util/predtest.c
*************** static Node *extract_not_arg(Node *claus
*** 100,106 ****
  static Node *extract_strong_not_arg(Node *clause);
  static bool clause_is_strict_for(Node *clause, Node *subexpr);
  static bool operator_predicate_proof(Expr *predicate, Node *clause,
!                          bool refute_it);
  static bool operator_same_subexprs_proof(Oid pred_op, Oid clause_op,
                               bool refute_it);
  static bool operator_same_subexprs_lookup(Oid pred_op, Oid clause_op,
--- 100,106 ----
  static Node *extract_strong_not_arg(Node *clause);
  static bool clause_is_strict_for(Node *clause, Node *subexpr);
  static bool operator_predicate_proof(Expr *predicate, Node *clause,
!                          bool refute_it, bool weak);
  static bool operator_same_subexprs_proof(Oid pred_op, Oid clause_op,
                               bool refute_it);
  static bool operator_same_subexprs_lookup(Oid pred_op, Oid clause_op,
*************** predicate_implied_by_simple_clause(Expr
*** 1137,1143 ****
      }

      /* Else try operator-related knowledge */
!     return operator_predicate_proof(predicate, clause, false);
  }

  /*----------
--- 1137,1143 ----
      }

      /* Else try operator-related knowledge */
!     return operator_predicate_proof(predicate, clause, false, weak);
  }

  /*----------
*************** predicate_refuted_by_simple_clause(Expr
*** 1232,1238 ****
      }

      /* Else try operator-related knowledge */
!     return operator_predicate_proof(predicate, clause, true);
  }


--- 1232,1238 ----
      }

      /* Else try operator-related knowledge */
!     return operator_predicate_proof(predicate, clause, true, weak);
  }


*************** static const StrategyNumber BT_refute_ta
*** 1498,1506 ****
   * values, since then the operators aren't being given identical inputs.  But
   * we only support that for btree operators, for which we can assume that all
   * non-null inputs result in non-null outputs, so that it doesn't matter which
!  * two non-null constants we consider.  Currently the code below just reports
!  * "proof failed" if either constant is NULL, but in some cases we could be
!  * smarter (and that likely would require checking strong vs. weak proofs).
   *
   * We can make proofs involving several expression forms (here "foo" and "bar"
   * represent subexpressions that are identical according to equal()):
--- 1498,1505 ----
   * values, since then the operators aren't being given identical inputs.  But
   * we only support that for btree operators, for which we can assume that all
   * non-null inputs result in non-null outputs, so that it doesn't matter which
!  * two non-null constants we consider.  If either constant is NULL, we have
!  * to think harder, but sometimes the proof still works, as explained below.
   *
   * We can make proofs involving several expression forms (here "foo" and "bar"
   * represent subexpressions that are identical according to equal()):
*************** static const StrategyNumber BT_refute_ta
*** 1528,1534 ****
   * and we dare not make deductions with those.
   */
  static bool
! operator_predicate_proof(Expr *predicate, Node *clause, bool refute_it)
  {
      OpExpr       *pred_opexpr,
                 *clause_opexpr;
--- 1527,1534 ----
   * and we dare not make deductions with those.
   */
  static bool
! operator_predicate_proof(Expr *predicate, Node *clause,
!                          bool refute_it, bool weak)
  {
      OpExpr       *pred_opexpr,
                 *clause_opexpr;
*************** operator_predicate_proof(Expr *predicate
*** 1675,1691 ****
       * We have two identical subexpressions, and two other subexpressions that
       * are not identical but are both Consts; and we have commuted the
       * operators if necessary so that the Consts are on the right.  We'll need
!      * to compare the Consts' values.  If either is NULL, fail.
!      *
!      * Future work: in some cases the desired proof might hold even with NULL
!      * constants.  But beware that we've not yet identified the operators as
!      * btree ops, so for instance it'd be quite unsafe to assume they are
!      * strict without checking.
       */
-     if (pred_const->constisnull)
-         return false;
      if (clause_const->constisnull)
          return false;

      /*
       * Lookup the constant-comparison operator using the system catalogs and
--- 1675,1720 ----
       * We have two identical subexpressions, and two other subexpressions that
       * are not identical but are both Consts; and we have commuted the
       * operators if necessary so that the Consts are on the right.  We'll need
!      * to compare the Consts' values.  If either is NULL, we can't do that, so
!      * usually the proof fails ... but in some cases we can claim success.
       */
      if (clause_const->constisnull)
+     {
+         /* If clause_op isn't strict, we can't prove anything */
+         if (!op_strict(clause_op))
+             return false;
+
+         /*
+          * At this point we know that the clause returns NULL.  For proof
+          * types that assume truth of the clause, this means the proof is
+          * vacuously true (a/k/a "false implies anything").  That's all proof
+          * types except weak implication.
+          */
+         if (!(weak && !refute_it))
+             return true;
+
+         /*
+          * For weak implication, it's still possible for the proof to succeed,
+          * if the predicate can also be proven NULL.  In that case we've got
+          * NULL => NULL which is valid for this proof type.
+          */
+         if (pred_const->constisnull && op_strict(pred_op))
+             return true;
+         /* Else the proof fails */
+         return false;
+     }
+     if (pred_const->constisnull)
+     {
+         /*
+          * If the pred_op is strict, we know the predicate yields NULL, which
+          * means the proof succeeds for either weak implication or weak
+          * refutation.
+          */
+         if (weak && op_strict(pred_op))
+             return true;
+         /* Else the proof fails */
          return false;
+     }

      /*
       * Lookup the constant-comparison operator using the system catalogs and
diff --git a/src/test/modules/test_predtest/expected/test_predtest.out
b/src/test/modules/test_predtest/expected/test_predtest.out
index 9dd8aec..5574e03 100644
*** a/src/test/modules/test_predtest/expected/test_predtest.out
--- b/src/test/modules/test_predtest/expected/test_predtest.out
*************** w_i_holds         | f
*** 781,793 ****
  s_r_holds         | f
  w_r_holds         | f

- -- XXX ideally, we could prove this case too, for strong implication
  select * from test_predtest($$
  select x <= 5, x in (1,3,5,null)
  from integers
  $$);
  -[ RECORD 1 ]-----+--
! strong_implied_by | f
  weak_implied_by   | f
  strong_refuted_by | f
  weak_refuted_by   | f
--- 781,792 ----
  s_r_holds         | f
  w_r_holds         | f

  select * from test_predtest($$
  select x <= 5, x in (1,3,5,null)
  from integers
  $$);
  -[ RECORD 1 ]-----+--
! strong_implied_by | t
  weak_implied_by   | f
  strong_refuted_by | f
  weak_refuted_by   | f
diff --git a/src/test/modules/test_predtest/sql/test_predtest.sql
b/src/test/modules/test_predtest/sql/test_predtest.sql
index 298b8bf..2734735 100644
*** a/src/test/modules/test_predtest/sql/test_predtest.sql
--- b/src/test/modules/test_predtest/sql/test_predtest.sql
*************** select x <= 5, x in (1,3,5,7)
*** 306,312 ****
  from integers
  $$);

- -- XXX ideally, we could prove this case too, for strong implication
  select * from test_predtest($$
  select x <= 5, x in (1,3,5,null)
  from integers
--- 306,311 ----
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891..d56e5d2 100644
*** a/src/test/regress/expected/inherit.out
--- b/src/test/regress/expected/inherit.out
*************** explain (costs off) select * from list_p
*** 1715,1725 ****
   Append
     ->  Seq Scan on part_ab_cd
           Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
!    ->  Seq Scan on part_ef_gh
!          Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
!    ->  Seq Scan on part_null_xy
!          Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
! (7 rows)

  explain (costs off) select * from list_parted where a = 'ab';
                  QUERY PLAN
--- 1715,1721 ----
   Append
     ->  Seq Scan on part_ab_cd
           Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
! (3 rows)

  explain (costs off) select * from list_parted where a = 'ab';
                  QUERY PLAN

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

14 March 2018, 11:16:38

On 2018/03/10 13:40, Tom Lane wrote:
> I wrote:
>> I think it'd make more sense to see about incorporating that idea in
>> predicate_implied_by_simple_clause/predicate_refuted_by_simple_clause.
> 
> After further thought, it seems like the place to deal with this is
> really operator_predicate_proof(), as in the attached draft patch
> against HEAD.  This passes the smell test for me, in the sense that
> it's an arguably correct and general extension of the proof rules,
> but it could use more testing.

Thanks for the patch.  I agree it handles the case I presented my patch
for in a more principled manner.  So, I've marked the CF entry for my
patch as Rejected.


Looking at the patch, I guess it'd be right to think that the case for
which the following block of code will be executed only arises for
OpExpr's generated by predtest.c (that is, when iterating an SAOP):

     if (clause_const->constisnull)
+    {
+        /* If clause_op isn't strict, we can't prove anything */
+        if (!op_strict(clause_op))
+            return false;
+
<snip>

That's because any other expr = null would've been reduced to
constant-false at some earlier point.

Then, in the same block I see that there is:

+        /*
+         * For weak implication, it's still possible for the proof to
succeed,
+         * if the predicate can also be proven NULL.  In that case we've got
+         * NULL => NULL which is valid for this proof type.
+         */
+        if (pred_const->constisnull && op_strict(pred_op))
+            return true;

which, afaics, will never be able to return true, because
pred_const->constisnull will never be true here.  That's because the
following code that will run before the above code will determine the
result of operator_predicate_proof() in that case:

    if (equal(pred_leftop, clause_leftop))
    {
        if (equal(pred_rightop, clause_rightop))
        {
            /* We have x op1 y and x op2 y */
            return operator_same_subexprs_proof(pred_op, clause_op,
refute_it);

I wonder if the result would be valid in that case, because the proof
cache should only be looked up for non-null sub-expressions, no?

> TBH, the change in the existing regression test case in inherit.sql
> makes me itch.  We've got
> 
> create table list_parted (
>     a    varchar
> ) partition by list (a);
> ...
> create table part_null_xy partition of list_parted for values in (null, 'xy');
> ...
> explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd');
> 
> Now, the fact that "null" is included in this query's IN clause is a
> complete no-op, because the IN is using a strict equality operator.
> So it's nice that the planner can see that and realize that there's
> no point in scanning part_null_xy ... but this means that the syntax
> that's been chosen for list partitioning is seriously misleading.
> "in (null, 'xy')" in the CREATE TABLE command has nothing to do with
> the semantics of that identical clause in any other context, and indeed
> it seems chosen in a way to confuse even (or especially?) seasoned
> experts.

Hmm, yeah, it is confusing.  Actually, what it really internally becomes
is the following expression:

  (a IS NULL) OR (a = 'xy')

or if the CREATE TABLE had "in (null, 'xy', 'yz'), the expression will become:

  (a IS NULL) OR (a = ANY (ARRAY['xy'::text, 'yz'::text]))

which, as you say, is not the same thing as how the original expression is
interpreted elsewhere.

> I suppose it's too late to do anything about that now, but it sure
> seems like NULL should've been handled some other way.

Maybe, write something in the documentation about that?

Thanks,
Amit

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

14 March 2018, 11:21:57

On 2018/03/14 17:16, Amit Langote wrote:
> On 2018/03/10 13:40, Tom Lane wrote:
>> I wrote:
>>> I think it'd make more sense to see about incorporating that idea in
>>> predicate_implied_by_simple_clause/predicate_refuted_by_simple_clause.
>>
>> After further thought, it seems like the place to deal with this is
>> really operator_predicate_proof(), as in the attached draft patch
>> against HEAD.  This passes the smell test for me, in the sense that
>> it's an arguably correct and general extension of the proof rules,
>> but it could use more testing.
> 
> Thanks for the patch.  I agree it handles the case I presented my patch
> for in a more principled manner.  So, I've marked the CF entry for my
> patch as Rejected.

Oops, sorry I hadn't actually seen the CF entry before hitting send on
this email.  Seeing that Tom intends to attach his patch with this CF
entry, I will leave the entry alone for now.

Thanks,
Amit

Re: constraint exclusion and nulls in IN (..) clause

From

Tom Lane

Date:

20 March 2018, 21:44:26

I wrote:
> After further thought, it seems like the place to deal with this is
> really operator_predicate_proof(), as in the attached draft patch
> against HEAD.  This passes the smell test for me, in the sense that
> it's an arguably correct and general extension of the proof rules,
> but it could use more testing.

Was anyone planning to do more work or testing on this?  Or should
I just push it so we can close the CF entry?

            regards, tom lane

Re: constraint exclusion and nulls in IN (..) clause

From

Emre Hasegeli

Date:

21 March 2018, 13:22:03

> After further thought, it seems like the place to deal with this is
> really operator_predicate_proof(), as in the attached draft patch
> against HEAD.  This passes the smell test for me, in the sense that
> it's an arguably correct and general extension of the proof rules,
> but it could use more testing.

I am not sure if we are covering the case when clause_const and
pred_const are both NULL.  In this case, we should be able to return
true only by checking op_strict(pred_op) or maybe even without
checking that.  Am I mistaken?

Re: constraint exclusion and nulls in IN (..) clause

From

Tom Lane

Date:

21 March 2018, 17:00:41

Emre Hasegeli <emre@hasegeli.com> writes:
> I am not sure if we are covering the case when clause_const and
> pred_const are both NULL.  In this case, we should be able to return
> true only by checking op_strict(pred_op) or maybe even without
> checking that.  Am I mistaken?

Yeah, that's there.  We need both operators to be strict, I think;
otherwise we can't really assume anything about what they'd return
for NULL inputs.  But if they are, we have NULL => NULL which is
valid for all proof cases.

            regards, tom lane

Re: constraint exclusion and nulls in IN (..) clause

From

Emre Hasegeli

Date:

21 March 2018, 17:24:38

Yeah, that's there. We need both operators to be strict, I think;
otherwise we can't really assume anything about what they'd return
for NULL inputs. But if they are, we have NULL => NULL which is
valid for all proof cases.

I understand. I don’t see any problems in this case.

Re: constraint exclusion and nulls in IN (..) clause

From

Amit Langote

Date:

22 March 2018, 13:22:56

On 2018/03/21 23:00, Tom Lane wrote:
> Emre Hasegeli <emre@hasegeli.com> writes:
>> I am not sure if we are covering the case when clause_const and
>> pred_const are both NULL.  In this case, we should be able to return
>> true only by checking op_strict(pred_op) or maybe even without
>> checking that.  Am I mistaken?
> 
> Yeah, that's there.  We need both operators to be strict, I think;
> otherwise we can't really assume anything about what they'd return
> for NULL inputs.  But if they are, we have NULL => NULL which is
> valid for all proof cases.

Thank you for closing the CF entry.

I too wasn't sure if the patch's modifications to
operator_predicate_proof() led to correct handling for the case where both
clause_const and pred_const are both NULL consts.  ISTM that the result in
that case becomes what operator_same_subexprs_proof() would return for the
pred_op (possibly commuted) and clause_op pair.  But maybe we don't end up
in that case much.

Regards,
Amit

Re: constraint exclusion and nulls in IN (..) clause

From

Tom Lane

Date:

22 March 2018, 16:34:36

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> I too wasn't sure if the patch's modifications to
> operator_predicate_proof() led to correct handling for the case where both
> clause_const and pred_const are both NULL consts.  ISTM that the result in
> that case becomes what operator_same_subexprs_proof() would return for the
> pred_op (possibly commuted) and clause_op pair.  But maybe we don't end up
> in that case much.

Probably not.  It's hard to get to this type of case at all, because in
most cases, if we have a null constant as a subexpression, previous
const-simplification would have replaced the whole expression with null.
I think in practice it's only interesting when the upper levels of
predtest.c have constructed an operator expression out of parts of the
given clauses, which is exactly what happens for cases like IN-lists.

            regards, tom lane