Thread: Evaluate expression at planning time for two more cases

Evaluate expression at planning time for two more cases

From
Surafel Temesgen
Date:
Hi,

In good written query IS NULL and IS NOT NULL check on primary and non null constraints columns should not happen but if it is mentioned PostgreSQL have to be smart enough for not checking every return result about null value on primary key column. Instead it can be evaluate its truth value and set the result only once. The attached patch evaluate and set the truth value for null and not null check on primary column on planning time if the relation attribute is not mention on nullable side of outer join.

Thought?

regards

Surafel

Attachment

Re: Evaluate expression at planning time for two more cases

From
Ashutosh Bapat
Date:
Hi Surafel,

On Thu, Aug 27, 2020 at 6:01 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
>
> Hi,
>
> In good written query IS NULL and IS NOT NULL check on primary and non null constraints columns should not happen but
ifit is mentioned PostgreSQL have to be smart enough for not checking every return result about null value on primary
keycolumn. Instead it can be evaluate its truth value and set the result only once. The attached patch evaluate and set
thetruth value for null and not null check on primary column on planning time if the relation attribute is not mention
onnullable side of outer join. 
>
> Thought?

Thanks for the patch. Such SQL may arise from not-so-smart SQL
generation tools. It will be useful to have this optimization. Here
are some comments on your patch.

                     }
                     else
                         has_nonconst_input = true;
@@ -3382,7 +3395,47 @@ eval_const_expressions_mutator(Node *node,

+
+                    if (pkattnos != NULL &&
bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
pkattnos)
+                        && !check_null_side(context->root, relid))

Since this is working on parse->rtable this will work only for top level tables
as against the inherited tables or partitions which may have their own primary
key constraints if the parent doesn't have those.

This better be done when planning individual relations, plain or join or upper,
where all the necessary information is already available with each of the
relations and also the quals, derived as well as user specified, are
distributed to individual relations where they should be evalutated. My memory
is hazy but it might be possible do this while distributing the quals
themselves (distribute_qual_to_rels()).

Said that, to me, this looks more like something we should be able to do at the
time of constraint exclusion. But IIRC, we just prove whether constraints
refute a qual and not necessarily whether constraints imply a qual, making it
redundant, as is required here. E.g. primary key constraint implies key NOT
NULL rendering a "key IS NOT NULL" qual redundant. It might be better to test
the case when col IS NOT NULL is specified on a column which already has a NOT
NULL constraint. That may be another direction to take. We may require much
lesser code.

With either of these two approaches, the amount of code changes might
be justified.

+explain (costs off)
+SELECT * FROM b RIGHT JOIN a ON (b.a_id = a.id) WHERE (a.id IS NULL
OR a.id > 0);
+                  QUERY PLAN
+-----------------------------------------------
+ Hash Right Join
+   Hash Cond: (b.a_id = a.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Bitmap Heap Scan on a
+               Recheck Cond: (id > 0)
+               ->  Bitmap Index Scan on a_pkey
+                     Index Cond: (id > 0)
+(8 rows)

Thanks for the tests.

Please add the patch to the next commitfest https://commitfest.postgresql.org/.

--
Best Wishes,
Ashutosh Bapat



Re: Evaluate expression at planning time for two more cases

From
Surafel Temesgen
Date:
Hi ,

Thank you for looking into this

On Fri, Aug 28, 2020 at 9:48 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
                     }
                     else
                         has_nonconst_input = true;
@@ -3382,7 +3395,47 @@ eval_const_expressions_mutator(Node *node,

+
+                    if (pkattnos != NULL &&
bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
pkattnos)
+                        && !check_null_side(context->root, relid))

Since this is working on parse->rtable this will work only for top level tables
as against the inherited tables or partitions which may have their own primary
key constraints if the parent doesn't have those.



In that case the table have to be specified in from clause otherwise its error

e.g postgres=# CREATE TABLE cities (

name text,

population float,

altitude int

);

CREATE TABLE

postgres=# CREATE TABLE capitals (

id serial primary key,

state char(2)

) INHERITS (cities);

CREATE TABLE

postgres=# EXPLAIN SELECT * FROM cities WHERE id is not null;

ERROR: column "id" does not exist

LINE 1: EXPLAIN SELECT * FROM cities WHERE id is not null;


Even it will not work on the child table because the primary key constraint on the parent table is not in-force in the child table.


 
This better be done when planning individual relations, plain or join or upper,
where all the necessary information is already available with each of the
relations and also the quals, derived as well as user specified, are
distributed to individual relations where they should be evalutated. My memory
is hazy but it might be possible do this while distributing the quals
themselves (distribute_qual_to_rels()).


The place where all the necessary information available is on reduce_outer_joins as the comment of the function states but the downside is its will only be inexpensive if the query contains outer join

 
Said that, to me, this looks more like something we should be able to do at the
time of constraint exclusion. But IIRC, we just prove whether constraints
refute a qual and not necessarily whether constraints imply a qual, making it
redundant, as is required here. E.g. primary key constraint implies key NOT
NULL rendering a "key IS NOT NULL" qual redundant. It might be better to test
the case when col IS NOT NULL is specified on a column which already has a NOT
NULL constraint. That may be another direction to take. We may require much
lesser code.


I don’t add NOT NULL constraint optimization to the patch because cached plan is not invalidation in case of a change in NOT NULL constraint

 
Please add the patch to the next commitfest https://commitfest.postgresql.org/.

 
Thank you

regards
Surafel

Re: Evaluate expression at planning time for two more cases

From
Tom Lane
Date:
Surafel Temesgen <surafel3000@gmail.com> writes:
> [ null_check_on_pkey_optimization_v1.patch ]

I took a very brief look at this.

> I don’t add NOT NULL constraint optimization to the patch because cached
> plan is not invalidation in case of a change in NOT NULL constraint

That's actually not a problem, even though some people (including me)
have bandied about such suppositions in the past.  Relying on attnotnull
in the planner is perfectly safe [1].  Plus it'd likely be cheaper as
well as more general than looking up pkey information.  If we did need
to explicitly record the plan's dependency on a constraint, this patch
would be wrong anyhow because it fails to make any such notation about
the pkey constraint it relied on.

The "check_null_side" code you're proposing seems really horrid.
For one thing, it seems quite out of place for eval_const_expressions
to be doing that.  For another, it's wrong in principle because
eval_const_expressions doesn't know which part of the query tree
it's being invoked on, so it cannot know whether outer-join
nullability is an issue.  For another, doing that work over again
from scratch every time we see a potentially optimizable NullTest
looks expensive.  (I wonder whether you have tried to measure the
performance penalty imposed by this patch in cases where it fails
to make any proof.)

I've been doing some handwaving about changing the representation
of Vars, with an eye to making it clear by inspection whether a
given Var is nullable by some lower outer join [2].  If that work
ever comes to fruition then the need for "check_null_side" would
go away.  So maybe we should put this idea on the back burner
until that happens.

I'm not sure what I think about Ashutosh's ideas about doing this
somewhere else than eval_const_expressions.  I do not buy the argument
that it's interesting to do this separately for each child partition.
Child partitions that have attnotnull constraints different from their
parent's are at best a tiny minority use-case, if indeed we allow them
at all (I tend to think we shouldn't).  On the other hand it's possible
that postponing the check would allow bypassing the outer-join problem,
ie if we only do it for quals that have dropped down to the relation
scan level then we don't need to worry about outer join effects.

Another angle here is that eval_const_expressions runs before
reduce_outer_joins, meaning that if it's doing things that depend
on outer-join-ness then it will sometimes fail to optimize cases
that could be optimized.  As a not incidental example, consider

    select ... from t1 left join t2 on (...) where t2.x is not null;

reduce_outer_joins will realize that the left join can be reduced
to a plain join, whereupon (if t2.x is attnotnull) the WHERE clause
really is constant-true --- and this seems like a poster-child case
for it being useful to optimize away the WHERE clause.  But
we won't be able to detect that if we apply the optimization during
eval_const_expressions.  So maybe that's a good reason to do it
somewhere later.

            regards, tom lane

[1] https://www.postgresql.org/message-id/23564.1585885251%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/flat/15848.1576515643%40sss.pgh.pa.us



Re: Evaluate expression at planning time for two more cases

From
Surafel Temesgen
Date:
Hi Tom

On Tue, Sep 8, 2020 at 4:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Surafel Temesgen <surafel3000@gmail.com> writes:
> [ null_check_on_pkey_optimization_v1.patch ]

I took a very brief look at this.

> I don’t add NOT NULL constraint optimization to the patch because cached
> plan is not invalidation in case of a change in NOT NULL constraint

That's actually not a problem, even though some people (including me)
have bandied about such suppositions in the past.  Relying on attnotnull
in the planner is perfectly safe [1].  Plus it'd likely be cheaper as
well as more general than looking up pkey information.  If we did need
to explicitly record the plan's dependency on a constraint, this patch
would be wrong anyhow because it fails to make any such notation about
the pkey constraint it relied on.


ok thank you. I will change my next patch accordingly

 
The "check_null_side" code you're proposing seems really horrid.
For one thing, it seems quite out of place for eval_const_expressions
to be doing that.  For another, it's wrong in principle because
eval_const_expressions doesn't know which part of the query tree
it's being invoked on, so it cannot know whether outer-join
nullability is an issue.  For another, doing that work over again
from scratch every time we see a potentially optimizable NullTest
looks expensive.  (I wonder whether you have tried to measure the
performance penalty imposed by this patch in cases where it fails
to make any proof.)


I was thinking about collecting data about joins only once at the start of eval_const_expressions but I assume most queries don't have NULL check expressions and postpone it until we find one. Thinking about it again I think it can be done better by storing check_null_side_state into eval_const_expressions_context to use it for subsequent evaluation.  


I'm not sure what I think about Ashutosh's ideas about doing this
somewhere else than eval_const_expressions.  I do not buy the argument
that it's interesting to do this separately for each child partition.
Child partitions that have attnotnull constraints different from their
parent's are at best a tiny minority use-case, if indeed we allow them
at all (I tend to think we shouldn't).  On the other hand it's possible
that postponing the check would allow bypassing the outer-join problem,
ie if we only do it for quals that have dropped down to the relation
scan level then we don't need to worry about outer join effects.


At eval_const_expressions we check every expression and optimize it if possible. Introducing other check and optimization mechanism to same other place just for this optimization seems expensive with respect to performance penalty to me

 
Another angle here is that eval_const_expressions runs before
reduce_outer_joins, meaning that if it's doing things that depend
on outer-join-ness then it will sometimes fail to optimize cases
that could be optimized.  As a not incidental example, consider

        select ... from t1 left join t2 on (...) where t2.x is not null;

reduce_outer_joins will realize that the left join can be reduced
to a plain join, whereupon (if t2.x is attnotnull) the WHERE clause
really is constant-true --- and this seems like a poster-child case
for it being useful to optimize away the WHERE clause.  But
we won't be able to detect that if we apply the optimization during
eval_const_expressions.  So maybe that's a good reason to do it
somewhere later.

In this case the expression not changed to constant-true because the relation is on nullable side of outer join 

regards
Surafel

Re: Evaluate expression at planning time for two more cases

From
Ashutosh Bapat
Date:


On Tue, 8 Sep 2020 at 07:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:


I'm not sure what I think about Ashutosh's ideas about doing this
somewhere else than eval_const_expressions.  I do not buy the argument
that it's interesting to do this separately for each child partition.
Child partitions that have attnotnull constraints different from their
parent's are at best a tiny minority use-case, if indeed we allow them
at all (I tend to think we shouldn't). 

I agree about partitions. But, IMO, a child having constraints different from that of a parent is more common in inheritance trees.

Another point I raised in my mail was about constraint exclusion. Why aren't these clauses constant-folded by constraint exclusion? Sorry, I haven't looked at the constraint exclusion code myself for this.

As a not incidental example, consider

        select ... from t1 left join t2 on (...) where t2.x is not null;

reduce_outer_joins will realize that the left join can be reduced
to a plain join, whereupon (if t2.x is attnotnull) the WHERE clause
really is constant-true --- and this seems like a poster-child case
for it being useful to optimize away the WHERE clause.  But
we won't be able to detect that if we apply the optimization during
eval_const_expressions.  So maybe that's a good reason to do it
somewhere later.

+1 

--
Best Wishes,
Ashutosh

Re: Evaluate expression at planning time for two more cases

From
Surafel Temesgen
Date:


On Tue, Sep 8, 2020 at 12:59 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
Hi Tom

On Tue, Sep 8, 2020 at 4:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
 
The "check_null_side" code you're proposing seems really horrid.
For one thing, it seems quite out of place for eval_const_expressions
to be doing that.  For another, it's wrong in principle because
eval_const_expressions doesn't know which part of the query tree
it's being invoked on, so it cannot know whether outer-join
nullability is an issue.  For another, doing that work over again
from scratch every time we see a potentially optimizable NullTest
looks expensive.  (I wonder whether you have tried to measure the
performance penalty imposed by this patch in cases where it fails
to make any proof.)


I was thinking about collecting data about joins only once at the start of eval_const_expressions but I assume most queries don't have NULL check expressions and postpone it until we find one. Thinking about it again I think it can be done better by storing check_null_side_state into eval_const_expressions_context to use it for subsequent evaluation.  


Attached patch does like the above and includes NOT NULL constraint column.

regards

Surafel  

Attachment

Re: Evaluate expression at planning time for two more cases

From
Pavel Borisov
Date:
Thank you for working on this!
I got slightly into this patch. I can be wrong, but my opinion is that planner/optimizer-related changes are not
withoutdangers generally.  So anyway, they should be justified by performance increase, or the previous behavior should
beconsidered totally wrong. Patching the thing which is just a little sub-optimal seems for me seems not necessary.
 

So it would be very good to see measurements of a performance gain from this patch. And also I think tests with
partitionedand inherited relations for demonstration of the right work in the cases discussed in the thread should be a
must-dofor this patch.
 

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

The new status of this patch is: Waiting on Author

Re: Evaluate expression at planning time for two more cases

From
Surafel Temesgen
Date:
Hi Pavel Borisov,
It's always good to select the optimal way even if it didn't have performance gain
but in case of this patch i see 4x speed up on my laptop and it will work on any
table that have NULL constraint

regards
Surafel

Re: Evaluate expression at planning time for two more cases

From
David Rowley
Date:
On Tue, 8 Sept 2020 at 13:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I've been doing some handwaving about changing the representation
> of Vars, with an eye to making it clear by inspection whether a
> given Var is nullable by some lower outer join [2].  If that work
> ever comes to fruition then the need for "check_null_side" would
> go away.  So maybe we should put this idea on the back burner
> until that happens.

I looked at this patch too. I agree that we should delay adding any
new smarts in regards to NULL or NOT NULL until we have some more
robust infrastructure to make this sort of patch easier and cheaper.

My vote is to just return this patch with feedback.  Maybe Surafel
will be interested in pursuing this later when we have better
infrastructure or perhaps helping review the patch you're talking
about.

David



Re: Evaluate expression at planning time for two more cases

From
David Rowley
Date:
On Tue, 9 Mar 2021 at 05:13, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> It was a minor change therefore I rebased the patch, please take a look.

I only had a quick look at the v3 patch.

+ rel = table_open(rte->relid, NoLock);
+ att = TupleDescAttr(rel->rd_att, var->varattno - 1);

+ if (att->attnotnull && !check_null_side(context->root, relid, context))

This is not really an acceptable way to determine the notnull
attribute value. Andy Fan proposes a much better way in [1].
RelOptInfo is meant to cache the required Relation data that we need
during query planning. IIRC, Andy's patch correctly uses this and does
so in an efficient way.

In any case, as you can see there's a bit of other work going on to
smarten up the planner around NULL value detection. The UniqueKeys
patch requires this and various other things have come up that really
should be solved.

Surafel, I'd suggest we return this patch with feedback and maybe you
could instead help reviewing the other patches in regards to the NOT
NULL tracking and maybe come back to this once the dust has settled
and everyone is clear on how we determine if a column is NULL or not.

Let me know your thoughts.

David

[1] https://www.postgresql.org/message-id/CAKU4AWpQjAqJwQ2X-aR9g3+ZHRzU1k8hNP7A+_mLuOv-n5aVKA@mail.gmail.com



Re: Evaluate expression at planning time for two more cases

From
Jaime Casanova
Date:
On Tue, Jul 06, 2021 at 01:09:56PM +1200, David Rowley wrote:
> On Tue, 9 Mar 2021 at 05:13, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> > It was a minor change therefore I rebased the patch, please take a look.
> 
[...]
> 
> This is not really an acceptable way to determine the notnull
> attribute value. Andy Fan proposes a much better way in [1].
> RelOptInfo is meant to cache the required Relation data that we need
> during query planning. IIRC, Andy's patch correctly uses this and does
> so in an efficient way.
> 
> In any case, as you can see there's a bit of other work going on to
> smarten up the planner around NULL value detection. The UniqueKeys
> patch requires this and various other things have come up that really
> should be solved.
> 
> Surafel, I'd suggest we return this patch with feedback and maybe you
> could instead help reviewing the other patches in regards to the NOT
> NULL tracking and maybe come back to this once the dust has settled
> and everyone is clear on how we determine if a column is NULL or not.
> 
> Let me know your thoughts.
> 

Hi Surafel, 

We haven't seen an answer from you on this.
I'm marking the patch as "Returned with feedback" as was suggested.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL