Thread: Evaluate expression at planning time for two more cases
In good written query IS NULL and IS NOT NULL check on primary and non null constraints columns should not happen but if it is mentioned PostgreSQL have to be smart enough for not checking every return result about null value on primary key column. Instead it can be evaluate its truth value and set the result only once. The attached patch evaluate and set the truth value for null and not null check on primary column on planning time if the relation attribute is not mention on nullable side of outer join.
Thought?
regards
Surafel
Attachment
Hi Surafel, On Thu, Aug 27, 2020 at 6:01 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > > Hi, > > In good written query IS NULL and IS NOT NULL check on primary and non null constraints columns should not happen but ifit is mentioned PostgreSQL have to be smart enough for not checking every return result about null value on primary keycolumn. Instead it can be evaluate its truth value and set the result only once. The attached patch evaluate and set thetruth value for null and not null check on primary column on planning time if the relation attribute is not mention onnullable side of outer join. > > Thought? Thanks for the patch. Such SQL may arise from not-so-smart SQL generation tools. It will be useful to have this optimization. Here are some comments on your patch. } else has_nonconst_input = true; @@ -3382,7 +3395,47 @@ eval_const_expressions_mutator(Node *node, + + if (pkattnos != NULL && bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber, pkattnos) + && !check_null_side(context->root, relid)) Since this is working on parse->rtable this will work only for top level tables as against the inherited tables or partitions which may have their own primary key constraints if the parent doesn't have those. This better be done when planning individual relations, plain or join or upper, where all the necessary information is already available with each of the relations and also the quals, derived as well as user specified, are distributed to individual relations where they should be evalutated. My memory is hazy but it might be possible do this while distributing the quals themselves (distribute_qual_to_rels()). Said that, to me, this looks more like something we should be able to do at the time of constraint exclusion. But IIRC, we just prove whether constraints refute a qual and not necessarily whether constraints imply a qual, making it redundant, as is required here. E.g. primary key constraint implies key NOT NULL rendering a "key IS NOT NULL" qual redundant. It might be better to test the case when col IS NOT NULL is specified on a column which already has a NOT NULL constraint. That may be another direction to take. We may require much lesser code. With either of these two approaches, the amount of code changes might be justified. +explain (costs off) +SELECT * FROM b RIGHT JOIN a ON (b.a_id = a.id) WHERE (a.id IS NULL OR a.id > 0); + QUERY PLAN +----------------------------------------------- + Hash Right Join + Hash Cond: (b.a_id = a.id) + -> Seq Scan on b + -> Hash + -> Bitmap Heap Scan on a + Recheck Cond: (id > 0) + -> Bitmap Index Scan on a_pkey + Index Cond: (id > 0) +(8 rows) Thanks for the tests. Please add the patch to the next commitfest https://commitfest.postgresql.org/. -- Best Wishes, Ashutosh Bapat
}
else
has_nonconst_input = true;
@@ -3382,7 +3395,47 @@ eval_const_expressions_mutator(Node *node,
+
+ if (pkattnos != NULL &&
bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
pkattnos)
+ && !check_null_side(context->root, relid))
Since this is working on parse->rtable this will work only for top level tables
as against the inherited tables or partitions which may have their own primary
key constraints if the parent doesn't have those.
In that case the table have to be specified in from clause otherwise its error
e.g postgres=# CREATE TABLE cities (
name text,
population float,
altitude int
);
CREATE TABLE
postgres=# CREATE TABLE capitals (
id serial primary key,
state char(2)
) INHERITS (cities);
CREATE TABLE
postgres=# EXPLAIN SELECT * FROM cities WHERE id is not null;
ERROR: column "id" does not exist
LINE 1: EXPLAIN SELECT * FROM cities WHERE id is not null;
Even it will not work on the child table because the primary key constraint on the parent table is not in-force in the child table.
This better be done when planning individual relations, plain or join or upper,
where all the necessary information is already available with each of the
relations and also the quals, derived as well as user specified, are
distributed to individual relations where they should be evalutated. My memory
is hazy but it might be possible do this while distributing the quals
themselves (distribute_qual_to_rels()).
The place where all the necessary information available is on reduce_outer_joins as the comment of the function states but the downside is its will only be inexpensive if the query contains outer join
Said that, to me, this looks more like something we should be able to do at the
time of constraint exclusion. But IIRC, we just prove whether constraints
refute a qual and not necessarily whether constraints imply a qual, making it
redundant, as is required here. E.g. primary key constraint implies key NOT
NULL rendering a "key IS NOT NULL" qual redundant. It might be better to test
the case when col IS NOT NULL is specified on a column which already has a NOT
NULL constraint. That may be another direction to take. We may require much
lesser code.
I don’t add NOT NULL constraint optimization to the patch because cached plan is not invalidation in case of a change in NOT NULL constraint
Please add the patch to the next commitfest https://commitfest.postgresql.org/.
Surafel Temesgen <surafel3000@gmail.com> writes: > [ null_check_on_pkey_optimization_v1.patch ] I took a very brief look at this. > I don’t add NOT NULL constraint optimization to the patch because cached > plan is not invalidation in case of a change in NOT NULL constraint That's actually not a problem, even though some people (including me) have bandied about such suppositions in the past. Relying on attnotnull in the planner is perfectly safe [1]. Plus it'd likely be cheaper as well as more general than looking up pkey information. If we did need to explicitly record the plan's dependency on a constraint, this patch would be wrong anyhow because it fails to make any such notation about the pkey constraint it relied on. The "check_null_side" code you're proposing seems really horrid. For one thing, it seems quite out of place for eval_const_expressions to be doing that. For another, it's wrong in principle because eval_const_expressions doesn't know which part of the query tree it's being invoked on, so it cannot know whether outer-join nullability is an issue. For another, doing that work over again from scratch every time we see a potentially optimizable NullTest looks expensive. (I wonder whether you have tried to measure the performance penalty imposed by this patch in cases where it fails to make any proof.) I've been doing some handwaving about changing the representation of Vars, with an eye to making it clear by inspection whether a given Var is nullable by some lower outer join [2]. If that work ever comes to fruition then the need for "check_null_side" would go away. So maybe we should put this idea on the back burner until that happens. I'm not sure what I think about Ashutosh's ideas about doing this somewhere else than eval_const_expressions. I do not buy the argument that it's interesting to do this separately for each child partition. Child partitions that have attnotnull constraints different from their parent's are at best a tiny minority use-case, if indeed we allow them at all (I tend to think we shouldn't). On the other hand it's possible that postponing the check would allow bypassing the outer-join problem, ie if we only do it for quals that have dropped down to the relation scan level then we don't need to worry about outer join effects. Another angle here is that eval_const_expressions runs before reduce_outer_joins, meaning that if it's doing things that depend on outer-join-ness then it will sometimes fail to optimize cases that could be optimized. As a not incidental example, consider select ... from t1 left join t2 on (...) where t2.x is not null; reduce_outer_joins will realize that the left join can be reduced to a plain join, whereupon (if t2.x is attnotnull) the WHERE clause really is constant-true --- and this seems like a poster-child case for it being useful to optimize away the WHERE clause. But we won't be able to detect that if we apply the optimization during eval_const_expressions. So maybe that's a good reason to do it somewhere later. regards, tom lane [1] https://www.postgresql.org/message-id/23564.1585885251%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/flat/15848.1576515643%40sss.pgh.pa.us
Surafel Temesgen <surafel3000@gmail.com> writes:
> [ null_check_on_pkey_optimization_v1.patch ]
I took a very brief look at this.
> I don’t add NOT NULL constraint optimization to the patch because cached
> plan is not invalidation in case of a change in NOT NULL constraint
That's actually not a problem, even though some people (including me)
have bandied about such suppositions in the past. Relying on attnotnull
in the planner is perfectly safe [1]. Plus it'd likely be cheaper as
well as more general than looking up pkey information. If we did need
to explicitly record the plan's dependency on a constraint, this patch
would be wrong anyhow because it fails to make any such notation about
the pkey constraint it relied on.
ok thank you. I will change my next patch accordingly
The "check_null_side" code you're proposing seems really horrid.
For one thing, it seems quite out of place for eval_const_expressions
to be doing that. For another, it's wrong in principle because
eval_const_expressions doesn't know which part of the query tree
it's being invoked on, so it cannot know whether outer-join
nullability is an issue. For another, doing that work over again
from scratch every time we see a potentially optimizable NullTest
looks expensive. (I wonder whether you have tried to measure the
performance penalty imposed by this patch in cases where it fails
to make any proof.)
I was thinking about collecting data about joins only once at the start of eval_const_expressions but I assume most queries don't have NULL check expressions and postpone it until we find one. Thinking about it again I think it can be done better by storing check_null_side_state into eval_const_expressions_context to use it for subsequent evaluation.
I'm not sure what I think about Ashutosh's ideas about doing this
somewhere else than eval_const_expressions. I do not buy the argument
that it's interesting to do this separately for each child partition.
Child partitions that have attnotnull constraints different from their
parent's are at best a tiny minority use-case, if indeed we allow them
at all (I tend to think we shouldn't). On the other hand it's possible
that postponing the check would allow bypassing the outer-join problem,
ie if we only do it for quals that have dropped down to the relation
scan level then we don't need to worry about outer join effects.
At eval_const_expressions we check every expression and optimize it if possible. Introducing other check and optimization mechanism to same other place just for this optimization seems expensive with respect to performance penalty to me
Another angle here is that eval_const_expressions runs before
reduce_outer_joins, meaning that if it's doing things that depend
on outer-join-ness then it will sometimes fail to optimize cases
that could be optimized. As a not incidental example, consider
select ... from t1 left join t2 on (...) where t2.x is not null;
reduce_outer_joins will realize that the left join can be reduced
to a plain join, whereupon (if t2.x is attnotnull) the WHERE clause
really is constant-true --- and this seems like a poster-child case
for it being useful to optimize away the WHERE clause. But
we won't be able to detect that if we apply the optimization during
eval_const_expressions. So maybe that's a good reason to do it
somewhere later.
In this case the expression not changed to constant-true because the relation is on nullable side of outer join
I'm not sure what I think about Ashutosh's ideas about doing this
somewhere else than eval_const_expressions. I do not buy the argument
that it's interesting to do this separately for each child partition.
Child partitions that have attnotnull constraints different from their
parent's are at best a tiny minority use-case, if indeed we allow them
at all (I tend to think we shouldn't).
As a not incidental example, consider
select ... from t1 left join t2 on (...) where t2.x is not null;
reduce_outer_joins will realize that the left join can be reduced
to a plain join, whereupon (if t2.x is attnotnull) the WHERE clause
really is constant-true --- and this seems like a poster-child case
for it being useful to optimize away the WHERE clause. But
we won't be able to detect that if we apply the optimization during
eval_const_expressions. So maybe that's a good reason to do it
somewhere later.
Hi TomOn Tue, Sep 8, 2020 at 4:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:The "check_null_side" code you're proposing seems really horrid.
For one thing, it seems quite out of place for eval_const_expressions
to be doing that. For another, it's wrong in principle because
eval_const_expressions doesn't know which part of the query tree
it's being invoked on, so it cannot know whether outer-join
nullability is an issue. For another, doing that work over again
from scratch every time we see a potentially optimizable NullTest
looks expensive. (I wonder whether you have tried to measure the
performance penalty imposed by this patch in cases where it fails
to make any proof.)I was thinking about collecting data about joins only once at the start of eval_const_expressions but I assume most queries don't have NULL check expressions and postpone it until we find one. Thinking about it again I think it can be done better by storing check_null_side_state into eval_const_expressions_context to use it for subsequent evaluation.
Attached patch does like the above and includes NOT NULL constraint column.
regards
Surafel
Attachment
Thank you for working on this! I got slightly into this patch. I can be wrong, but my opinion is that planner/optimizer-related changes are not withoutdangers generally. So anyway, they should be justified by performance increase, or the previous behavior should beconsidered totally wrong. Patching the thing which is just a little sub-optimal seems for me seems not necessary. So it would be very good to see measurements of a performance gain from this patch. And also I think tests with partitionedand inherited relations for demonstration of the right work in the cases discussed in the thread should be a must-dofor this patch. -- Best regards, Pavel Borisov Postgres Professional: http://postgrespro.com The new status of this patch is: Waiting on Author
On Tue, 8 Sept 2020 at 13:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I've been doing some handwaving about changing the representation > of Vars, with an eye to making it clear by inspection whether a > given Var is nullable by some lower outer join [2]. If that work > ever comes to fruition then the need for "check_null_side" would > go away. So maybe we should put this idea on the back burner > until that happens. I looked at this patch too. I agree that we should delay adding any new smarts in regards to NULL or NOT NULL until we have some more robust infrastructure to make this sort of patch easier and cheaper. My vote is to just return this patch with feedback. Maybe Surafel will be interested in pursuing this later when we have better infrastructure or perhaps helping review the patch you're talking about. David
On Tue, 9 Mar 2021 at 05:13, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote: > It was a minor change therefore I rebased the patch, please take a look. I only had a quick look at the v3 patch. + rel = table_open(rte->relid, NoLock); + att = TupleDescAttr(rel->rd_att, var->varattno - 1); + if (att->attnotnull && !check_null_side(context->root, relid, context)) This is not really an acceptable way to determine the notnull attribute value. Andy Fan proposes a much better way in [1]. RelOptInfo is meant to cache the required Relation data that we need during query planning. IIRC, Andy's patch correctly uses this and does so in an efficient way. In any case, as you can see there's a bit of other work going on to smarten up the planner around NULL value detection. The UniqueKeys patch requires this and various other things have come up that really should be solved. Surafel, I'd suggest we return this patch with feedback and maybe you could instead help reviewing the other patches in regards to the NOT NULL tracking and maybe come back to this once the dust has settled and everyone is clear on how we determine if a column is NULL or not. Let me know your thoughts. David [1] https://www.postgresql.org/message-id/CAKU4AWpQjAqJwQ2X-aR9g3+ZHRzU1k8hNP7A+_mLuOv-n5aVKA@mail.gmail.com
On Tue, Jul 06, 2021 at 01:09:56PM +1200, David Rowley wrote: > On Tue, 9 Mar 2021 at 05:13, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote: > > It was a minor change therefore I rebased the patch, please take a look. > [...] > > This is not really an acceptable way to determine the notnull > attribute value. Andy Fan proposes a much better way in [1]. > RelOptInfo is meant to cache the required Relation data that we need > during query planning. IIRC, Andy's patch correctly uses this and does > so in an efficient way. > > In any case, as you can see there's a bit of other work going on to > smarten up the planner around NULL value detection. The UniqueKeys > patch requires this and various other things have come up that really > should be solved. > > Surafel, I'd suggest we return this patch with feedback and maybe you > could instead help reviewing the other patches in regards to the NOT > NULL tracking and maybe come back to this once the dust has settled > and everyone is clear on how we determine if a column is NULL or not. > > Let me know your thoughts. > Hi Surafel, We haven't seen an answer from you on this. I'm marking the patch as "Returned with feedback" as was suggested. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL