Tracking notnull attributes inside Var - Mailing list pgsql-hackers

From Andy Fan
Subject Tracking notnull attributes inside Var
Date
Msg-id CAKU4AWoOfVA3ZS8tnrX1dME5vgfj+Jtn_JVMRvwyC6CakX0CbA@mail.gmail.com
Whole thread Raw
Responses Re: Tracking notnull attributes inside Var
Re: Tracking notnull attributes inside Var
List pgsql-hackers
notnulls discussion is forked from UniqueKey stuff, you can see the attachment
for the UnqiueKey introduction. Tom raised his opinion to track the nullability
inside Var[1][2][3], this thread would start from there based on my understanding.

Generally tracking the null attributes inside Var would have something like:

struct Var
{
...;
      int nullable;  // -1 unknown,  0 - not nullable.  1 - nullable
};

and then semantics of Var->nullable must be attached to a RelOptInfo. For
example:

CREATE TABLE t1(a int, b int);

SELECT abs(a) FROM t1 WHERE a > -100;

The var in RelOptInfo->reltarget should have nullable = 0 but the var in
RelOptInfo->baserestrictinfo should have nullable = 1;  The beauty of this
are: a). It can distinguish the two situations perfectly b). Whenever we want
to know the nullable attribute of a Var for an expression, it is super easy to
know. In summary, we need to maintain the nullable attribute at 2 different
places. one is the before the filters are executed(baserestrictinfo, joininfo,
ec_list at least).  one is after the filters are executed (RelOptInfo.reltarget
only?)

Come to JoinRel, we still need to maintain the 2 different cases as well.

As for the joinrel.reltarget, currently it looks up the inputrel's reltarget to
get the Var, so it is easy to inherit from Var->nullable from inputrel, but
we need to consider the new  changes introduced by current join,
Like new NOT nullable attributes because of join clauses OR new nullable
attributes because of outer join.  Everything looks good for now.

The hard part is RelOptInfo.joininfo & root->eq_classes. All of them uses
the shared RestrictInfo, and it is unclear which Var->nullable should be used in
them. To not provide a wrong answer, I think we can assume nullable=-1 (unknown)
and let the upper layer decides what to do (do we have known use cases to use
the nullable attribute here?).

More considerations about this strategy:
1. We might use more memory for different var copies, the only known cases
   RelOptInfo->reltarget for now.
2. _equalVar() has more complex semantics: shall we consider nulls or not.

My recent experience reminds me of another interesting use case of UniqueKey
which may reduce the planning time a lot IIUC (Value 3 in then attachment). Since
PG15 has just been released, I wonder if more people have time to discuss this topic
again. Do I think the way in the right direction?

[1] https://www.postgresql.org/message-id/1551312.1613142245%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/CAApHDvrRwhWCPKUD5H-EQoezHf%3DfnUUsPgTAnXsEOV8f8SF7XQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/1664320.1625577290%40sss.pgh.pa.us

--
Best Regards
Andy Fan
Attachment

pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
Next
From: Nikolay Shaplov
Date:
Subject: Re: [PATCH] New [relation] option engine