Re: Tracking notnull attributes inside Var - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Tracking notnull attributes inside Var
Date
Msg-id CAKU4AWr=0rm=GoALn3dSwYT7zKFE0VHSt-sAnL1ECCPBV16MQQ@mail.gmail.com
Whole thread Raw
In response to Re: Tracking notnull attributes inside Var  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
Hi Ashutosh:

   Nice to see you again!

On Tue, May 17, 2022 at 8:50 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Sun, May 15, 2022 at 8:41 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

>
> The var in RelOptInfo->reltarget should have nullable = 0 but the var in
> RelOptInfo->baserestrictinfo should have nullable = 1;  The beauty of this
> are: a). It can distinguish the two situations perfectly b). Whenever we want
> to know the nullable attribute of a Var for an expression, it is super easy to
> know. In summary, we need to maintain the nullable attribute at 2 different
> places. one is the before the filters are executed(baserestrictinfo, joininfo,
> ec_list at least).  one is after the filters are executed (RelOptInfo.reltarget
> only?)

Thanks for identifying this. What you have written makes sense and it
might open a few optimization opportunities. But let me put down some
other thoughts here. You might want to take those into consideration
when designing your solution.

Thanks. 
 

Do we want to just track nullable and non-nullable. May be we want
expand this class to nullable (var may be null), non-nullable (Var is
definitely non-NULL), null (Var will be always NULL).


Currently it doesn't support "Var will be always NULL" .  Do you have any
use cases for this? and I can't think of too many cases where we can get
such information except something like "SELECT a FROM t WHERE a 
IS NULL". 

But the other way to look at this is along the lines of equivalence
classes. Equivalence classes record the expressions which are equal in
the final result of the query. The equivalence class members are not
equal at all the stages of query execution.  But because they are
equal in the final result, we can impose that restriction on the lower
levels as well. Can we think of nullable in that fashion? If a Var is
non-nullable in the final result, we can impose that restriction on
the intermediate stages since rows with NULL values for that Var will
be filtered out somewhere. Similarly we could argue for null Var. But
knowledge that a Var is nullable in the final result does not impose a
NULL, non-NULL restriction on the intermediate stages. If we follow
this thought process, we don't need to differentiate Var at different
stages in query.

I agree this is an option.  If so we need to track it under the PlannerInfo 
struct but it would not be as fine-grained as my previous. Without
intermediate information,  We can't know if a UnqiueKey contains multiple
NULLs, this would not be an issue for the "MARK Distinct as no-op" case,
but I'm not sure it is OK for other UniqueKey user cases.  So my current idea
is I still prefer to maintain the intermediate information, unless we are sure it 
costs too much or it is too complex to implement which I don't think so for now
at least.  So if you have time to look at the attached patch, that would be super
great as well.

--
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: Tracking notnull attributes inside Var
Next
From: Tom Lane
Date:
Subject: Re: 15beta1 test failure on mips in isolation/expected/stats