Re: Reduce "Var IS [NOT] NULL" quals during constant folding - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Date
Msg-id CAMbWs4_soFRubFd64SX-GC+MA6RV=pZrwfn3oq89LNKoDuLjAg@mail.gmail.com
Whole thread Raw
In response to Re: Reduce "Var IS [NOT] NULL" quals during constant folding  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> OK. Maybe I shouldn't be worrying about the table_open() /
> table_close() here, because I see that you are right that
> has_subclass() is nearby, which admittedly does not involve opening
> the relation, but it does involve fetching from the syscache a tuple
> that we wouldn't need to fetch if we had a Relation available at that
> point. And I see now that expand_virtual_generated_columns() is also
> in that area and works similar to your proposed function in that it
> just opens and closes all the relations. Perhaps that's just the way
> we do things at this very early stage of the planner? But I feel like
> it might make sense to do some reorganization of this code, though, so
> that it more resembles the phase 1 and phase 2 as you describe them.
> Both expand_virtual_generated_columns() and collect_relation_attrs()
> care about exactly those relations with rte->rtekind == RTE_RELATION,
> and even if we have to open and close all of those relations once to
> do this processing, perhaps we can avoid doing it twice, and maybe
> avoid the has_subclass() call along the way?

Yeah, I agree on this.  This is the optimization I mentioned upthread
in the last paragraph of [1] - retrieving the small portion of catalog
information needed at an early stage in one place instead of three.
Hopefully, this way we only need one table_open/table_close at the
early stage of the planner.

> Maybe we can hoist a loop
> over parse->rtable up into subquery_planner and then have it call a
> virtual-column expansion function and a non-null collection function
> once per RTE_RELATION entry?

Hmm, I'm afraid there might be some difficulty with this approach.
The virtual-column expansion needs to be done after sublink pull-up to
ensure that the virtual-column references within the SubLinks that
should be transformed into joins can get expanded, while non-null
collection needs to be done before sublink pull-up since we might want
to leverage the non-null information to convert NOT IN sublinks to
anti joins.

What I had in mind is that we hoist a loop over parse->rtable before
pull_up_sublinks to gather information about which columns of each
relation are defined as NOT NULL and which are virtually generated.
These information will be used in sublink pull-up and virtual-column
expansion.  We also call has_subclass for each relation that is maked
'inh' within that loop and clear the inh flag if needed.

This seems to require a fair amount of code changes, so I'd like to
get some feedback on this approach before proceeding with the
implementation.

> A related point that I'm noticing is that you record the not-NULL
> information in the RangeTblEntry. I wonder whether that's going to be
> a problem, because I think of the RangeTblEntry as a parse-time
> structure and the RelOptInfo as a plan-time structure, meaning that we
> shouldn't scribble on the former and that we should record any
> plan-time details we need in the latter. I understand that the problem
> is precisely that the RelOptInfo isn't yet available, but I'm not sure
> that makes it OK to use the RangeTblEntry instead.

Fair point.  We should do our best to refrain from scribbling on the
parsetree in the planner.  I initially went with the hashtable
approach as Tom suggested, but later found it quite handy to store the
not-null information in the RangeTblEntry, especially since we do
something similar with rte->inh.  However, I've come to realize that
inh may not be a good example to follow after all, so I'll go back to
the hashtable method.

Thank you for pointing that out before I went too far down the wrong
path.

[1] https://postgr.es/m/CAMbWs4-DryEm_U-juPn3HwUiwZRXW3jhfX18b_AFgrgihgq4kg@mail.gmail.com

Thanks
Richard



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Next
From: Peter Eisentraut
Date:
Subject: Re: Improve a few appendStringInfo calls new to v18