Re: Reduce "Var IS [NOT] NULL" quals during constant folding - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Date
Msg-id CAMbWs4-DryEm_U-juPn3HwUiwZRXW3jhfX18b_AFgrgihgq4kg@mail.gmail.com
Whole thread Raw
In response to Re: Reduce "Var IS [NOT] NULL" quals during constant folding  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Apr 2, 2025 at 4:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 1, 2025 at 2:34 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > However, I gave up this idea because I realized it would require
> > retrieving a whole bundle of catalog information that isn't needed
> > until after the RelOptInfos are built, such as max_attr, pages,
> > tuples, reltablespace, parallel_workers, extended statistics, etc.

> Why is that bad? I mean, if we're going to need that information
> anyway, then gathering it at earlier stage doesn't hurt. Of course, if
> we move it too early, say before partition pruning, then we might
> gather information we don't really need and hurt performance. But
> otherwise it doesn't seem to hurt anything.

The attnotnull catalog information being discussed here is intended
for use during constant folding (and possibly sublink pull-up), which
occurs long before partition pruning.  Am I missing something?

Additionally, I'm doubtful that the collection of relhassubclass can
be moved after partition pruning.  How can we determine whether a
relation is inheritable without retrieving its relhassubclass
information?

As for attgenerated, I also doubt that it can be retrieved after
partition pruning.  It is used to expand virtual generated columns,
which can result in new PlaceHolderVars.  This means it must be done
before deconstruct_jointree, as make_outerjoininfo requires all active
PlaceHolderVars to be present in root->placeholder_list.

If these pieces of information cannot be retrieved after partition
pruning, and for performance reasons we don't want to move the
gathering of the majority of the catalog information before partition
pruning, then it seems to me that moving get_relation_info() to an
earlier stage might not be very meaningful.  What do you think?

> > It could be argued that the separation of catalog information
> > collection isn't very great, but it seems this isn't something new in
> > this patch.  So I respectfully disagree with your statement that "all
> > the information that the planner needs about a particular relation is
> > retrieved by get_relation_info()", and that "there's now just exactly
> > this 1 thing that is done in a different place".  For instance,
> > relhassubclass and attgenerated are retrieved before expression
> > preprocessing, a relation's constraint expressions are retrieved when
> > setting the relation's size estimates, and more.

> Nonetheless I think we ought to be trying to consolidate more things
> into get_relation_info(), not disperse some of the things that are
> there to other places.

I agree with the general idea of more consolidation and less
dispersion, but squashing all information collection into
get_relation_info() seems quite challenging.  However, I think we can
make at least one optimization, as I mentioned upthread — retrieving
the small portion of catalog information needed at an early stage in
one place instead of three.  Perhaps we could start with moving the
retrieval of relhassubclass into collect_relation_attrs() and rename
this function to better reflect this change.

Thanks
Richard



pgsql-hackers by date:

Previous
From: Tender Wang
Date:
Subject: Re: bug when apply fast default mechanism for adding new column over domain with default value
Next
From: jian he
Date:
Subject: Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints