On Fri, Apr 11, 2025 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > OK. Maybe I shouldn't be worrying about the table_open() /
> > table_close() here, because I see that you are right that
> > has_subclass() is nearby, which admittedly does not involve opening
> > the relation, but it does involve fetching from the syscache a tuple
> > that we wouldn't need to fetch if we had a Relation available at that
> > point. And I see now that expand_virtual_generated_columns() is also
> > in that area and works similar to your proposed function in that it
> > just opens and closes all the relations. Perhaps that's just the way
> > we do things at this very early stage of the planner? But I feel like
> > it might make sense to do some reorganization of this code, though, so
> > that it more resembles the phase 1 and phase 2 as you describe them.
> > Both expand_virtual_generated_columns() and collect_relation_attrs()
> > care about exactly those relations with rte->rtekind == RTE_RELATION,
> > and even if we have to open and close all of those relations once to
> > do this processing, perhaps we can avoid doing it twice, and maybe
> > avoid the has_subclass() call along the way?
>
> Yeah, I agree on this. This is the optimization I mentioned upthread
> in the last paragraph of [1] - retrieving the small portion of catalog
> information needed at an early stage in one place instead of three.
> Hopefully, this way we only need one table_open/table_close at the
> early stage of the planner.
Here is the patchset that implements this optimization. 0001 moves
the expansion of virtual generated columns to occur before sublink
pull-up. 0002 introduces a new function, preprocess_relation_rtes,
which scans the rangetable for relation RTEs and performs inh flag
updates and virtual generated column expansion in a single loop, so
that only one table_open/table_close call is required for each
relation. 0003 collects NOT NULL attribute information for each
relation within the same loop, stores it in a relation OID based hash
table, and uses this information to reduce NullTest quals during
constant folding.
I think the code now more closely resembles the phase 1 and phase 2
described earlier: it collects all required early-stage catalog
information within a single loop over the rangetable, allowing each
relation to be opened and closed only once. It also avoids the
has_subclass() call along the way.
Thanks
Richard