Re: Reduce "Var IS [NOT] NULL" quals during constant folding - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Reduce "Var IS [NOT] NULL" quals during constant folding |
Date | |
Msg-id | CAMbWs4_0B0MjZuQY-az6r=K03ONfdBCKbtoQCHj0p4UwGdOb4A@mail.gmail.com Whole thread Raw |
In response to | Re: Reduce "Var IS [NOT] NULL" quals during constant folding (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Reduce "Var IS [NOT] NULL" quals during constant folding
|
List | pgsql-hackers |
On Tue, Apr 1, 2025 at 1:55 AM Robert Haas <robertmhaas@gmail.com> wrote: > As a general principle, I have found that it's usually a sign that > something has been designed poorly when you find yourself wanting to > open a relation, get exactly one piece of information, and close the > relation again. That is why, today, all the information that the > planner needs about a particular relation is retrieved by > get_relation_info(). We do not just wander around doing random catalog > lookups wherever we need some critical detail. This patch increases > the number of places where we fetch relation data from 1 to 2, but > it's still the case that almost everything happens in > get_relation_info(), and there's now just exactly this 1 thing that is > done in a different place. That doesn't seem especially nice. I > thought the idea was going to be to move get_relation_info() to an > earlier stage, not split one thing out of it while leaving everything > else the same. I initially considered moving get_relation_info() to an earlier stage, where we would collect all the per-relation data by relation OID. Later, when building the RelOptInfos, we could link them to the per-relation-OID data. However, I gave up this idea because I realized it would require retrieving a whole bundle of catalog information that isn't needed until after the RelOptInfos are built, such as max_attr, pages, tuples, reltablespace, parallel_workers, extended statistics, etc. And we may also need to create the IndexOptInfos for the relation's indexes. It seems to me that it's not a trivial task to move get_relation_info() before building the RelOptInfos, and more importantly, it's unnecessary most of the time. In other words, of the many pieces of catalog information we need to retrieve for a relation, only a small portion is needed at an early stage. As far as I can see, this small portion includes: * relhassubclass, which we retrieve immediately after we have finished adding rangetable entries. * attgenerated, which we retrieve in expand_virtual_generated_columns. * attnotnull, as discussed here, which should ideally be retrieved before pull_up_sublinks. My idea is to retrieve only this small portion at an early stage, and still defer collecting the majority of the catalog information until building the RelOptInfos. It might be possible to optimize by retrieving this small portion in one place instead of three, but moving the entire get_relation_info() to an earlier stage doesn't seem like a good idea to me. It could be argued that the separation of catalog information collection isn't very great, but it seems this isn't something new in this patch. So I respectfully disagree with your statement that "all the information that the planner needs about a particular relation is retrieved by get_relation_info()", and that "there's now just exactly this 1 thing that is done in a different place". For instance, relhassubclass and attgenerated are retrieved before expression preprocessing, a relation's constraint expressions are retrieved when setting the relation's size estimates, and more. Thanks Richard
pgsql-hackers by date: