Re: Extracting only the columns needed for a query - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Extracting only the columns needed for a query |
Date | |
Msg-id | 23194.1560618101@sss.pgh.pa.us |
In response to | Extracting only the columns needed for a query (Melanie Plageman <melanieplageman@gmail.com>) |
Responses | Re: Extracting only the columns needed for a query |
List | pgsql-hackers |
Melanie Plageman <melanieplageman@gmail.com> writes:
> While hacking on zedstore, we needed to get a list of the columns to
> be projected--basically all of the columns needed to satisfy the
> query. The two use cases we have for this is
> 1) to pass this column list down to the AM layer for the AM to leverage it
> 2) for use during planning to improving costing
> In other threads, such as [1], there has been discussion about the
> possible benefits for all table types of having access to this set of
> columns.

The thing that most approaches to this have fallen down on is
triggers --- that is, a trigger function might access columns mentioned
nowhere in the SQL text.  (See 8b6da83d1 for a recent example :-()
If you have a plan for dealing with that, then ...

> Approach B: after parsing and/or after planning

If we wanted to do something about this, making the planner record
the set of used columns seems like the thing to do.  We could avoid
the expense of doing it when it's not needed by setting up an
AM/FDW/etc property or callback to request it.

Another reason for having the planner do this is that presumably,
in an AM that's excited about this, the set of fetched columns should
play into the cost estimates for the scan.  I've not been paying enough
attention to the tableam work to know if we've got hooks for the AM to
affect scan costing ... but if we don't, that seems like a hole that
needs plugged.

> I'm not sure, however, if just getting the
> PathTargets from the RelOptInfos is sufficient for obtaining the
> whole set of columns used in the query.

The PathTarget only records the columns that need to be *emitted*
by the relation scan plan node.  You'd also have to look at the
baserestrictinfo conditions to see what columns are inspected by
filter conditions.  But that seems fine to me anyway since the AM
might conceivably have different requirements for filter variables
than emitted variables.  (As a concrete example, filter conditions
that are handled by non-lossy index checks might not give rise to
any requirement for the table AM to do anything.)

So that line of thought confirms that we want to do this at the end
of planning when we know the shape of the plan tree; the parser
can't do it usefully.

> Approach B, however, does not work for utility statements which do
> not go through planning.

I'm not sure why you're excited about that case?  Utility statements
tend to be pretty much all-or-nothing as far as data access goes.

			regards, tom lane
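For concreteness, the PathTarget-plus-baserestrictinfo collection discussed above is essentially the same thing postgres_fdw already does when it builds its attrs_used set. A minimal, untested sketch follows; the function name is made up, the headers assume roughly a v12 tree, and the only calls used are the existing pull_varattnos() machinery:

```c
/*
 * Untested sketch: collect the attnos a base relation's scan has to be able
 * to supply, considering both the columns the scan must emit (reltarget)
 * and the columns inspected by filter quals (baserestrictinfo).
 */
#include "postgres.h"

#include "access/sysattr.h"
#include "nodes/pathnodes.h"		/* nodes/relation.h before v12 */
#include "optimizer/optimizer.h"	/* pull_varattnos(); optimizer/var.h before v12 */

static Bitmapset *
scan_columns_needed(RelOptInfo *baserel)
{
	Bitmapset  *attrs_used = NULL;
	ListCell   *lc;

	/* Columns that the scan node must emit */
	pull_varattnos((Node *) baserel->reltarget->exprs,
				   baserel->relid, &attrs_used);

	/* Columns referenced by filter conditions applied at the scan */
	foreach(lc, baserel->baserestrictinfo)
	{
		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);

		pull_varattnos((Node *) rinfo->clause,
					   baserel->relid, &attrs_used);
	}

	/*
	 * pull_varattnos offsets attnos by FirstLowInvalidHeapAttributeNumber,
	 * so system columns and a whole-row Var (attno 0, i.e. "need all
	 * columns") are representable in the bitmapset.
	 */
	return attrs_used;
}
```

If one wanted to keep filter columns and emitted columns separate, as the message above suggests an AM might want, the two pull_varattnos() passes could simply populate two bitmapsets instead of one. How the result would reach the AM (for instance via the property or callback mechanism suggested above) remains the open design question.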