On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Approach B: after parsing and/or after planning
If we wanted to do something about this, making the planner record the set of used columns seems like the thing to do. We could avoid the expense of doing it when it's not needed by setting up an AM/FDW/ etc property or callback to request it.
Sounds good. In the Zedstore patch, we have added an AM property to convey that
the AM leverages column projection, and we currently skip the physical tlist
optimization based on it. So yes, it can similarly be leveraged for other
planning needs.
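To make the idea concrete, here is a minimal standalone sketch of gating the
physical tlist on such a property. The struct, field, and function names are
hypothetical and are not taken from the Zedstore patch or PostgreSQL; the real
planner also checks other conditions before choosing a physical tlist.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical AM property block; not an actual PostgreSQL struct. */
typedef struct TableAmProps
{
    bool projects_columns;  /* AM can fetch a subset of columns cheaply */
} TableAmProps;

/*
 * Number of columns the scan node would emit.  With a physical tlist the
 * scan returns every attribute of the relation; an AM that projects columns
 * skips that optimization and returns only what the query needs.
 */
size_t
scan_output_columns(const TableAmProps *am, size_t natts, size_t nneeded)
{
    if (am->projects_columns)
        return nneeded;     /* skip physical tlist: only needed columns */
    return natts;           /* physical tlist: all columns of the relation */
}
```

For a 50-column table where the query touches 3 columns, the sketch returns 3
for a projecting AM and 50 otherwise.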
Another reason for having the planner do this is that presumably, in an AM that's excited about this, the set of fetched columns should play into the cost estimates for the scan. I've not been paying enough attention to the tableam work to know if we've got hooks for the AM to affect scan costing ... but if we don't, that seems like a hole that needs plugged.
The AM callback relation_estimate_size already exists, and the planner uses it
to fetch tuples, pages, etc. Our thought is to extend this API, if possible, to
pass down the needed columns and so help perform better costing for the query.
However, if we wish to leverage this function, we need to know the list of
columns before planning, and hence might need to use the query tree.
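As a rough illustration of why the column list matters for costing, here is a
standalone sketch of a column-aware page estimate. Everything in it is an
assumption made for illustration (ColumnStat, estimate_scan_pages, the 8192-byte
page size, and the simple width-times-rows model); it is not the signature of
relation_estimate_size nor what the Zedstore patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192     /* assumed page size in bytes */

/* Hypothetical per-column statistics; not a PostgreSQL struct. */
typedef struct ColumnStat
{
    uint32_t avg_width;     /* average stored width of one value, in bytes */
} ColumnStat;

/*
 * Estimate how many pages a scan must read when only the columns flagged
 * in needed[] are fetched.  Assumes each column is stored separately and
 * ignores compression, page headers, and per-tuple overhead.
 */
uint64_t
estimate_scan_pages(const ColumnStat *cols, const bool *needed,
                    size_t ncols, uint64_t ntuples)
{
    uint64_t bytes = 0;

    assert(cols != NULL && needed != NULL);

    for (size_t i = 0; i < ncols; i++)
    {
        if (needed[i])
            bytes += (uint64_t) cols[i].avg_width * ntuples;
    }

    /* Round up to whole pages. */
    return (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```

With three columns of widths 4, 100, and 8 bytes and 8192 rows, this model
gives 112 pages when all columns are needed and 4 pages when only the first is,
which is the kind of difference the planner cannot see without the column list.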
> Approach B, however, does not work for utility statements which do not go through planning.
I'm not sure why you're excited about that case? Utility statements tend to be pretty much all-or-nothing as far as data access goes.
Statements like COPY, CREATE INDEX, CREATE CONSTRAINTS, etc. can benefit from
scanning only a subset of columns. For example, in Zedstore we currently extract
the needed columns for CREATE INDEX by walking indexInfo->ii_Predicate and
indexInfo->ii_Expressions. For COPY, we currently use cstate->attnumlist to know