Re: How to know referenced sub-fields of a composite type? - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: How to know referenced sub-fields of a composite type? |
Date | |
Msg-id | 07d488e8-9ef1-fe2b-d6d5-5236ec0cc3f0@lab.ntt.co.jp Whole thread Raw |
In response to | Re: How to know referenced sub-fields of a composite type? (Kohei KaiGai <kaigai@heterodb.com>) |
Responses |
Re: How to know referenced sub-fields of a composite type?
|
List | pgsql-hackers |
On 2019/05/29 15:50, Kohei KaiGai wrote: > 2019年5月29日(水) 13:26 Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>: >>> It means we can skip to load the sub-fields unreferenced, if >>> query-planner can handle >>> referenced and unreferenced sub-fields correctly. >>> On the other hands, it looks to me RelOptInfo or other optimizer >>> related structure don't have >>> this kind of information. RelOptInfo->attr_needed tells extension >>> which attributes are referenced >>> by other relation, however, its granularity is not sufficient for sub-fields. >> >> Isn't that true for some other cases as well, like when a query accesses >> only some sub-fields of a json(b) column? In that case too, planner >> itself can't optimize away access to other sub-fields. What it can do >> though is match a suitable index to the operator used to access the >> individual sub-fields, so that the index (if one is matched and chosen) >> can optimize away accessing unnecessary sub-fields. IOW, it seems to me >> that the optimizer leaves it up to the indexes (and plan nodes) to further >> optimize access to within a field. How is this case any different? > > I think it is a little bit different scenario. > Even if an index on sub-fields can indicate the tuples to be fetched, > the fetched tuple contains all the sub-fields because heaptuple is > row-oriented data. > > For example, if WHERE-clause checks a sub-field: "x" then aggregate > function references other sub-field "y", Scan/Join node has to return > a tuple that contains both "x" and "y". IndexScan also pops up a tuple > with a full composite type, so here is no problem if we cannot know > which sub-fields are referenced in the later stage. > Maybe, if IndexOnlyScan supports to return a partial composite type, > it needs similar infrastructure that can be used for a better composite > type support on columnar storage. Ah indeed. I think I had misunderstood your intent. Indexes have to do with optimizing the "filtering" of complex/nested type (json, Arrow Struct, etc.) values, where unnecessary sub-fields need not be read before filtering, whereas you're interested in optimizing "projections" of complex types, where sub-fields that are not used anywhere in the query need not be read from the stored values. >>> Probably, all we can do right now is walk-on the RelOptInfo list to >>> lookup FieldSelect node >>> to see the referenced sub-fields. Do we have a good idea instead of >>> this expensive way? >>> # Right now, PG-Strom loads all the sub-fields of Struct column from >>> arrow_fdw foreign-table >>> # regardless of referenced / unreferenced sub-fields. Just a second best. >> >> I'm missing something, but if PG-Strom/arrow_fdw does look at the >> FieldSelect nodes to see which sub-fields are referenced, why doesn't it >> generate a plan that will only access those sub-fields or why can't it? >> > Likely, it is not a technical problem but not a smart implementation. > If I missed some existing infrastructure we can apply, it may be more suitable > than query/expression tree walking. There is no infrastructure for this as far as I know. Maybe, some will be built in the future now that storage format is pluggable. Thanks, Amit
pgsql-hackers by date: