Re: How to know referenced sub-fields of a composite type? - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: How to know referenced sub-fields of a composite type? |
Date | |
Msg-id | 29E07F2C-6257-4537-8DC1-B952BF08448B@heterodb.com Whole thread Raw |
In response to | Re: How to know referenced sub-fields of a composite type? (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
List | pgsql-hackers |
2019/05/30 16:33、Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>のメール: >> On 2019/05/29 15:50, Kohei KaiGai wrote: >> 2019年5月29日(水) 13:26 Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>: >>>> It means we can skip to load the sub-fields unreferenced, if >>>> query-planner can handle >>>> referenced and unreferenced sub-fields correctly. >>>> On the other hands, it looks to me RelOptInfo or other optimizer >>>> related structure don't have >>>> this kind of information. RelOptInfo->attr_needed tells extension >>>> which attributes are referenced >>>> by other relation, however, its granularity is not sufficient for sub-fields. >>> >>> Isn't that true for some other cases as well, like when a query accesses >>> only some sub-fields of a json(b) column? In that case too, planner >>> itself can't optimize away access to other sub-fields. What it can do >>> though is match a suitable index to the operator used to access the >>> individual sub-fields, so that the index (if one is matched and chosen) >>> can optimize away accessing unnecessary sub-fields. IOW, it seems to me >>> that the optimizer leaves it up to the indexes (and plan nodes) to further >>> optimize access to within a field. How is this case any different? >> >> I think it is a little bit different scenario. >> Even if an index on sub-fields can indicate the tuples to be fetched, >> the fetched tuple contains all the sub-fields because heaptuple is >> row-oriented data. >> >> For example, if WHERE-clause checks a sub-field: "x" then aggregate >> function references other sub-field "y", Scan/Join node has to return >> a tuple that contains both "x" and "y". IndexScan also pops up a tuple >> with a full composite type, so here is no problem if we cannot know >> which sub-fields are referenced in the later stage. >> Maybe, if IndexOnlyScan supports to return a partial composite type, >> it needs similar infrastructure that can be used for a better composite >> type support on columnar storage. > > Ah indeed. I think I had misunderstood your intent. Indexes have to do > with optimizing the "filtering" of complex/nested type (json, Arrow > Struct, etc.) values, where unnecessary sub-fields need not be read before > filtering, whereas you're interested in optimizing "projections" of > complex types, where sub-fields that are not used anywhere in the query > need not be read from the stored values. > >>>> Probably, all we can do right now is walk-on the RelOptInfo list to >>>> lookup FieldSelect node >>>> to see the referenced sub-fields. Do we have a good idea instead of >>>> this expensive way? >>>> # Right now, PG-Strom loads all the sub-fields of Struct column from >>>> arrow_fdw foreign-table >>>> # regardless of referenced / unreferenced sub-fields. Just a second best. >>> >>> I'm missing something, but if PG-Strom/arrow_fdw does look at the >>> FieldSelect nodes to see which sub-fields are referenced, why doesn't it >>> generate a plan that will only access those sub-fields or why can't it? >>> >> Likely, it is not a technical problem but not a smart implementation. >> If I missed some existing infrastructure we can apply, it may be more suitable >> than query/expression tree walking. > > There is no infrastructure for this as far as I know. Maybe, some will be > built in the future now that storage format is pluggable. If we design a common infrastructure for both of built-in and extension features, it makes sense for the kinds of storagesystem. IndexOnlyScan is one of the built-in feature that is beneficial by the information of projection. Currently, we always don’tchoose IndexOnlyScan if index is on sub-field of composite. Best regards,
pgsql-hackers by date: