Re: How to know referenced sub-fields of a composite type? - Mailing list pgsql-hackers

From Amit Langote
Subject Re: How to know referenced sub-fields of a composite type?
Date
Msg-id 07d488e8-9ef1-fe2b-d6d5-5236ec0cc3f0@lab.ntt.co.jp
Whole thread Raw
In response to Re: How to know referenced sub-fields of a composite type?  (Kohei KaiGai <kaigai@heterodb.com>)
Responses Re: How to know referenced sub-fields of a composite type?
List pgsql-hackers
On 2019/05/29 15:50, Kohei KaiGai wrote:
> 2019年5月29日(水) 13:26 Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>:
>>> It means we can skip to load the sub-fields unreferenced, if
>>> query-planner can handle
>>> referenced and unreferenced sub-fields correctly.
>>> On the other hands, it looks to me RelOptInfo or other optimizer
>>> related structure don't have
>>> this kind of information. RelOptInfo->attr_needed tells extension
>>> which attributes are referenced
>>> by other relation, however, its granularity is not sufficient for sub-fields.
>>
>> Isn't that true for some other cases as well, like when a query accesses
>> only some sub-fields of a json(b) column?  In that case too, planner
>> itself can't optimize away access to other sub-fields.  What it can do
>> though is match a suitable index to the operator used to access the
>> individual sub-fields, so that the index (if one is matched and chosen)
>> can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
>> that the optimizer leaves it up to the indexes (and plan nodes) to further
>> optimize access to within a field.  How is this case any different?
>
> I think it is a little bit different scenario.
> Even if an index on sub-fields can indicate the tuples to be fetched,
> the fetched tuple contains all the sub-fields because heaptuple is
> row-oriented data.
>
> For example, if WHERE-clause checks a sub-field: "x" then aggregate
> function references other sub-field "y", Scan/Join node has to return
> a tuple that contains both "x" and "y". IndexScan also pops up a tuple
> with a full composite type, so here is no problem if we cannot know
> which sub-fields are referenced in the later stage.
> Maybe, if IndexOnlyScan supports to return a partial composite type,
> it needs similar infrastructure that can be used for a better composite
> type support on columnar storage.

Ah indeed.  I think I had misunderstood your intent.  Indexes have to do
with optimizing the "filtering" of complex/nested type (json, Arrow
Struct, etc.) values, where unnecessary sub-fields need not be read before
filtering, whereas you're interested in optimizing "projections" of
complex types, where sub-fields that are not used anywhere in the query
need not be read from the stored values.

>>> Probably, all we can do right now is walk-on the RelOptInfo list to
>>> lookup FieldSelect node
>>> to see the referenced sub-fields. Do we have a good idea instead of
>>> this expensive way?
>>> # Right now, PG-Strom loads all the sub-fields of Struct column from
>>> arrow_fdw foreign-table
>>> # regardless of referenced / unreferenced sub-fields. Just a second best.
>>
>> I'm missing something, but if PG-Strom/arrow_fdw does look at the
>> FieldSelect nodes to see which sub-fields are referenced, why doesn't it
>> generate a plan that will only access those sub-fields or why can't it?
>>
> Likely, it is not a technical problem but not a smart implementation.
> If I missed some existing infrastructure we can apply, it may be more suitable
> than query/expression tree walking.

There is no infrastructure for this as far as I know.  Maybe, some will be
built in the future now that storage format is pluggable.

Thanks,
Amit




pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Pinned files at Windows
Next
From: Amit Kapila
Date:
Subject: Re: Fix inconsistencies for v12