Re: How to know referenced sub-fields of a composite type? - Mailing list pgsql-hackers

From Haribabu Kommi
Subject Re: How to know referenced sub-fields of a composite type?
Date
Msg-id CAJrrPGcEnvK86cLsPmE5Eqw-2t9HG-azBUqCqM-f=E+qn2UcxQ@mail.gmail.com
Whole thread Raw
In response to Re: How to know referenced sub-fields of a composite type?  (Kohei KaiGai <kaigai@heterodb.com>)
List pgsql-hackers

On Wed, May 29, 2019 at 4:51 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
Hi Amit,

2019年5月29日(水) 13:26 Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>:
>
> Kaigai-san,
>
> On 2019/05/29 12:13, Kohei KaiGai wrote:
> > One interesting data type in Apache Arrow is "Struct" data type. It is
> > equivalent to composite
> > type in PostgreSQL. The "Struct" type has sub-fields, and individual
> > sub-fields have its own
> > values array for each.
> >
> > It means we can skip to load the sub-fields unreferenced, if
> > query-planner can handle
> > referenced and unreferenced sub-fields correctly.
> > On the other hands, it looks to me RelOptInfo or other optimizer
> > related structure don't have
> > this kind of information. RelOptInfo->attr_needed tells extension
> > which attributes are referenced
> > by other relation, however, its granularity is not sufficient for sub-fields.
>
> Isn't that true for some other cases as well, like when a query accesses
> only some sub-fields of a json(b) column?  In that case too, planner
> itself can't optimize away access to other sub-fields.  What it can do
> though is match a suitable index to the operator used to access the
> individual sub-fields, so that the index (if one is matched and chosen)
> can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
> that the optimizer leaves it up to the indexes (and plan nodes) to further
> optimize access to within a field.  How is this case any different?
>
I think it is a little bit different scenario.
Even if an index on sub-fields can indicate the tuples to be fetched,
the fetched tuple contains all the sub-fields because heaptuple is
row-oriented data.
For example, if WHERE-clause checks a sub-field: "x" then aggregate
function references other sub-field "y", Scan/Join node has to return
a tuple that contains both "x" and "y". IndexScan also pops up a tuple
with a full composite type, so here is no problem if we cannot know
which sub-fields are referenced in the later stage.
Maybe, if IndexOnlyScan supports to return a partial composite type,
it needs similar infrastructure that can be used for a better composite
type support on columnar storage.

There is another issue related to the columnar store that needs targeted
columns for projection from the scan is discussed in zedstore [1]. 
Projecting all columns from a columnar store is quite expensive than
the row store. 

 
Regards,
Haribabu Kommi
Fujitsu Australia

pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: MSVC Build support with visual studio 2019
Next
From: Ashutosh Sharma
Date:
Subject: Server crash due to assertion failure in CheckOpSlotCompatibility()