Re: Extracting only the columns needed for a query - Mailing list pgsql-hackers

From Pengzhou Tang
Subject Re: Extracting only the columns needed for a query
Date
Msg-id CAG4reAQc9vYdmQXh=1D789x8XJ=gEkV+E+fT9+s9tOWDXX3L9Q@mail.gmail.com
In response to Re: Extracting only the columns needed for a query  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
> > On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Another reason for having the planner do this is that presumably, in
> > an AM that's excited about this, the set of fetched columns should
> > play into the cost estimates for the scan.  I've not been paying
> > enough attention to the tableam work to know if we've got hooks for
> > the AM to affect scan costing ... but if we don't, that seems like
> > a hole that needs plugged.
>
> AM callback relation_estimate_size exists currently which planner leverages.
> Via this callback it fetches tuples, pages, etc.. So, our thought is to extend
> this API if possible to pass down needed column and help perform better costing
> for the query. Though we think if wish to leverage this function, need to know
> list of columns before planning hence might need to use query tree.

I believe it would be beneficial to add this potential API extension patch into
the thread (as an example of an interface defining how scanCols could be used)
and review them together.

Thanks for your suggestion. We paste one potential API extension change below that lets zedstore use scanCols.

The change consists of 3 patches that clarify our idea.
0001-ANALYZE.patch is a generic patch that extends the ANALYZE API; we developed it to make the
analysis of zedstore tables more accurate. The API is more flexible now, e.g., a TableAm can provide
a logical block number as the random sample seed, can analyze only the specified columns, and
can provide extra info besides the data tuple.

0002-Planner.patch is the real patch showing how we use rte->scanCols for cost estimation. The main idea
is to add a new metric, 'stadiskfrac', to the pg_statistic catalog. 'stadiskfrac' is the physical size ratio of a column;
it is calculated when ANALYZE is performed, and 0001-ANALYZE.patch helps provide the extra disk size info.
So when the planner calls set_plain_rel_size(), it uses rte->scanCols and 'stadiskfrac' to adjust
rel->pages; please see set_plain_rel_page_estimates().
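
To make the page adjustment concrete, here is a minimal standalone sketch of the idea, not code from the attached patches. It assumes 'stadiskfrac' is a per-column fraction of the relation's on-disk size and that the needed columns arrive as a boolean array. The function name scaled_rel_pages() and its parameters are hypothetical, and falling back to the full page count when no usable fraction is available is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: scale a relation's page count by the summed
 * on-disk fractions of the columns a scan actually needs.
 *
 * rel_pages    total pages of the relation (all columns)
 * stadiskfrac  per-column fraction of the on-disk size, as ANALYZE
 *              might record it (assumed to sum to about 1.0)
 * scan_cols    true for each column the scan must fetch
 * natts        number of columns
 */
static double
scaled_rel_pages(double rel_pages, const double *stadiskfrac,
                 const bool *scan_cols, int natts)
{
    double frac = 0.0;

    assert(natts >= 0);

    /* An empty relation stays empty regardless of the column set. */
    if (rel_pages <= 0.0)
        return 0.0;

    /* Sum the disk fractions of the needed columns only. */
    for (int i = 0; i < natts; i++)
    {
        if (scan_cols[i])
            frac += stadiskfrac[i];
    }

    /*
     * Assumption: if the sum is unusable (no needed columns recorded,
     * missing statistics, or rounding drift above 1.0), fall back to
     * the unscaled page count.
     */
    if (frac <= 0.0 || frac > 1.0)
        frac = 1.0;

    /* Assumption: a non-empty relation costs at least one page. */
    double pages = rel_pages * frac;
    return pages < 1.0 ? 1.0 : pages;
}
```

For example, with fractions {0.5, 0.25, 0.25} and a scan that needs only the first and third columns, a 1000-page relation would be costed as 750 pages under these assumptions.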

0003-ZedStore.patch is an example of how zedstore uses the extended ANALYZE API. I paste it here anyway, in case someone
is interested in it.

Thanks,
Pengzhou
