Re: Extracting only the columns needed for a query - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: Extracting only the columns needed for a query |
Date | |
Msg-id | 20191217105950.slmha6jrqy5j3qo7@localhost Whole thread Raw |
In response to | Re: Extracting only the columns needed for a query (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Extracting only the columns needed for a query
Re: Extracting only the columns needed for a query |
List | pgsql-hackers |
> On Tue, Jul 16, 2019 at 06:49:10PM -0700, Melanie Plageman wrote: > > We implemented Approach B in the attached patch set (patch 0001) and > then implemented Approach A (patch 0002) to sanity check the pruned > list of columns to scan we were getting at plan-time. > We emit a notice in SeqNext() if the two column sets differ. > Currently, for all of the queries in the regression suite, no > differences are found. > We did notice that none of the current regression tests for triggers > are showing a difference between columns that can be extracted at plan > time and those that can be extracted at execution time--though we > haven't dug into this at all. Thanks for the patch! If I understand correctly from this thread, approach B is more preferable, so I've concentrated on the patch 0001 and have a few commentaries/questions: * The idea is to collect columns that have being used for selects/updates (where it makes sense for columnar storage to avoid extra work), do I see it right? If it's the case, then scanCols could be a bit misleading, since it gives an impression that it's only about reads. * After a quick experiment, it seems that extract_used_columns is invoked for updates, but collects too many colums, e.g: create table test (id int primary key, a text, b text, c text); update test set a = 'something' where id = 1; collects into scanCols all columns (varattno from 1 to 4) + again the first column from baserestrictinfo. Is it correct? * Not sure if it supposed to be covered by this functionality, but if we do insert ... on conflict (uniq_col) do update set other_col = 'something' and actually have to perform an update, extract_used_columns is not called. * Probably it also makes sense to check IS_DUMMY_REL in extract_used_columns? > > On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Another reason for having the planner do this is that presumably, in > > an AM that's excited about this, the set of fetched columns should > > play into the cost estimates for the scan. I've not been paying > > enough attention to the tableam work to know if we've got hooks for > > the AM to affect scan costing ... but if we don't, that seems like > > a hole that needs plugged. > > AM callback relation_estimate_size exists currently which planner leverages. > Via this callback it fetches tuples, pages, etc.. So, our thought is to extend > this API if possible to pass down needed column and help perform better costing > for the query. Though we think if wish to leverage this function, need to know > list of columns before planning hence might need to use query tree. I believe it would be beneficial to add this potential API extension patch into the thread (as an example of an interface defining how scanCols could be used) and review them together.
pgsql-hackers by date: