Thread: Extracting only the columns needed for a query

Extracting only the columns needed for a query

From: Melanie Plageman
While hacking on zedstore, we needed to get a list of the columns to
be projected--basically all of the columns needed to satisfy the
query. The two use cases we have for this are
1) to pass this column list down to the AM layer for the AM to leverage it
2) for use during planning to improve costing
In other threads, such as [1], there has been discussion about the
possible benefits for all table types of having access to this set of
columns.

Focusing on how to get this used cols list (as opposed to how to pass
it down to the AM), we have tried a few approaches to constructing it
and wanted to get some ideas on how best to do it.

We are trying to determine which phase to get the columns -- after
parsing, after planning, or during execution right before calling the
AM.

Approach A: right before calling AM

    Leverage expression_tree_walker() right before calling beginscan()
    and collecting the columns into a needed columns context. This
    approach is what is currently in the zedstore patch mentioned in
    this thread [2].

    The benefit of this approach is that it walks the tree right
    before the used column set will be used--which makes it easy to
    skip this walk for queries or AMs that don't benefit from this
    used columns list.
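
    As a rough sketch of such a walker (hypothetical names and context
    struct; the real code is in the zedstore patch [2]):

        /*
         * Sketch: collect, per range table entry, the attributes that appear
         * anywhere in the expression tree.  Attribute numbers are offset by
         * FirstLowInvalidHeapAttributeNumber so system columns fit too.
         */
        typedef struct NeededColumnsContext
        {
            Bitmapset **used_attrs; /* per-RTE attr sets, indexed by varno - 1 */
            int         n_rels;
        } NeededColumnsContext;

        static bool
        needed_columns_walker(Node *node, NeededColumnsContext *context)
        {
            if (node == NULL)
                return false;
            if (IsA(node, Var))
            {
                Var *var = (Var *) node;

                if (var->varno >= 1 && var->varno <= context->n_rels)
                    context->used_attrs[var->varno - 1] =
                        bms_add_member(context->used_attrs[var->varno - 1],
                                       var->varattno - FirstLowInvalidHeapAttributeNumber);
                return false;
            }
            return expression_tree_walker(node, needed_columns_walker,
                                          (void *) context);
        }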

Approach B: after parsing and/or after planning

    Add a new member 'used_cols' to PlannedStmt which contains the
    attributes for each relation present in the query. Construct
    'used_cols' at the end of planning using the PathTargets in the
    RelOptInfos in the PlannerInfo->simple_rel_array and the
    RangeTblEntries in PlannerInfo->simple_rte_array.

    The nice thing about this is that it does not require a full walk
    of the plan tree. Approach A could be more expensive if the tree
    is quite large. I'm not sure, however, if just getting the
    PathTargets from the RelOptInfos is sufficient for obtaining the
    whole set of columns used in the query.
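
    As an untested sketch, such an end-of-planning pass might look like
    the following; it also walks baserestrictinfo, in case the
    PathTargets alone miss columns used only in filter quals. Here
    'scanCols' is a hypothetical per-RangeTblEntry field, one way to
    carry the set through to the executor:

        static void
        extract_used_columns(PlannerInfo *root)
        {
            for (int i = 1; i < root->simple_rel_array_size; i++)
            {
                RelOptInfo     *rel = root->simple_rel_array[i];
                RangeTblEntry  *rte = root->simple_rte_array[i];
                ListCell       *lc;

                if (rel == NULL || rte == NULL || rte->rtekind != RTE_RELATION)
                    continue;

                /* Columns the scan must emit ... */
                pull_varattnos((Node *) rel->reltarget->exprs, rel->relid,
                               &rte->scanCols);

                /* ... plus columns inspected by the filter quals. */
                foreach(lc, rel->baserestrictinfo)
                {
                    RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

                    pull_varattnos((Node *) rinfo->clause, rel->relid,
                                   &rte->scanCols);
                }
            }
        }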

    Approach B, however, does not work for utility statements which do
    not go through planning.

    One potential solution to this that we tried was getting the
    columns from the query tree after parse analyze and then in
    exec_simple_query() adding the column list to the PlannedStmt.

    This turned out to be as messy as or messier than Approach A because
    each kind of utility statement has its own data structure that is
    composed of elements taken from the Query tree but does not
    directly include the original PlannedStmt created for the query
    (the PlannedStmt doesn't contain anything except the query tree
    for utility statements since they do not go through planning). So,
    for each type of utility statement, we would have to separately
    copy over the column list from the PlannedStmt in its own way.

    It is worth noting that Approach A also requires special handling
    for each type of utility statement.

We are wondering about specific benefits of Approach B--that is, is
there some use case (maybe plan re-use) for having the column set
accessible in the PlannedStmt itself?

One potential benefit of Approach B could be for scans of partitioned
tables: collecting the used column list could be done once for the
query instead of once for each partition.

Neither approach, however, addresses our second use case, as we
would not have the column list during planning for non-utility
statements. To satisfy this, we would likely have to extract the
columns from the query tree after parse analyze for non-utility
statements as well.

An approach which extracted this list before planning and saved it
somewhere would help avoid having to do the same walk during planning
and then again during execution. Using the list constructed after
parsing may not be ideal, though, when some columns can be
eliminated during planning.

Melanie & Ashwin

Re: Extracting only the columns needed for a query

From: David Rowley
On Sat, 15 Jun 2019 at 13:46, Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> While hacking on zedstore, we needed to get a list of the columns to
> be projected--basically all of the columns needed to satisfy the
> query. The two use cases we have for this are
> 1) to pass this column list down to the AM layer for the AM to leverage it
> 2) for use during planning to improve costing
> In other threads, such as [1], there has been discussion about the
> possible benefits for all table types of having access to this set of
> columns.
>
> Focusing on how to get this used cols list (as opposed to how to pass
> it down to the AM), we have tried a few approaches to constructing it
> and wanted to get some ideas on how best to do it.
>
> We are trying to determine which phase to get the columns -- after
> parsing, after planning, or during execution right before calling the
> AM.

For planning, isn't this information already available via either
targetlists or from the RelOptInfo->attr_needed array combined with
min_attr and max_attr?

If it's going to be too much overhead to extract vars from the
targetlist during executor startup then maybe something can be done
during planning to set a new Bitmapset field of attrs in
RangeTblEntry. Likely that can be built easily by looking at
attr_needed in RelOptInfo. Parse analysis is too early to set this, as
which attributes are needed can change during planning. For example, look at
remove_rel_from_query() in analyzejoins.c. If you don't need access to
this during planning then maybe setrefs.c is a good place to set
something. Although, it would be nice not to add this overhead when
the executor does not need this information. I'm unsure how the
planner could know that though, other than by having another tableam
callback function to tell it.
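
Something like this (untested) could do it for each baserel, with
'scanCols' standing in for whatever the new RangeTblEntry field gets
called:

    /* attr_needed is indexed from min_attr; a non-NULL entry means
     * some relation in the query still needs that attribute. */
    Bitmapset  *attrs = NULL;

    for (AttrNumber attno = rel->min_attr; attno <= rel->max_attr; attno++)
    {
        if (rel->attr_needed[attno - rel->min_attr] != NULL)
            attrs = bms_add_member(attrs,
                                   attno - FirstLowInvalidHeapAttributeNumber);
    }
    rte->scanCols = attrs;      /* hypothetical new RangeTblEntry field */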

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Extracting only the columns needed for a query

From: Tom Lane
Melanie Plageman <melanieplageman@gmail.com> writes:
> While hacking on zedstore, we needed to get a list of the columns to
> be projected--basically all of the columns needed to satisfy the
> query. The two use cases we have for this are
> 1) to pass this column list down to the AM layer for the AM to leverage it
> 2) for use during planning to improve costing
> In other threads, such as [1], there has been discussion about the
> possible benefits for all table types of having access to this set of
> columns.

The thing that most approaches to this have fallen down on is triggers ---
that is, a trigger function might access columns mentioned nowhere in the
SQL text.  (See 8b6da83d1 for a recent example :-()  If you have a plan
for dealing with that, then ...

> Approach B: after parsing and/or after planning

If we wanted to do something about this, making the planner record
the set of used columns seems like the thing to do.  We could avoid
the expense of doing it when it's not needed by setting up an AM/FDW/
etc property or callback to request it.

Another reason for having the planner do this is that presumably, in
an AM that's excited about this, the set of fetched columns should
play into the cost estimates for the scan.  I've not been paying
enough attention to the tableam work to know if we've got hooks for
the AM to affect scan costing ... but if we don't, that seems like
a hole that needs plugged.

>     I'm not sure, however, if just getting the
>     PathTargets from the RelOptInfos is sufficient for obtaining the
>     whole set of columns used in the query.

The PathTarget only records the columns that need to be *emitted*
by the relation scan plan node.  You'd also have to look at the
baserestrictinfo conditions to see what columns are inspected by
filter conditions.  But that seems fine to me anyway since the AM
might conceivably have different requirements for filter variables than
emitted variables.  (As a concrete example, filter conditions that are
handled by non-lossy index checks might not give rise to any requirement
for the table AM to do anything.)  So that line of thought confirms that
we want to do this at the end of planning when we know the shape of the
plan tree; the parser can't do it usefully.

>     Approach B, however, does not work for utility statements which do
>     not go through planning.

I'm not sure why you're excited about that case?  Utility statements
tend to be pretty much all-or-nothing as far as data access goes.

            regards, tom lane



Re: Extracting only the columns needed for a query

From: Corey Huinker
> The thing that most approaches to this have fallen down on is triggers ---
> that is, a trigger function might access columns mentioned nowhere in the
> SQL text.  (See 8b6da83d1 for a recent example :-()  If you have a plan
> for dealing with that, then ...

Well, if we had a trigger language that compiled to <something> at creation time, and that trigger didn't do any dynamic/eval code, we could store which attributes and rels were touched inside the trigger.

I'm not sure if that trigger language would be sql, plpgsql with a "compile" pragma, or maybe we exhume PSM, but it could have some side benefits:

  1. This same issue haunts any attempts at refactoring triggers and referential integrity, so narrowing the scope of what a trigger touches will help there too
  2. additional validity checks
  3. (this is an even bigger stretch) possibly a chance to combine multiple triggers into one statement, or combine multiple row-based triggers into a statement level trigger

Of course, this all falls apart with one dynamic SQL or one SELECT *, but it would be incentive for the users to refactor code to not do things that impede trigger optimization.

Re: Extracting only the columns needed for a query

From: Ashwin Agrawal

On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Approach B: after parsing and/or after planning

> If we wanted to do something about this, making the planner record
> the set of used columns seems like the thing to do.  We could avoid
> the expense of doing it when it's not needed by setting up an AM/FDW/
> etc property or callback to request it.

Sounds good. In the Zedstore patch, we have added an AM property to convey
that the AM leverages column projection, and we currently skip the physical
tlist optimization based on it. So, yes, it can similarly be leveraged for
other planning needs.
 
> Another reason for having the planner do this is that presumably, in
> an AM that's excited about this, the set of fetched columns should
> play into the cost estimates for the scan.  I've not been paying
> enough attention to the tableam work to know if we've got hooks for
> the AM to affect scan costing ... but if we don't, that seems like
> a hole that needs plugged.

The AM callback relation_estimate_size() exists currently, which the planner
leverages: via this callback it fetches tuples, pages, etc. So our thought is
to extend this API, if possible, to pass down the needed columns and help
perform better costing for the query. Though we think that, if we wish to
leverage this function, we need to know the list of columns before planning,
and hence might need to use the query tree.


> >     Approach B, however, does not work for utility statements which do
> >     not go through planning.

> I'm not sure why you're excited about that case?  Utility statements
> tend to be pretty much all-or-nothing as far as data access goes.

Statements like COPY, CREATE INDEX, adding constraints, etc. can benefit from
scanning only a subset of columns. For example, in Zedstore, for CREATE INDEX
we currently extract the needed columns by walking indexInfo->ii_Predicate
and indexInfo->ii_Expressions. For COPY, we currently use cstate->attnumlist
to know which columns need to be scanned.
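
Roughly, that CREATE INDEX extraction looks like this (a sketch, not the
exact zedstore code; the heap relation is varno 1 within the index
expressions):

    /* Columns an index build must scan: plain key columns, plus any
     * columns referenced by index expressions and the predicate. */
    Bitmapset  *cols = NULL;

    for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
    {
        AttrNumber  attno = indexInfo->ii_IndexAttrNumbers[i];

        if (attno != InvalidAttrNumber)     /* zero means expression column */
            cols = bms_add_member(cols,
                                  attno - FirstLowInvalidHeapAttributeNumber);
    }
    pull_varattnos((Node *) indexInfo->ii_Expressions, 1, &cols);
    pull_varattnos((Node *) indexInfo->ii_Predicate, 1, &cols);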

Re: Extracting only the columns needed for a query

From: Melanie Plageman
We implemented Approach B in the attached patch set (patch 0001) and
then implemented Approach A (patch 0002) to sanity check the pruned
list of columns to scan we were getting at plan-time.
We emit a notice in SeqNext() if the two column sets differ.
Currently, for all of the queries in the regression suite, no
differences are found.
We did notice that none of the current regression tests for triggers
are showing a difference between columns that can be extracted at plan
time and those that can be extracted at execution time--though we
haven't dug into this at all.

In our implementation of Approach B, we extract the columns to scan in
make_one_rel() after set_base_rel_sizes() and before
set_base_rel_pathlists() so that the scan cols can be used in costing.
We do it after calling set_base_rel_sizes() because it eventually
calls set_append_rel_size() which adds PathTarget exprs for the
partitions with Vars having the correct varattno and varno.
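
That is, the call sequence in make_one_rel() becomes roughly the
following (a sketch of the placement, not the exact patch code):

    set_base_rel_sizes(root);       /* sets up append-child PathTarget exprs */
    extract_used_columns(root);     /* new: populate scanCols for costing */
    set_base_rel_pathlists(root);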

We added the scanCols to RangeTblEntries because it seemed like the
easiest way to make sure the information was available at scan time
(as suggested by David).

A quirk in the patch for Approach B is that, in inheritance_planner(),
we copy over the scanCols which we populated in each subroot's rtable
to the final rtable.
The rtables for all of the subroots seem to be basically unused after
finishing with inheritance_planner(). That is, many other members of
the modified child PlannerInfos are copied over to the root
PlannerInfo, but the rtables seem to be an exception.
If we want to get at them later, we would have to go rooting
around in the pathlist of the RelOptInfo, which seemed very complex.

One note: we did not add any logic to make the extraction of scan
columns conditional on the AM. We have code to do it conditionally in
the zedstore patch, but we made it generic here.

While we were implementing this, we found that in the columns
extracted at plan-time, there were excess columns for
UPDATE/DELETE...RETURNING on partition tables.
Vars for these columns are being added to the targetlist in
preprocess_targetlist(). Eventually targetlist items are added to
RelOptInfo->reltarget exprs.
However, when result_relation is 0, this means all of the vars from
the returningList will be added to the targetlist, which seems
incorrect. We included a patch (0003) to only add the vars if
result_relation is not 0.

While adding scanCols to RangeTblEntry, we noticed that a few other
members of RangeTblEntry are consistently initialized to NULL whenever
a new RangeTblEntry is being made.
This is confusing because makeNode() for a RangeTblEntry does
palloc0() anyway, so, why is this necessary?
If it is necessary, why not create some kind of generic initialization
function to do this?

Thanks,
Ashwin & Melanie
Attachment

Re: Extracting only the columns needed for a query

From: Melanie Plageman

On Sat, Jun 15, 2019 at 10:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Melanie Plageman <melanieplageman@gmail.com> writes:
> > While hacking on zedstore, we needed to get a list of the columns to
> > be projected--basically all of the columns needed to satisfy the
> > query. The two use cases we have for this are
> > 1) to pass this column list down to the AM layer for the AM to leverage it
> > 2) for use during planning to improve costing
> > In other threads, such as [1], there has been discussion about the
> > possible benefits for all table types of having access to this set of
> > columns.

> The thing that most approaches to this have fallen down on is triggers ---
> that is, a trigger function might access columns mentioned nowhere in the
> SQL text.  (See 8b6da83d1 for a recent example :-()  If you have a plan
> for dealing with that, then ...


For triggers, there's not much we can do since we don't know what
columns the trigger function requires. All of the columns will have to
be scanned for GetTupleForTrigger().
The initial scan can still use the scanCols, though.

Currently, when we use the scanCols, triggers work because
GetTupleForTrigger() will call table_tuple_lock(), which
expects the returned slot to be populated with the latest fetched
tuple--which will have all of the columns.
So, you don't get any kind of optimization, but, really, with the
current TRIGGER/FUNCTION syntax, it doesn't seem like we could do
better than that.

Ashwin & Melanie

Re: Extracting only the columns needed for a query

From: Dmitry Dolgov
> On Tue, Jul 16, 2019 at 06:49:10PM -0700, Melanie Plageman wrote:
>
> We implemented Approach B in the attached patch set (patch 0001) and
> then implemented Approach A (patch 0002) to sanity check the pruned
> list of columns to scan we were getting at plan-time.
> We emit a notice in SeqNext() if the two column sets differ.
> Currently, for all of the queries in the regression suite, no
> differences are found.
> We did notice that none of the current regression tests for triggers
> are showing a difference between columns that can be extracted at plan
> time and those that can be extracted at execution time--though we
> haven't dug into this at all.

Thanks for the patch! If I understand correctly from this thread,
approach B is preferable, so I've concentrated on patch 0001
and have a few commentaries/questions:

* The idea is to collect columns that have been used for selects/updates
  (where it makes sense for columnar storage to avoid extra work), do I see it
  right? If it's the case, then scanCols could be a bit misleading, since it
  gives an impression that it's only about reads.

* After a quick experiment, it seems that extract_used_columns is invoked for
  updates, but collects too many columns, e.g.:

    create table test (id int primary key, a text, b text, c text);
    update test set a = 'something' where id = 1;

  collects into scanCols all columns (varattno from 1 to 4) + again the first
  column from baserestrictinfo. Is it correct?

* Not sure if it is supposed to be covered by this functionality, but if we do

    insert ... on conflict (uniq_col) do update set other_col = 'something'

  and actually have to perform an update, extract_used_columns is not called.

* Probably it also makes sense to check IS_DUMMY_REL in extract_used_columns?

> > On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Another reason for having the planner do this is that presumably, in
> > an AM that's excited about this, the set of fetched columns should
> > play into the cost estimates for the scan.  I've not been paying
> > enough attention to the tableam work to know if we've got hooks for
> > the AM to affect scan costing ... but if we don't, that seems like
> > a hole that needs plugged.
>
> The AM callback relation_estimate_size() exists currently, which the planner
> leverages: via this callback it fetches tuples, pages, etc. So our thought is
> to extend this API, if possible, to pass down the needed columns and help
> perform better costing for the query. Though we think that, if we wish to
> leverage this function, we need to know the list of columns before planning,
> and hence might need to use the query tree.

I believe it would be beneficial to add this potential API extension patch into
the thread (as an example of an interface defining how scanCols could be used)
and review them together.



Re: Extracting only the columns needed for a query

From: Melanie Plageman


On Tue, Dec 17, 2019 at 2:57 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> Thanks for the patch! If I understand correctly from this thread,
> approach B is preferable, so I've concentrated on patch 0001
> and have a few commentaries/questions:

Thanks so much for the review!
 

> * The idea is to collect columns that have been used for selects/updates
>   (where it makes sense for columnar storage to avoid extra work), do I see it
>   right? If it's the case, then scanCols could be a bit misleading, since it
>   gives an impression that it's only about reads.

The "scanCols" columns are only what will need to be scanned in order
to execute a query, so, even if a column is being "used", it may not
be set in "scanCols" if it is not required to scan it.

For example, a column may be "used" but not need to be scanned:
in UPDATE x SET col = 2, "col" will not be in "scanCols" because
it is known that it will be 2.

That makes me think that maybe the function name,
extract_used_columns(), is bad, though. Maybe extract_scan_columns()?
I tried this in the attached, updated patch.
 

> * After a quick experiment, it seems that extract_used_columns is invoked for
>   updates, but collects too many columns, e.g.:
>
>     create table test (id int primary key, a text, b text, c text);
>     update test set a = 'something' where id = 1;
>
>   collects into scanCols all columns (varattno from 1 to 4) + again the first
>   column from baserestrictinfo. Is it correct?

For UPDATE, we need all of the columns in the table because of the
table_tuple_lock() API's current expectation that the slot has all of the
columns populated. If we want UPDATE to only need to insert the column
values which have changed, table_tuple_lock() will have to change.

Collecting columns from the baserestrictinfo is important for when
that column isn't present in another part of the query, but it won't
double count it in the bitmap (when it is already present).
 

> * Not sure if it is supposed to be covered by this functionality, but if we do
>
>     insert ... on conflict (uniq_col) do update set other_col = 'something'
>
>   and actually have to perform an update, extract_used_columns is not called.


For UPSERT, you are correct that it will not extract scan columns.
It wasn't by design. It is because the UPDATE is planned as part of
the INSERT.
For an INSERT, in query_planner(), because the jointree has only one
relation and that relation is an RTE_RESULT, the planner does not continue
on to make_one_rel() and thus doesn't extract scan columns.

This means that for INSERT ON CONFLICT DO UPDATE, "scanCols" is not
populated, however, since UPDATE needs to scan all of the columns
anyway, I don't think populating "scanCols" would have any impact.
This does mean that the bitmap would be different for a regular
UPDATE vs an UPSERT; however, I don't think that doing the extra work
to populate it makes sense if it won't be used. What do you think?
 
> * Probably it also makes sense to check IS_DUMMY_REL in extract_used_columns?


I am wondering, when IS_DUMMY_REL is true for a relation, do we
reference the associated RTE later? It seems like if it is a dummy
rel, we wouldn't scan it. It still makes sense to add the check to
extract_used_columns(), I think, to avoid any wasted loops through the
rel's expressions. Thanks for the idea!

--
Melanie Plageman
Attachment

Re: Extracting only the columns needed for a query

From: Melanie Plageman
I just wanted to address a question we got about adding scanCols
instead of using RelOptInfo->attr_needed.

David Rowley had asked:

> For planning, isn't this information already available via either
> targetlists or from the RelOptInfo->attr_needed array combined with
> min_attr and max_attr?

attr_needed does not have all of the attributes needed set. Attributes
are only added in add_vars_to_targetlist() and this is only called for
certain query classes.

Also, Jeff Davis had asked off-list why we didn't start using
RelOptInfo->attr_needed for our purpose (marking which columns will
need to be scanned for the use of table access methods) instead of
adding a new member to RangeTblEntry.

The primary reason is that RelOptInfos are used during planning and
not execution. We need access to this information somewhere in the
PlannedStmt.

Even if we used attr_needed, we would, at some point, need to transfer
the columns needed to be scanned to a data structure available during
execution.

However, the next question was why not use attr_needed during costing
(which is the other time the table access method can use scanCols).

David Kimura and I dug into how attr_needed is used currently in
Postgres to understand if its meaning and usage is consistent with
using it instead of scanCols during costing.

We could set attr_needed in the same way we are now setting scanCols
and then use it during costing, however, besides the fact that we
would then have to create a member like scanCols anyway during
execution, it seems like adding an additional meaning to attr_needed
during planning is confusing and dangerous.

attr_needed is documented as being used to determine the highest
joinrel in which an attribute is needed (when it was introduced in
835bb975d8d).
Since then it has been extended to be used for removing joins and
relations from queries (b78f6264eba33) and to determine if whole-row
vars are used in a baserel, which isn't supported as a participant in
a partition-wise join (7cfdc77023ad507317).

It actually seems like attr_needed might be too general a name for
this field.
It isn't set for every attribute in a query -- only in specific cases
for attributes in certain parts of the query. If a developer checks
attr_needed for the columns in their query, those columns will often
not be present.
It might actually be better to rename attr_needed to clarify its
usage.
scanCols, on the other hand, has a clear meaning and usage. For table
access methods, scanCols are the columns that need to be scanned.
There might even be cases where this ends up being a different set
than attr_needed for a base rel.

Melanie & David

Re: Extracting only the columns needed for a query

From: Dmitry Dolgov
> On Thu, Jan 02, 2020 at 05:21:55PM -0800, Melanie Plageman wrote:
>
> That makes me think that maybe the function name,
> extract_used_columns() is bad, though. Maybe extract_scan_columns()?
> I tried this in the attached, updated patch.

Thanks, I'll take a look at the new version. Following your explanation,
extract_scan_columns sounds better, but at the same time "scan" is a pretty
broad term, and one can probably misunderstand what kind of scan the name
refers to.

> For UPDATE, we need all of the columns in the table because of the
> table_tuple_lock() API's current expectation that the slot has all of the
> columns populated. If we want UPDATE to only need to insert the column
> values which have changed, table_tuple_lock() will have to change.

Can you please elaborate on this part? I'm probably missing something,
since I don't immediately see where those expectations are expressed.

> > The AM callback relation_estimate_size() exists currently, which the planner
> > leverages: via this callback it fetches tuples, pages, etc. So our thought is
> > to extend this API, if possible, to pass down the needed columns and help
> > perform better costing for the query. Though we think that, if we wish to
> > leverage this function, we need to know the list of columns before planning,
> > and hence might need to use the query tree.
>
> I believe it would be beneficial to add this potential API extension patch into
> the thread (as an example of an interface defining how scanCols could be used)
> and review them together.

I still find this question from my previous email interesting to explore.



Re: Extracting only the columns needed for a query

From: Melanie Plageman
I'm bumping this to the next commitfest because I haven't had a chance
to address the questions posed by Dmitry. I'm sure I'll get to it in
the next few weeks.

> I believe it would be beneficial to add this potential API extension patch into
> the thread (as an example of an interface defining how scanCols could be used)
> and review them together.

As for including some code that uses the scanCols, after discussing
off-list with a few folks, there are three options I would like to
pursue for doing this.

One option I will pursue is using the scanCols to inform the columns
needed to be spilled for memory-bounded hashagg (mentioned by Jeff
here [1]).

The second is potentially using the scanCols in the context of FDW.
Corey, would you happen to have any specific examples handy of when
this might be useful?

The third is exercising it with a test only but providing an example
of how a table AM API user like Zedstore uses the columns during
planning.

[1] https://www.postgresql.org/message-id/e5566f7def33a9e9fdff337cca32d07155d7b635.camel%40j-davis.com

--
Melanie Plageman

Re: Extracting only the columns needed for a query

From: Ashwin Agrawal

On Wed, Jan 15, 2020 at 7:54 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > For UPDATE, we need all of the columns in the table because of the
> > table_tuple_lock() API's current expectation that the slot has all of the
> > columns populated. If we want UPDATE to only need to insert the column
> > values which have changed, table_tuple_lock() will have to change.

> Can you please elaborate on this part? I'm probably missing something,
> since I don't immediately see where those expectations are expressed.

table_tuple_lock() has a TupleTableSlot as an output argument, and the
comment for the function states "*slot: contains the target tuple". Usages
of table_tuple_lock() in places like ExecUpdate() and GetTupleForTrigger()
use the returned tuple for EvalPlanQual. Also, within table_tuple_lock() it
is unknown what the caller's expectation is and what will be performed on
the returned slot/tuple. Hence, it is currently tricky for any AM to guess,
and it needs to return the full tuple, as the API doesn't state which
columns the caller is interested in, or whether it just wishes to take the
lock and needs nothing back in the slot.
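
For reference, the current declaration in tableam.h looks like this -- note
that nothing in it lets the caller say which columns it needs back in the
slot:

    /* Lock a tuple; *slot is an output argument receiving the locked tuple. */
    static inline TM_Result
    table_tuple_lock(Relation rel, ItemPointer tid, Snapshot snapshot,
                     TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
                     LockWaitPolicy wait_policy, uint8 flags,
                     TM_FailureData *tmfd);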

Re: Extracting only the columns needed for a query

From: Pengzhou Tang
> > On Sat, Jun 15, 2019 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Another reason for having the planner do this is that presumably, in
> > an AM that's excited about this, the set of fetched columns should
> > play into the cost estimates for the scan.  I've not been paying
> > enough attention to the tableam work to know if we've got hooks for
> > the AM to affect scan costing ... but if we don't, that seems like
> > a hole that needs plugged.
>
> The AM callback relation_estimate_size() exists currently, which the planner
> leverages: via this callback it fetches tuples, pages, etc. So our thought is
> to extend this API, if possible, to pass down the needed columns and help
> perform better costing for the query. Though we think that, if we wish to
> leverage this function, we need to know the list of columns before planning,
> and hence might need to use the query tree.

> I believe it would be beneficial to add this potential API extension patch into
> the thread (as an example of an interface defining how scanCols could be used)
> and review them together.

Thanks for your suggestion; we paste one potential API extension change below for zedstore to use scanCols.

The change contains 3 patches to clarify our idea.
0001-ANALYZE.patch is a generic patch for an ANALYZE API extension; we developed it to make the
analysis of zedstore tables more accurate. The API is more flexible now: e.g., the table AM can
provide a logical block number as the random sample seed; the table AM can analyze only specified
columns; and the table AM can provide extra info besides the data tuple.

0002-Planner.patch is the real patch, showing how we use rte->scanCols for a cost estimate. The main idea
is adding a new metric, 'stadiskfrac', to the pg_statistic catalog: 'stadiskfrac' is the physical-size ratio
of a column, calculated when ANALYZE is performed (0001-ANALYZE.patch helps provide the extra disk size info).
So when set_plain_rel_size() is called by the planner, it uses rte->scanCols and 'stadiskfrac' to adjust
rel->pages; please see set_plain_rel_page_estimates().
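
In rough pseudo-C, the adjustment amounts to the sketch below
(get_attr_diskfrac() is a hypothetical stand-in for looking up
'stadiskfrac'; this is not the exact patch code):

    /* Scale the page estimate by the on-disk fraction of the scanned columns. */
    double      frac = 0.0;
    int         attidx = -1;

    while ((attidx = bms_next_member(rte->scanCols, attidx)) >= 0)
    {
        AttrNumber  attno = attidx + FirstLowInvalidHeapAttributeNumber;

        if (attno > 0)          /* system columns have no stadiskfrac */
            frac += get_attr_diskfrac(rte->relid, attno);   /* hypothetical */
    }
    rel->pages = (BlockNumber) ceil(rel->pages * Min(frac, 1.0));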

0003-ZedStore.patch is an example of how zedstore uses the extended ANALYZE API; I paste it here anyway,
in case someone is interested in it.

Thanks,
Pengzhou
Attachment

Re: Extracting only the columns needed for a query

From: Melanie Plageman

On Fri, Jan 31, 2020 at 9:52 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
> I'm bumping this to the next commitfest because I haven't had a chance
> to address the questions posed by Dmitry. I'm sure I'll get to it in
> the next few weeks.
>
> > I believe it would be beneficial to add this potential API extension patch into
> > the thread (as an example of an interface defining how scanCols could be used)
> > and review them together.
>
> As for including some code that uses the scanCols, after discussing
> off-list with a few folks, there are three options I would like to
> pursue for doing this.
>
> One option I will pursue is using the scanCols to inform the columns
> needed to be spilled for memory-bounded hashagg (mentioned by Jeff
> here [1]).
>
> The third is exercising it with a test only but providing an example
> of how a table AM API user like Zedstore uses the columns during
> planning.

Outside of the use case that Pengzhou has provided in [1], we started
looking into using scanCols for extracting the subset of columns
needed in two cases:

1) columns required to be spilled for memory-bounded hashagg
2) referenced CTE columns which must be materialized into tuplestore

However, implementing these optimizations wouldn't work with the
scanCols patch in its current form.

The scanCols are extracted from PlannerInfo->simple_rel_array and
PlannerInfo->simple_rte_array, at which point, we have no way of
knowing if the column was aggregated or if it was part of a CTE or
anything else about how it is used in the query.

We could solve this by creating multiple bitmaps at the time that we
create the scanCols field -- one for aggregated columns, one for
unaggregated columns, one for CTEs. However, that seems like it would
add a lot of extra complexity to the common code path during planning.

Basically, scanCols are simply columns that need to be scanned. It is
probably okay if it is only used by table access method API users, as
Pengzhou's patch illustrates.

Given that we have addressed the feedback about showing a use case,
this patch is probably ready for a once over from Dmitry again. (It is
registered for the March fest).

[1] https://www.postgresql.org/message-id/CAG4reAQc9vYdmQXh%3D1D789x8XJ%3DgEkV%2BE%2BfT9%2Bs9tOWDXX3L9Q%40mail.gmail.com

--
Melanie Plageman

Re: Extracting only the columns needed for a query

From: Dmitry Dolgov
> On Tue, Feb 18, 2020 at 03:26:16PM -0800, Melanie Plageman wrote:
>
> > > I believe it would be beneficial to add this potential API extension patch into
> > > the thread (as an example of an interface defining how scanCols could be used)
> > > and review them together.
> >
> > As for including some code that uses the scanCols, after discussing
> > off-list with a few folks, there are three options I would like to
> > pursue for doing this.
> >
> > One option I will pursue is using the scanCols to inform the columns
> > needed to be spilled for memory-bounded hashagg (mentioned by Jeff
> > here [1]).
> >
> >
> > The third is exercising it with a test only but providing an example
> > of how a table AM API user like Zedstore uses the columns during
> > planning.
> >
>
> Basically, scanCols are simply columns that need to be scanned. It is
> probably okay if it is only used by table access method API users, as
> Pengzhou's patch illustrates.

Thanks for the update. Sure, that would be fine. At the moment I have a
couple of intermediate commentaries.

In general the implemented functionality looks good. I've checked how it
works on the existing tests; almost everywhere the required columns were
not missing in scanCols (which is probably the most important part).
Sometimes expressions were checked multiple times, which could
potentially introduce some overhead, but I believe this effect is
negligible. Just to mention some counterintuitive examples, for this
kind of query

    SELECT min(q1) FROM INT8_TBL;

the same column was checked 5 times in my tests, since it's present also
in baserestrictinfo, and build_minmax_path does one extra round of
planning and invoking make_one_rel. I've also noticed that for
partitioned tables every partition is evaluated separately. IIRC their
structure cannot differ, so does it make sense then? Another interesting
example is Values Scan (e.g. in insert statements with multiple
records); can an abstract table AM user leverage information about
columns in it?

One case, where I believe columns were missing, is statements with
returning:

    INSERT INTO foo (col1)
      VALUES ('col1'), ('col2'), ('col3')
      RETURNING *;

Looks like in this situation the only expression in the reltarget is
for col1, but the returning result contains all columns.

And just out of curiosity, what do you think about table AM specific
columns, e.g. ctid, xmin/xmax etc.? I mean, they're not included in
scanCols and should not be, since they're heap AM related. But is there a
chance that there would be some AM-specific columns relevant to e.g.
columnar storage that would also make sense to put into scanCols?



Re: Extracting only the columns needed for a query

From: Melanie Plageman

On Fri, Mar 13, 2020 at 12:09 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> In general the implemented functionality looks good. I've checked how it
> works on the existing tests; almost everywhere the required columns were
> not missing in scanCols (which is probably the most important part).
> Sometimes expressions were checked multiple times, which could
> potentially introduce some overhead, but I believe this effect is
> negligible. Just to mention some counterintuitive examples, for this
> kind of query
>
>     SELECT min(q1) FROM INT8_TBL;
>
> the same column was checked 5 times in my tests, since it's present also
> in baserestrictinfo, and build_minmax_path does one extra round of
> planning and invoking make_one_rel.

Thanks so much for the very thorough review, Dmitry.

These extra calls to extract_scan_columns() should be okay in this case.
As you mentioned, for min/max queries, the planner attempts an optimization
with an indexscan and, to do this, it modifies the querytree and then
calls query_planner() on it.
It tries it with NULLs first and then NULLs last -- each of which
invokes query_planner(), so that is two out of three calls. The
third is the normal invocation. I'm not sure how you would get five,
though.
 
> Another interesting example is Values Scan (e.g. in insert statements
> with multiple records); can an abstract table AM user leverage
> information about columns in it?
>
> One case, where I believe columns were missing, is statements with
> returning:
>
>     INSERT INTO foo (col1)
>       VALUES ('col1'), ('col2'), ('col3')
>       RETURNING *;
>
> Looks like in this situation the only expression in the reltarget is
> for col1, but the returning result contains all columns.

This relates to both of your above points:
For this RETURNING query, it is a ValuesScan, so no columns have to be
scanned. We actually do add the reference to col1 to the scanCols
bitmap, though. I'm not sure we want to do so, since we don't want to
scan col1 in this case. I wonder what cases we would miss if we special
cased RTEKind RTE_VALUES when doing extract_scan_columns().

--
Melanie

Re: Extracting only the columns needed for a query

From: Melanie Plageman

On Fri, Mar 13, 2020 at 12:09 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>  I've also noticed that for partitioned tables every partition is
>  evaluated separately. IIRC their structure cannot differ, does it
>  make sense then?

Good point. At some point, we had discussed only collecting the columns
for one of the child partitions and then using that for all partitions.

It is definitely worthwhile to try that optimization.

> Another interesting example is Values Scan (e.g. in insert statements
> with multiple records); can an abstract table AM user leverage
> information about columns in it?

For ValuesScan, I can't see a use case yet for including those in
scanCols, since it is not required to scan the existing columns in the
table, but I am open to a use case.
 

> One case, where I believe columns were missing, is statements with
> returning:
>
>     INSERT INTO foo (col1)
>       VALUES ('col1'), ('col2'), ('col3')
>       RETURNING *;
>
> Looks like in this situation the only expression in the reltarget is
> for col1, but the returning result contains all columns.


So, you are correct, for INSERT and DELETE queries with RETURNING, the
scanCols should only include columns needed to INSERT or DELETE data.

For DELETE RETURNING, the RETURNING expression is executed separately in
ExecDelete, so scanCols should basically reflect only what needs to be
scanned to evaluate the qual.

For INSERT RETURNING, the scanCols should only reflect those columns
needing to be scanned to perform the INSERT.

There are several reasons why different kinds of RETURNING queries have
too many scanCols. Soumyadeep and I did some investigation into this.

Given:
t1(a, b)
t2(a, b, c)

For:
INSERT INTO t1(a) VALUES (1) RETURNING *;

There is no need to scan t1(a) in order to satisfy the RETURNING
expression here. Because this INSERT doesn't go through make_one_rel(),
scanCols shouldn't be populated.

For:
INSERT INTO t2 SELECT a,a,a FROM t1 RETURNING *;

For this INSERT, the patch correctly identifies that no columns from t2
need be scanned and only t1(a) needs be scanned.

For:
DELETE FROM t1 WHERE a = 2 RETURNING *;

scanCols correctly has only t1(a) which is needed to evaluate the qual.

For:
DELETE FROM t1 USING t2 WHERE t1.a = t2.a RETURNING *;

scanCols should have t1(a) and t2(a), however, it has t1(a) and t2(a,b,c).

This is because in preprocess_targetlist(), all of the returningList
vars from t2 are added to the query tree processed_tlist to make sure
the RETURNING expression can be evaluated later.

However, the query tree processed_tlist items for each rel seem to be
added to the RelOptInfo for t2 later, so, in extract_scan_columns(), we
mistakenly add these to the scanCols.

This is tricky to change because we don't want to change what gets added
to the RelOptInfo->reltarget->exprs (I think), so, we'd need some way to
know which columns are from the RETURNING expression, which are from the
qual, etc. And, we'd need to know the query was a DELETE (since an
UPDATE would need all the columns anyway with the current API, for
example). This is pretty different than the current logic in
extract_scan_cols().

A separate issue with the scanCols for DELETE...USING...RETURNING queries
arises with partitioned tables.

Given this additional table:
t(a, b, c) partition table with partitions
tp1(a, b, c) and
tp2(a, b, c)

the same query above with different relations
DELETE FROM t USING t1 WHERE t.a = t1.a RETURNING *;

scanCols will say it requires t(a,b,c) and t1(a,b) (all of the columns
in both relations).
t1 columns are wrong for the same reason as in the non-partition example
query described above.

However, the partitioned table scanCols are wrong for a related but
different reason, which is much more easily fixed. The same
code that adds the returningList to the processed_tlist for a querytree in
preprocess_targetlist() doesn't handle the case of an inherited table or
partition child (since their result_relation is set to imitate a SELECT
instead of an UPDATE/DELETE/INSERT). I started a different thread on
this topic [1].

Lastly, there are UPDATE...RETURNING queries.
UPDATE queries will require a scan of all of the columns with the current
table_tuple_lock() API (mentioned upthread).
 
> And just out of curiosity, what do you think about table AM specific
> columns, e.g. ctid, xmin/xmax etc.? I mean, they're not included in
> scanCols and should not be, since they're heap AM related. But is there a
> chance that there would be some AM-specific columns relevant to e.g.
> columnar storage that would also make sense to put into scanCols?

Maybe tid? But, I'm not sure. What do you think?

Re: Extracting only the columns needed for a query

From: Soumyadeep Chakraborty
Hello,

Melanie and I will be posting a separate thread with the scanCols patch
listed here, a patch to capture the set of cols in RETURNING, and a group
of patches to pass down the list of cols to various table AM functions,
together as a patch set. This will take some time. Thus, we are
deregistering this patch from the July commitfest and will come back.

Regards,
Soumyadeep (VMware)