Thread: Fix to not check included columns in ANALYZE on indexes
Hi, I found that both key columns and included columns are checked when analyze is run on indexes. This is almost harmless because non-expression columns are not processed. However, this check is obviously unnecessary and we can fix this to not check included columns. If we decide to support expressions in included columns in future, this must be fixed eventually. Attached is a patch to fix this. Regards, -- Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Yugo Nagata <nagata@sraoss.co.jp> writes: > I found that both key columns and included columns are checked when analyze > is run on indexes. This is almost harmless because non-expression columns > are not processed. However, this check is obviously unnecessary and we > can fix this to not check included columns. If we decide to support expressions > in included columns in future, this must be fixed eventually. AFAICS, we'd just have to revert this patch later, so I don't see much value in it. Also, is it really true that we don't support included expression columns now? In what way would that not be a bug? regards, tom lane
On June 28, 2018 4:18:36 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Yugo Nagata <nagata@sraoss.co.jp> writes: >> I found that both key columns and included columns are checked when >analyze >> is run on indexes. This is almost harmless because non-expression >columns >> are not processed. However, this check is obviously unnecessary and >we >> can fix this to not check included columns. If we decide to support >expressions >> in included columns in future, this must be fixed eventually. > >AFAICS, we'd just have to revert this patch later, so I don't see >much value in it. > >Also, is it really true that we don't support included expression >columns now? In what way would that not be a bug? I don't think IOS supports expression columns, right? Away from code for a bit, so can't check. If indeed true, there'dbe little point in allowing it, right? Andres Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On June 28, 2018 4:18:36 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, is it really true that we don't support included expression >> columns now? In what way would that not be a bug? > I don't think IOS supports expression columns, right? Away from code for a bit, so can't check. If indeed true, there'dbe little point in allowing it, right? The point of ANALYZE on an expression column is that you can direct ANALYZE to collect stats on that expression. This is potentially valuable for rowcount estimation whether or not the planner notices that it can fetch the expression value from the index, or chooses to do so even if it did notice. (In principle, CREATE STATISTICS might someday obsolete this use-case for expression indexes, but it hasn't done so yet AFAIK.) regards, tom lane
On June 28, 2018 4:28:39 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@anarazel.de> writes: >> On June 28, 2018 4:18:36 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Also, is it really true that we don't support included expression >>> columns now? In what way would that not be a bug? > >> I don't think IOS supports expression columns, right? Away from code >for a bit, so can't check. If indeed true, there'd be little point in >allowing it, right? > >The point of ANALYZE on an expression column is that you can direct >ANALYZE to collect stats on that expression. This is potentially >valuable >for rowcount estimation whether or not the planner notices that it can >fetch the expression value from the index, or chooses to do so even if >it >did notice. The whole point of including additional columns in the index is that they allow IOSs. It seems more likely that people willexpect that included expressions actually are usable, than enlarging the index just to get a bit better stats. >(In principle, CREATE STATISTICS might someday obsolete this use-case >for expression indexes, but it hasn't done so yet AFAIK.) You mean stats on them, or the feature entirely? If the latter, how? Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On June 28, 2018 4:28:39 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (In principle, CREATE STATISTICS might someday obsolete this use-case >> for expression indexes, but it hasn't done so yet AFAIK.) > You mean stats on them, or the feature entirely? If the latter, how? No, just the collection-of-stats-on-an-expression part. regards, tom lane
On Thu, 28 Jun 2018 19:18:36 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yugo Nagata <nagata@sraoss.co.jp> writes: > > I found that both key columns and included columns are checked when analyze > > is run on indexes. This is almost harmless because non-expression columns > > are not processed. However, this check is obviously unnecessary and we > > can fix this to not check included columns. If we decide to support expressions > > in included columns in future, this must be fixed eventually. > > AFAICS, we'd just have to revert this patch later, so I don't see > much value in it. I'm sorry but I don't understand why we'd just have to revert this patch later. Do you mean that if we decide to support expressions in included columns in future, this patch would be reverted? This is wrong. To my understanding, statistics on included (= non-key) columns in index is never used by the planner whether this is expression or not. So, we don't have to examin these columns in ANALYZE. > Also, is it really true that we don't support included expression > columns now? In what way would that not be a bug? Currently, included expression columns are not supported. postgres=# create index on test(i) include ((d+1)); ERROR: expressions are not supported in included columns > > regards, tom lane > -- Yugo Nagata <nagata@sraoss.co.jp>
> AFAICS, we'd just have to revert this patch later, so I don't see > much value in it. True, I suppose we should apply this patch just for consistency, because we don't allow expression in included columns. > > Also, is it really true that we don't support included expression > columns now? In what way would that not be a bug? Because index only scan doesn't support expression index and, hence, expressions in included columns are not useful. It could be changed in future but, suppose, not in v11. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Fri, Jun 29, 2018 at 1:31 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote: > I'm sorry but I don't understand why we'd just have to revert this patch later. > > Do you mean that if we decide to support expressions in included columns in future, > this patch would be reverted? This is wrong. To my understanding, statistics on > included (= non-key) columns in index is never used by the planner whether this > is expression or not. So, we don't have to examin these columns in ANALYZE. I think that the argument Tom is making is that it might be useful to have statistics on the expression regardless of this -- the expression may be interesting in some general sense. For example, one can imagine the planner creating a plan with a hash aggregate rather than a group aggregate, but only when statistics on an expression are available, somehow. Those statistics may only be available because of the appearance of the expression as a non-key attribute (in a future world where expressions as non-key attributes are accepted, and index-only scans work with expressions). In the past, we talked about adding a feature that made this happen for expressions without requiring any index at all, which Tom referred to in passing. That's why I understand his remarks in this way. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > I think that the argument Tom is making is that it might be useful to > have statistics on the expression regardless of this -- the expression > may be interesting in some general sense. For example, one can imagine > the planner creating a plan with a hash aggregate rather than a group > aggregate, but only when statistics on an expression are available, > somehow. Right. For instance, "select sum(x) from ... group by y+z" is only suitable for hash aggregation if we can predict that there's a fairly small number of distinct values of y+z. This makes it useful to have stats on the expression y+z, independently of whether any related index actually gets used in the plan. regards, tom lane
On Sat, 30 Jun 2018 14:13:49 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > I think that the argument Tom is making is that it might be useful to > > have statistics on the expression regardless of this -- the expression > > may be interesting in some general sense. For example, one can imagine > > the planner creating a plan with a hash aggregate rather than a group > > aggregate, but only when statistics on an expression are available, > > somehow. > > Right. For instance, "select sum(x) from ... group by y+z" is only > suitable for hash aggregation if we can predict that there's a fairly > small number of distinct values of y+z. This makes it useful to have > stats on the expression y+z, independently of whether any related index > actually gets used in the plan. Thank you for your explanation. I understand the usefulness of the statistics on non-key expression attributions and that "CREATE INDEX ... INCLUDE" migth be a means to collect the statistics on "non-key" expressions in future. > > regards, tom lane > -- Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 29 Jun 2018 17:31:51 +0300 Teodor Sigaev <teodor@sigaev.ru> wrote: > > AFAICS, we'd just have to revert this patch later, so I don't see > > much value in it. > True, I suppose we should apply this patch just for consistency, because we > don't allow expression in included columns. Yes, this is what I intend in my patch, but I don't persist in this if there is a reason to leave the code as it is, since the current code is alomot harmless. Thanks, -- Yugo Nagata <nagata@sraoss.co.jp>