Thread: Fix to not check included columns in ANALYZE on indexes

Fix to not check included columns in ANALYZE on indexes

From
Yugo Nagata
Date:
Hi,

I found that both key columns and included columns are checked when analyze 
is run on indexes. This is almost harmless because non-expression columns
are not processed. However, this check is obviously unnecessary and we
can fix this to not check included columns. If we decide to support expressions
in included columns in future, this must be fixed eventually.

Attached is a patch to fix this.

Regards,

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Attachment

Re: Fix to not check included columns in ANALYZE on indexes

From
Tom Lane
Date:
Yugo Nagata <nagata@sraoss.co.jp> writes:
> I found that both key columns and included columns are checked when analyze
> is run on indexes. This is almost harmless because non-expression columns
> are not processed. However, this check is obviously unnecessary and we
> can fix this to not check included columns. If we decide to support expressions
> in included columns in future, this must be fixed eventually.

AFAICS, we'd just have to revert this patch later, so I don't see
much value in it.

Also, is it really true that we don't support included expression
columns now?  In what way would that not be a bug?

            regards, tom lane


Re: Fix to not check included columns in ANALYZE on indexes

From
Andres Freund
Date:

On June 28, 2018 4:18:36 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Yugo Nagata <nagata@sraoss.co.jp> writes:
>> I found that both key columns and included columns are checked when
>analyze
>> is run on indexes. This is almost harmless because non-expression
>columns
>> are not processed. However, this check is obviously unnecessary and
>we
>> can fix this to not check included columns. If we decide to support
>expressions
>> in included columns in future, this must be fixed eventually.
>
>AFAICS, we'd just have to revert this patch later, so I don't see
>much value in it.
>
>Also, is it really true that we don't support included expression
>columns now?  In what way would that not be a bug?

I don't think IOS supports expression columns, right?  Away from code for a bit, so can't check.  If indeed true,
there'dbe little point in allowing it, right? 

Andres

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Fix to not check included columns in ANALYZE on indexes

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On June 28, 2018 4:18:36 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, is it really true that we don't support included expression
>> columns now?  In what way would that not be a bug?

> I don't think IOS supports expression columns, right?  Away from code for a bit, so can't check.  If indeed true,
there'dbe little point in allowing it, right? 

The point of ANALYZE on an expression column is that you can direct
ANALYZE to collect stats on that expression.  This is potentially valuable
for rowcount estimation whether or not the planner notices that it can
fetch the expression value from the index, or chooses to do so even if it
did notice.

(In principle, CREATE STATISTICS might someday obsolete this use-case
for expression indexes, but it hasn't done so yet AFAIK.)

            regards, tom lane


Re: Fix to not check included columns in ANALYZE on indexes

From
Andres Freund
Date:

On June 28, 2018 4:28:39 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@anarazel.de> writes:
>> On June 28, 2018 4:18:36 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Also, is it really true that we don't support included expression
>>> columns now?  In what way would that not be a bug?
>
>> I don't think IOS supports expression columns, right?  Away from code
>for a bit, so can't check.  If indeed true, there'd be little point in
>allowing it, right?
>
>The point of ANALYZE on an expression column is that you can direct
>ANALYZE to collect stats on that expression.  This is potentially
>valuable
>for rowcount estimation whether or not the planner notices that it can
>fetch the expression value from the index, or chooses to do so even if
>it
>did notice.

The whole point of including additional columns in the index is that they allow IOSs. It seems more likely that people
willexpect that included expressions actually are usable, than enlarging the index just to get a bit better stats. 


>(In principle, CREATE STATISTICS might someday obsolete this use-case
>for expression indexes, but it hasn't done so yet AFAIK.)

You mean stats on them, or the feature entirely? If the latter, how?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Fix to not check included columns in ANALYZE on indexes

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On June 28, 2018 4:28:39 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (In principle, CREATE STATISTICS might someday obsolete this use-case
>> for expression indexes, but it hasn't done so yet AFAIK.)

> You mean stats on them, or the feature entirely? If the latter, how?

No, just the collection-of-stats-on-an-expression part.

            regards, tom lane


Re: Fix to not check included columns in ANALYZE on indexes

From
Yugo Nagata
Date:
On Thu, 28 Jun 2018 19:18:36 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Yugo Nagata <nagata@sraoss.co.jp> writes:
> > I found that both key columns and included columns are checked when analyze 
> > is run on indexes. This is almost harmless because non-expression columns
> > are not processed. However, this check is obviously unnecessary and we
> > can fix this to not check included columns. If we decide to support expressions
> > in included columns in future, this must be fixed eventually.
> 
> AFAICS, we'd just have to revert this patch later, so I don't see
> much value in it.

I'm sorry but I don't understand why we'd just have to revert this patch later.

Do you mean that if we decide to support expressions in included columns in future,
this patch would be reverted? This is wrong. To my understanding, statistics on
included  (= non-key) columns in index is never used by the planner whether this
is expression or not. So, we don't have to examin these columns in ANALYZE.

> Also, is it really true that we don't support included expression
> columns now?  In what way would that not be a bug?

Currently, included expression columns are not supported.

 postgres=# create index on test(i) include ((d+1));
 ERROR:  expressions are not supported in included columns

> 
>             regards, tom lane
> 


-- 
Yugo Nagata <nagata@sraoss.co.jp>


Re: Fix to not check included columns in ANALYZE on indexes

From
Teodor Sigaev
Date:
> AFAICS, we'd just have to revert this patch later, so I don't see
> much value in it.
True, I suppose we should apply this patch just for consistency, because we 
don't allow expression in included columns.

> 
> Also, is it really true that we don't support included expression
> columns now?  In what way would that not be a bug?
Because index only scan doesn't support expression index and, hence, expressions 
in included columns are not useful. It could be changed in future but, suppose, 
not in v11.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Re: Fix to not check included columns in ANALYZE on indexes

From
Peter Geoghegan
Date:
On Fri, Jun 29, 2018 at 1:31 AM, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> I'm sorry but I don't understand why we'd just have to revert this patch later.
>
> Do you mean that if we decide to support expressions in included columns in future,
> this patch would be reverted? This is wrong. To my understanding, statistics on
> included  (= non-key) columns in index is never used by the planner whether this
> is expression or not. So, we don't have to examin these columns in ANALYZE.

I think that the argument Tom is making is that it might be useful to
have statistics on the expression regardless of this -- the expression
may be interesting in some general sense. For example, one can imagine
the planner creating a plan with a hash aggregate rather than a group
aggregate, but only when statistics on an expression are available,
somehow. Those statistics may only be available because of the
appearance of the expression as a non-key attribute (in a future world
where expressions as non-key attributes are accepted, and index-only
scans work with expressions).

In the past, we talked about adding a feature that made this happen
for expressions without requiring any index at all, which Tom referred
to in passing. That's why I understand his remarks in this way.

-- 
Peter Geoghegan


Re: Fix to not check included columns in ANALYZE on indexes

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> I think that the argument Tom is making is that it might be useful to
> have statistics on the expression regardless of this -- the expression
> may be interesting in some general sense. For example, one can imagine
> the planner creating a plan with a hash aggregate rather than a group
> aggregate, but only when statistics on an expression are available,
> somehow.

Right.  For instance, "select sum(x) from ... group by y+z" is only
suitable for hash aggregation if we can predict that there's a fairly
small number of distinct values of y+z.  This makes it useful to have
stats on the expression y+z, independently of whether any related index
actually gets used in the plan.

            regards, tom lane


Re: Fix to not check included columns in ANALYZE on indexes

From
Yugo Nagata
Date:
On Sat, 30 Jun 2018 14:13:49 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Peter Geoghegan <pg@bowt.ie> writes:
> > I think that the argument Tom is making is that it might be useful to
> > have statistics on the expression regardless of this -- the expression
> > may be interesting in some general sense. For example, one can imagine
> > the planner creating a plan with a hash aggregate rather than a group
> > aggregate, but only when statistics on an expression are available,
> > somehow.
> 
> Right.  For instance, "select sum(x) from ... group by y+z" is only
> suitable for hash aggregation if we can predict that there's a fairly
> small number of distinct values of y+z.  This makes it useful to have
> stats on the expression y+z, independently of whether any related index
> actually gets used in the plan.

Thank you for your explanation. I understand the usefulness of the statistics 
on non-key expression attributions and that "CREATE INDEX ... INCLUDE" migth be
a means to collect the statistics on "non-key" expressions in future.

> 
>             regards, tom lane
> 


-- 
Yugo Nagata <nagata@sraoss.co.jp>


Re: Fix to not check included columns in ANALYZE on indexes

From
Yugo Nagata
Date:
On Fri, 29 Jun 2018 17:31:51 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> > AFAICS, we'd just have to revert this patch later, so I don't see
> > much value in it.
> True, I suppose we should apply this patch just for consistency, because we 
> don't allow expression in included columns.

Yes, this is what I intend in my patch, but I don't persist in this if there
is a reason to leave the code as it is, since the current code is alomot harmless.

Thanks,


-- 
Yugo Nagata <nagata@sraoss.co.jp>