Re: BUG #17220: ALTER INDEX ALTER COLUMN SET (..) with an optionless opclass makes index and table unusable - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: BUG #17220: ALTER INDEX ALTER COLUMN SET (..) with an optionless opclass makes index and table unusable
Date
Msg-id YWeQ2UXvXUH/Gt4T@paquier.xyz
In response to Re: BUG #17220: ALTER INDEX ALTER COLUMN SET (..) with an optionless opclass makes index and table unusable  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: BUG #17220: ALTER INDEX ALTER COLUMN SET (..) with an optionless opclass makes index and table unusable  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, Oct 13, 2021 at 05:20:56PM +0000, Bossart, Nathan wrote:
> AFAICT the fact that these commands can succeed at all seems to be
> unintentional, and I wonder if modifying these options requires extra
> steps such as rebuilding the index.

I have been looking at all this business more closely, and this code
block stands out in analyze.c:
/*
 * Now we can compute the statistics for the expression columns.
 */
if (numindexrows > 0)
{
    MemoryContextSwitchTo(col_context);
    for (i = 0; i < attr_cnt; i++)
    {
        VacAttrStats *stats = thisdata->vacattrstats[i];
        AttributeOpts *aopt =
        get_attribute_options(stats->attr->attrelid,
                              stats->attr->attnum);

        stats->exprvals = exprvals + i;
        stats->exprnulls = exprnulls + i;
        stats->rowstride = attr_cnt;
        stats->compute_stats(stats,
                             ind_fetch_func,
                             numindexrows,
                             totalindexrows);

        /*
         * If the n_distinct option is specified, it overrides the
         * above computation.  For indices, we always use just
         * n_distinct, not n_distinct_inherited.
         */
        if (aopt != NULL && aopt->n_distinct != 0.0)
            stats->stadistinct = aopt->n_distinct;

        MemoryContextResetAndDeleteChildren(col_context);
    }
}

When computing statistics on an index expression, this code means that
we would grab the value of n_distinct from the *index* if set and
force the stats to use it, and not use what the parent table has.  For
example, say:
create table aa (a int);
insert into aa values (generate_series(1,1000));
create index aai on aa((a+a)) where a > 500;
alter index aai alter column expr set (n_distinct = 2);
analyze aa; -- n_distinct forced to 2.0 for the index stats
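
To make the override rule easier to see in isolation, here is a minimal
standalone sketch of it.  The Sketch* types and sketch_apply_n_distinct()
are hypothetical stand-ins invented for illustration, not the real
AttributeOpts/VacAttrStats structs or any function in analyze.c; only the
condition is copied from the block quoted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for AttributeOpts and VacAttrStats.  Only the
 * fields touched by the override are modeled.
 */
typedef struct SketchAttributeOpts
{
    double      n_distinct;     /* 0.0 means "not set" */
} SketchAttributeOpts;

typedef struct SketchVacAttrStats
{
    double      stadistinct;    /* estimate left by compute_stats() */
} SketchVacAttrStats;

/*
 * Same condition as in the quoted code: a non-zero n_distinct found in
 * the attribute options replaces the computed estimate.  A NULL aopt or
 * a zero n_distinct leaves the estimate alone.
 */
double
sketch_apply_n_distinct(SketchVacAttrStats *stats,
                        const SketchAttributeOpts *aopt)
{
    assert(stats != NULL);

    if (aopt != NULL && aopt->n_distinct != 0.0)
        stats->stadistinct = aopt->n_distinct;

    return stats->stadistinct;
}
```

In the example above, the options looked up for the index column carry
n_distinct = 2, so whatever compute_stats() estimated is replaced by 2.0.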

This code comes from 76a47c0 back in 2010.  In PG <= 12, this worked,
but it no longer does as of 13~.  Enforcing n_distinct for index
attributes was discussed back when this code was introduced:
https://www.postgresql.org/message-id/603c8f071001101127w3253899vb3f3e15073638774@mail.gmail.com

This means that we've lost the ability to enforce n_distinct for
expression indexes for two years.  But do we really care about this
case?  My answer to that would be "no", as long as we have no
documented grammar for it and we don't dump these settings either.
Still, I think that we'd better do something about the code in
analyze.c rather than leave it dead, and my take is that we should
remove the call to get_attribute_options() in this code path.

Any opinions?  @Robert: you were involved in 76a47c0, so I am adding
you in CC.
--
Michael
