Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id CAEZATCXsfFcXW_A1XCd5QK6zT2M_V+P3aPSktjibHHPzoV+Y-A@mail.gmail.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Thu, 25 Mar 2021 at 19:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Attached is an updated patch series, with all the changes discussed
> here. I've cleaned up the ndistinct stuff a bit more (essentially
> reverting back from GroupExprInfo to GroupVarInfo name), and got rid of
> the UpdateStatisticsForTypeChange.
>

I've looked over all that and I think it's in pretty good shape. I
particularly like how much simpler the ndistinct code has now become.

Some (hopefully final) review comments:

1). I don't think index.c is the right place for
StatisticsGetRelation(). I appreciate that it is very similar to the
adjacent IndexGetRelation() function, but index.c is really only for
index-related code, so I think StatisticsGetRelation() should go in
statscmds.c

2). Perhaps the error message at statscmds.c:293 should read

   "expression cannot be used in multivariate statistics because its
type %s has no default btree operator class"

(i.e., add the word "multivariate", since the same expression *can* be
used in univariate statistics even though it has no less-than
operator).

3). The comment for ATExecAddStatistics() should probably mention that
"ALTER TABLE ADD STATISTICS" isn't a command in the grammar, in a
similar way to other similar functions, e.g.:

/*
 * ALTER TABLE ADD STATISTICS
 *
 * This is no such command in the grammar, but we use this internally to add
 * AT_ReAddStatistics subcommands to rebuild extended statistics after a table
 * column type change.
 */

4). The comment at the start of ATPostAlterTypeParse() needs updating
to mention CREATE STATISTICS statements.

5). I think ATPostAlterTypeParse() should also attempt to preserve any
COMMENTs attached to statistics objects, i.e., something like:

--- src/backend/commands/tablecmds.c.orig    2021-03-26 10:39:38.328631864 +0000
+++ src/backend/commands/tablecmds.c    2021-03-26 10:47:21.042279580 +0000
@@ -12619,6 +12619,9 @@
             CreateStatsStmt  *stmt = (CreateStatsStmt *) stm;
             AlterTableCmd *newcmd;

+            /* keep the statistics object's comment */
+            stmt->stxcomment = GetComment(oldId, StatisticExtRelationId, 0);
+
             newcmd = makeNode(AlterTableCmd);
             newcmd->subtype = AT_ReAddStatistics;
             newcmd->def = (Node *) stmt;

6). Comment typo at extended_stats.c:2532 - s/statitics/statistics/

7). I don't think that the big XXX comment near the start of
estimate_multivariate_ndistinct() is really relevant anymore, now that
the code has been simplified and we no longer extract Vars from
expressions, so perhaps it can just be deleted.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [PATCH] add concurrent_abort callback for output plugin
Next
From: Markus Wanner
Date:
Subject: Re: [PATCH] add concurrent_abort callback for output plugin