Re: Statistics Import and Export - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Statistics Import and Export
Date
Msg-id 1457469.1740419458@sss.pgh.pa.us
Whole thread Raw
In response to Re: Statistics Import and Export  (Andres Freund <andres@anarazel.de>)
Responses Re: Statistics Import and Export
Re: Statistics Import and Export
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2025-02-24 05:11:48 -0500, Corey Huinker wrote:
>> * relpages/reltuples/relallvisible are now char[32] buffers in RelStatsInfo
>> and nowhere else (existing relpages conversion remains, however)

> I don't see the point. This will use more memory and if we can't get
> conversions between integers and strings right we have much bigger
> problems. The same code was used in the backend too!

I don't like that either.  But there's a bigger problem with 0002:
it's still got mostly table-driven output.  I've been working on
fixing the problem discussed over in the -committers thread about how
we need to identify index-expression columns by number not name [1].
It's not too awful in the backend (WIP patch attached), but
getting appendAttStatsImport to do it seems like a complete disaster,
and this patch fails to make that any easier.  It'd be much better
if you gave up on that table-driven business and just open-coded the
handling of the successive output values as was discussed upthread.

I don't think the table-driven approach has anything to recommend it
anyway.  It requires keeping att_stats_arginfo[] in sync with the
query in getAttStatsExportQuery, an extremely nonobvious (and
undocumented) connection.  Personally I would nuke the separate
getAttStatsExportQuery and appendAttStatsImport functions altogether,
and have one function that executes a query and immediately interprets
the PGresult.

Also, while working on the attached, I couldn't help forming the
opinion that we'd be better off to nuke pg_set_attribute_stats()
from orbit and require people to use pg_restore_attribute_stats().
pg_set_attribute_stats() would be fine if we had a way to force
people to call it with only named-argument notation, but we don't.
So I'm afraid that its existence will encourage people to rely
on a specific parameter order, and then they'll whine if we
add/remove/reorder parameters, as indeed I had to do below.

BTW, I pushed the 0003 patch with minor adjustments.

            regards, tom lane

[1] https://www.postgresql.org/message-id/816167.1740278884%40sss.pgh.pa.us

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9f60a476eb..ad59e3be9d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -30302,6 +30302,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
          <function>pg_set_attribute_stats</function> (
          <parameter>relation</parameter> <type>regclass</type>,
          <parameter>attname</parameter> <type>name</type>,
+         <parameter>attnum</parameter> <type>integer</type>,
          <parameter>inherited</parameter> <type>boolean</type>
          <optional>, <parameter>null_frac</parameter> <type>real</type></optional>
          <optional>, <parameter>avg_width</parameter> <type>integer</type></optional>
@@ -30318,14 +30319,17 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         </para>
         <para>
          Creates or updates attribute-level statistics for the given relation
-         and attribute name to the specified values. The parameters correspond
+         and attribute name (or number) to the specified values. The
+         parameters correspond
          to attributes of the same name found in the <link
          linkend="view-pg-stats"><structname>pg_stats</structname></link>
          view.
         </para>
         <para>
          Optional parameters default to <literal>NULL</literal>, which leave
-         the corresponding statistic unchanged.
+         the corresponding statistic unchanged.  Exactly one
+         of <parameter>attname</parameter> and <parameter>attnum</parameter>
+         must be non-<literal>NULL</literal>.
         </para>
         <para>
          Ordinarily, these statistics are collected automatically or updated
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 591157b1d1..876500824e 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -648,8 +648,9 @@ AS 'pg_set_relation_stats';

 CREATE OR REPLACE FUNCTION
   pg_set_attribute_stats(relation regclass,
-                         attname name,
-                         inherited bool,
+                         attname name DEFAULT NULL,
+                         attnum integer DEFAULT NULL,
+                         inherited bool DEFAULT NULL,
                          null_frac real DEFAULT NULL,
                          avg_width integer DEFAULT NULL,
                          n_distinct real DEFAULT NULL,
diff --git a/src/backend/statistics/attribute_stats.c b/src/backend/statistics/attribute_stats.c
index c0c398a4bb..4886f79611 100644
--- a/src/backend/statistics/attribute_stats.c
+++ b/src/backend/statistics/attribute_stats.c
@@ -38,6 +38,7 @@ enum attribute_stats_argnum
 {
     ATTRELATION_ARG = 0,
     ATTNAME_ARG,
+    ATTNUM_ARG,
     INHERITED_ARG,
     NULL_FRAC_ARG,
     AVG_WIDTH_ARG,
@@ -59,6 +60,7 @@ static struct StatsArgInfo attarginfo[] =
 {
     [ATTRELATION_ARG] = {"relation", REGCLASSOID},
     [ATTNAME_ARG] = {"attname", NAMEOID},
+    [ATTNUM_ARG] = {"attnum", INT4OID},
     [INHERITED_ARG] = {"inherited", BOOLOID},
     [NULL_FRAC_ARG] = {"null_frac", FLOAT4OID},
     [AVG_WIDTH_ARG] = {"avg_width", INT4OID},
@@ -76,6 +78,22 @@ static struct StatsArgInfo attarginfo[] =
     [NUM_ATTRIBUTE_STATS_ARGS] = {0}
 };

+enum clear_attribute_stats_argnum
+{
+    C_ATTRELATION_ARG = 0,
+    C_ATTNAME_ARG,
+    C_INHERITED_ARG,
+    C_NUM_ATTRIBUTE_STATS_ARGS
+};
+
+static struct StatsArgInfo cleararginfo[] =
+{
+    [C_ATTRELATION_ARG] = {"relation", REGCLASSOID},
+    [C_ATTNAME_ARG] = {"attname", NAMEOID},
+    [C_INHERITED_ARG] = {"inherited", BOOLOID},
+    [C_NUM_ATTRIBUTE_STATS_ARGS] = {0}
+};
+
 static bool attribute_statistics_update(FunctionCallInfo fcinfo, int elevel);
 static Node *get_attr_expr(Relation rel, int attnum);
 static void get_attr_stat_type(Oid reloid, AttrNumber attnum, int elevel,
@@ -116,9 +134,9 @@ static bool
 attribute_statistics_update(FunctionCallInfo fcinfo, int elevel)
 {
     Oid            reloid;
-    Name        attname;
-    bool        inherited;
+    char       *attname;
     AttrNumber    attnum;
+    bool        inherited;

     Relation    starel;
     HeapTuple    statup;
@@ -164,21 +182,51 @@ attribute_statistics_update(FunctionCallInfo fcinfo, int elevel)
     /* lock before looking up attribute */
     stats_lock_check_privileges(reloid);

-    stats_check_required_arg(fcinfo, attarginfo, ATTNAME_ARG);
-    attname = PG_GETARG_NAME(ATTNAME_ARG);
-    attnum = get_attnum(reloid, NameStr(*attname));
+    /* user can specify either attname or attnum, but not both */
+    if (!PG_ARGISNULL(ATTNAME_ARG))
+    {
+        Name        attnamename;
+
+        if (!PG_ARGISNULL(ATTNUM_ARG))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("must specify one of attname and attnum")));
+        attnamename = PG_GETARG_NAME(ATTNAME_ARG);
+        attname = NameStr(*attnamename);
+        attnum = get_attnum(reloid, attname);
+        /* note that this test covers attisdropped cases too: */
+        if (attnum == InvalidAttrNumber)
+            ereport(ERROR,
+                    (errcode(ERRCODE_UNDEFINED_COLUMN),
+                     errmsg("column \"%s\" of relation \"%s\" does not exist",
+                            attname, get_rel_name(reloid))));
+    }
+    else if (!PG_ARGISNULL(ATTNUM_ARG))
+    {
+        attnum = PG_GETARG_INT32(ATTNUM_ARG);
+        attname = get_attname(reloid, attnum, true);
+        /* Annoyingly, get_attname doesn't check attisdropped */
+        if (attname == NULL ||
+            !SearchSysCacheExistsAttName(reloid, attname))
+            ereport(ERROR,
+                    (errcode(ERRCODE_UNDEFINED_COLUMN),
+                     errmsg("column %d of relation \"%s\" does not exist",
+                            attnum, get_rel_name(reloid))));
+    }
+    else
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("must specify one of attname and attnum")));
+        attname = NULL;            /* keep compiler quiet */
+        attnum = 0;
+    }

     if (attnum < 0)
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot modify statistics on system column \"%s\"",
-                        NameStr(*attname))));
-
-    if (attnum == InvalidAttrNumber)
-        ereport(ERROR,
-                (errcode(ERRCODE_UNDEFINED_COLUMN),
-                 errmsg("column \"%s\" of relation \"%s\" does not exist",
-                        NameStr(*attname), get_rel_name(reloid))));
+                        attname)));

     stats_check_required_arg(fcinfo, attarginfo, INHERITED_ARG);
     inherited = PG_GETARG_BOOL(INHERITED_ARG);
@@ -245,7 +293,7 @@ attribute_statistics_update(FunctionCallInfo fcinfo, int elevel)
                                 &elemtypid, &elem_eq_opr))
         {
             ereport(elevel,
-                    (errmsg("unable to determine element type of attribute \"%s\"", NameStr(*attname)),
+                    (errmsg("unable to determine element type of attribute \"%s\"", attname),
                      errdetail("Cannot set STATISTIC_KIND_MCELEM or STATISTIC_KIND_DECHIST.")));
             elemtypid = InvalidOid;
             elem_eq_opr = InvalidOid;
@@ -261,7 +309,7 @@ attribute_statistics_update(FunctionCallInfo fcinfo, int elevel)
     {
         ereport(elevel,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("could not determine less-than operator for attribute \"%s\"", NameStr(*attname)),
+                 errmsg("could not determine less-than operator for attribute \"%s\"", attname),
                  errdetail("Cannot set STATISTIC_KIND_HISTOGRAM or STATISTIC_KIND_CORRELATION.")));

         do_histogram = false;
@@ -275,7 +323,7 @@ attribute_statistics_update(FunctionCallInfo fcinfo, int elevel)
     {
         ereport(elevel,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("attribute \"%s\" is not a range type", NameStr(*attname)),
+                 errmsg("attribute \"%s\" is not a range type", attname),
                  errdetail("Cannot set STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM or STATISTIC_KIND_BOUNDS_HISTOGRAM.")));

         do_bounds_histogram = false;
@@ -855,8 +903,8 @@ init_empty_stats_tuple(Oid reloid, int16 attnum, bool inherited,
  * Import statistics for a given relation attribute.
  *
  * Inserts or replaces a row in pg_statistic for the given relation and
- * attribute name. It takes input parameters that correspond to columns in the
- * view pg_stats.
+ * attribute name or number. It takes input parameters that correspond to
+ * columns in the view pg_stats.
  *
  * Parameters null_frac, avg_width, and n_distinct all correspond to NOT NULL
  * columns in pg_statistic. The remaining parameters all belong to a specific
@@ -889,8 +937,8 @@ pg_clear_attribute_stats(PG_FUNCTION_ARGS)
     AttrNumber    attnum;
     bool        inherited;

-    stats_check_required_arg(fcinfo, attarginfo, ATTRELATION_ARG);
-    reloid = PG_GETARG_OID(ATTRELATION_ARG);
+    stats_check_required_arg(fcinfo, cleararginfo, C_ATTRELATION_ARG);
+    reloid = PG_GETARG_OID(C_ATTRELATION_ARG);

     if (RecoveryInProgress())
         ereport(ERROR,
@@ -900,8 +948,8 @@ pg_clear_attribute_stats(PG_FUNCTION_ARGS)

     stats_lock_check_privileges(reloid);

-    stats_check_required_arg(fcinfo, attarginfo, ATTNAME_ARG);
-    attname = PG_GETARG_NAME(ATTNAME_ARG);
+    stats_check_required_arg(fcinfo, cleararginfo, C_ATTNAME_ARG);
+    attname = PG_GETARG_NAME(C_ATTNAME_ARG);
     attnum = get_attnum(reloid, NameStr(*attname));

     if (attnum < 0)
@@ -916,8 +964,8 @@ pg_clear_attribute_stats(PG_FUNCTION_ARGS)
                  errmsg("column \"%s\" of relation \"%s\" does not exist",
                         NameStr(*attname), get_rel_name(reloid))));

-    stats_check_required_arg(fcinfo, attarginfo, INHERITED_ARG);
-    inherited = PG_GETARG_BOOL(INHERITED_ARG);
+    stats_check_required_arg(fcinfo, cleararginfo, C_INHERITED_ARG);
+    inherited = PG_GETARG_BOOL(C_INHERITED_ARG);

     delete_pg_statistic(reloid, attnum, inherited);
     PG_RETURN_VOID();
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index af9546de23..ce714b1fd1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12433,8 +12433,8 @@
   descr => 'set statistics on attribute',
   proname => 'pg_set_attribute_stats', provolatile => 'v', proisstrict => 'f',
   proparallel => 'u', prorettype => 'void',
-  proargtypes => 'regclass name bool float4 int4 float4 text _float4 text float4 text _float4 _float4 text float4
text',
-  proargnames =>
'{relation,attname,inherited,null_frac,avg_width,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation,most_common_elems,most_common_elem_freqs,elem_count_histogram,range_length_histogram,range_empty_frac,range_bounds_histogram}',
+  proargtypes => 'regclass name int4 bool float4 int4 float4 text _float4 text float4 text _float4 _float4 text float4
text',
+  proargnames =>
'{relation,attname,attnum,inherited,null_frac,avg_width,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation,most_common_elems,most_common_elem_freqs,elem_count_histogram,range_length_histogram,range_empty_frac,range_bounds_histogram}',
   prosrc => 'pg_set_attribute_stats' },
 { oid => '9163',
   descr => 'clear statistics on attribute',
diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index 0e8491131e..a020ff015d 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -364,7 +364,7 @@ SELECT pg_catalog.pg_set_attribute_stats(
     null_frac => 0.1::real,
     avg_width => 2::integer,
     n_distinct => 0.3::real);
-ERROR:  "attname" cannot be NULL
+ERROR:  must specify one of attname and attnum
 -- error: inherited null
 SELECT pg_catalog.pg_set_attribute_stats(
     relation => 'stats_import.test'::regclass,
@@ -968,7 +968,7 @@ SELECT pg_catalog.pg_restore_attribute_stats(
     'null_frac', 0.1::real,
     'avg_width', 2::integer,
     'n_distinct', 0.3::real);
-ERROR:  "attname" cannot be NULL
+ERROR:  must specify one of attname and attnum
 -- error: attname doesn't exist
 SELECT pg_catalog.pg_restore_attribute_stats(
     'relation', 'stats_import.test'::regclass,

pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Next
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export