Re: Bogus collation version recording in recordMultipleDependencies - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bogus collation version recording in recordMultipleDependencies
Date
Msg-id 40190.1618605579@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bogus collation version recording in recordMultipleDependencies  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bogus collation version recording in recordMultipleDependencies  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
I wrote:
> I feel like this is telling us that there's a fundamental
> misunderstanding in find_expr_references_walker about which
> collation dependencies to report.  It's reporting all the
> leaf-node collations, and ignoring the ones that actually
> count semantically, that is the inputcollid fields of
> function and operator nodes.
> Not sure what's the best thing to do here.  Redesigning
> this post-feature-freeze doesn't seem terribly appetizing,
> but on the other hand, this index collation recording
> feature has put a premium on not overstating the collation
> dependencies of an expression.  We don't want to tell users
> that an index is broken when it isn't really.

I felt less hesitant to modify find_expr_references_walker's
behavior w.r.t. collations after realizing that most of it
was not of long standing, but came in with 257836a75.
So here's a draft patch that redesigns it as suggested above.
Along the way I discovered that GetTypeCollations was quite
broken for ranges and arrays, so this fixes that too.

Per the changes in collate.icu.utf8.out, this gets rid of
a lot of imaginary collation dependencies, but it also gets
rid of some arguably-real ones.  In particular, calls of
record_eq and its siblings will be considered not to have
any collation dependencies, although we know that internally
those will look up per-column collations of their input types.
We could imagine special-casing record_eq etc here, but that
sure seems like a hack.

I"m starting to have a bad feeling about 257836a75 overall.
As I think I've complained before, I do not like anything about
what it's done to pg_depend; it's forcing that relation to serve
two masters, neither one well.  We now see that the same remark
applies to find_expr_references(), because the semantics of
"which collations does this expression's behavior depend on" aren't
identical to "which collations need to be recorded as direct
dependencies of this expression", especially not if you'd prefer
to minimize either list.  (Which is important.)  Moreover, for all
the complexity it's introducing, it's next door to useless for
glibc collations --- we might as well tell people "reindex
everything when your glibc version changes", which could be done
with a heck of a lot less infrastructure.  The situation on Windows
looks pretty user-unfriendly as well, per the other thread.

So I wonder if, rather than continuing to pursue this right now,
we shouldn't revert 257836a75 and try again later with a new design
that doesn't try to commandeer the existing dependency infrastructure.
We might have a better idea about what to do on Windows by the time
that's done, too.

            regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 8d8e926c21..41093ea6ae 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1835,8 +1835,17 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
  * the datatype.  However we do need a type dependency if there is no such
  * indirect dependency, as for example in Const and CoerceToDomain nodes.
  *
- * Similarly, we don't need to create dependencies on collations except where
- * the collation is being freshly introduced to the expression.
+ * Collations are handled primarily by recording the inputcollid's of node
+ * types that have them, as those are the ones that are semantically
+ * significant during expression evaluation.  We also record the collation of
+ * CollateExpr nodes, since those will be needed to print such nodes even if
+ * they don't really affect semantics.  Collations of leaf nodes such as Vars
+ * can be ignored on the grounds that if they're not default, they came from
+ * the referenced object (e.g., a table column), so the dependency on that
+ * object is enough.  (Note: in a post-const-folding expression tree, a
+ * CollateExpr's collation could have been absorbed into a Const or
+ * RelabelType node.  While ruleutils.c prints such collations for clarity,
+ * we may ignore them here as they have no semantic effect.)
  */
 static bool
 find_expr_references_walker(Node *node,
@@ -1876,29 +1885,6 @@ find_expr_references_walker(Node *node,
             /* If it's a plain relation, reference this column */
             add_object_address(OCLASS_CLASS, rte->relid, var->varattno,
                                context->addrs);
-
-            /* Top-level collation if valid */
-            if (OidIsValid(var->varcollid))
-                add_object_address(OCLASS_COLLATION, var->varcollid, 0,
-                                   context->addrs);
-            /* Otherwise, it may be a type with internal collations */
-            else if (var->vartype >= FirstNormalObjectId)
-            {
-                List       *collations;
-                ListCell   *lc;
-
-                collations = GetTypeCollations(var->vartype);
-
-                foreach(lc, collations)
-                {
-                    Oid            coll = lfirst_oid(lc);
-
-                    if (OidIsValid(coll))
-                        add_object_address(OCLASS_COLLATION,
-                                           lfirst_oid(lc), 0,
-                                           context->addrs);
-                }
-            }
         }

         /*
@@ -1920,15 +1906,6 @@ find_expr_references_walker(Node *node,
         add_object_address(OCLASS_TYPE, con->consttype, 0,
                            context->addrs);

-        /*
-         * We must also depend on the constant's collation: it could be
-         * different from the datatype's, if a CollateExpr was const-folded to
-         * a simple constant.
-         */
-        if (OidIsValid(con->constcollid))
-            add_object_address(OCLASS_COLLATION, con->constcollid, 0,
-                               context->addrs);
-
         /*
          * If it's a regclass or similar literal referring to an existing
          * object, add a reference to that object.  (Currently, only the
@@ -2013,10 +1990,6 @@ find_expr_references_walker(Node *node,
         /* A parameter must depend on the parameter's datatype */
         add_object_address(OCLASS_TYPE, param->paramtype, 0,
                            context->addrs);
-        /* and its collation, just as for Consts */
-        if (OidIsValid(param->paramcollid))
-            add_object_address(OCLASS_COLLATION, param->paramcollid, 0,
-                               context->addrs);
     }
     else if (IsA(node, FuncExpr))
     {
@@ -2024,6 +1997,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_PROC, funcexpr->funcid, 0,
                            context->addrs);
+        if (OidIsValid(funcexpr->inputcollid))
+            add_object_address(OCLASS_COLLATION, funcexpr->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, OpExpr))
@@ -2032,6 +2008,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_OPERATOR, opexpr->opno, 0,
                            context->addrs);
+        if (OidIsValid(opexpr->inputcollid))
+            add_object_address(OCLASS_COLLATION, opexpr->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, DistinctExpr))
@@ -2040,6 +2019,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_OPERATOR, distinctexpr->opno, 0,
                            context->addrs);
+        if (OidIsValid(distinctexpr->inputcollid))
+            add_object_address(OCLASS_COLLATION, distinctexpr->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, NullIfExpr))
@@ -2048,6 +2030,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_OPERATOR, nullifexpr->opno, 0,
                            context->addrs);
+        if (OidIsValid(nullifexpr->inputcollid))
+            add_object_address(OCLASS_COLLATION, nullifexpr->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, ScalarArrayOpExpr))
@@ -2056,6 +2041,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_OPERATOR, opexpr->opno, 0,
                            context->addrs);
+        if (OidIsValid(opexpr->inputcollid))
+            add_object_address(OCLASS_COLLATION, opexpr->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, Aggref))
@@ -2064,6 +2052,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_PROC, aggref->aggfnoid, 0,
                            context->addrs);
+        if (OidIsValid(aggref->inputcollid))
+            add_object_address(OCLASS_COLLATION, aggref->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, WindowFunc))
@@ -2072,6 +2063,9 @@ find_expr_references_walker(Node *node,

         add_object_address(OCLASS_PROC, wfunc->winfnoid, 0,
                            context->addrs);
+        if (OidIsValid(wfunc->inputcollid))
+            add_object_address(OCLASS_COLLATION, wfunc->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, SubscriptingRef))
@@ -2116,10 +2110,6 @@ find_expr_references_walker(Node *node,
         else
             add_object_address(OCLASS_TYPE, fselect->resulttype, 0,
                                context->addrs);
-        /* the collation might not be referenced anywhere else, either */
-        if (OidIsValid(fselect->resultcollid))
-            add_object_address(OCLASS_COLLATION, fselect->resultcollid, 0,
-                               context->addrs);
     }
     else if (IsA(node, FieldStore))
     {
@@ -2146,10 +2136,6 @@ find_expr_references_walker(Node *node,
         /* since there is no function dependency, need to depend on type */
         add_object_address(OCLASS_TYPE, relab->resulttype, 0,
                            context->addrs);
-        /* the collation might not be referenced anywhere else, either */
-        if (OidIsValid(relab->resultcollid))
-            add_object_address(OCLASS_COLLATION, relab->resultcollid, 0,
-                               context->addrs);
     }
     else if (IsA(node, CoerceViaIO))
     {
@@ -2158,10 +2144,6 @@ find_expr_references_walker(Node *node,
         /* since there is no exposed function, need to depend on type */
         add_object_address(OCLASS_TYPE, iocoerce->resulttype, 0,
                            context->addrs);
-        /* the collation might not be referenced anywhere else, either */
-        if (OidIsValid(iocoerce->resultcollid))
-            add_object_address(OCLASS_COLLATION, iocoerce->resultcollid, 0,
-                               context->addrs);
     }
     else if (IsA(node, ArrayCoerceExpr))
     {
@@ -2170,10 +2152,6 @@ find_expr_references_walker(Node *node,
         /* as above, depend on type */
         add_object_address(OCLASS_TYPE, acoerce->resulttype, 0,
                            context->addrs);
-        /* the collation might not be referenced anywhere else, either */
-        if (OidIsValid(acoerce->resultcollid))
-            add_object_address(OCLASS_COLLATION, acoerce->resultcollid, 0,
-                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, ConvertRowtypeExpr))
@@ -2213,6 +2191,24 @@ find_expr_references_walker(Node *node,
             add_object_address(OCLASS_OPFAMILY, lfirst_oid(l), 0,
                                context->addrs);
         }
+        foreach(l, rcexpr->inputcollids)
+        {
+            Oid            inputcollid = lfirst_oid(l);
+
+            if (OidIsValid(inputcollid))
+                add_object_address(OCLASS_COLLATION, inputcollid, 0,
+                                   context->addrs);
+        }
+        /* fall through to examine arguments */
+    }
+    else if (IsA(node, MinMaxExpr))
+    {
+        MinMaxExpr *mmexpr = (MinMaxExpr *) node;
+
+        /* minmaxtype will match one of the inputs, so no need to record it */
+        if (OidIsValid(mmexpr->inputcollid))
+            add_object_address(OCLASS_COLLATION, mmexpr->inputcollid, 0,
+                               context->addrs);
         /* fall through to examine arguments */
     }
     else if (IsA(node, CoerceToDomain))
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index ec5d122432..22f289d9f8 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -564,13 +564,19 @@ GetTypeCollations(Oid typeoid)
     }
     else if (typeTup->typtype == TYPTYPE_RANGE)
     {
-        Oid            rangeid = get_range_subtype(typeTup->oid);
+        Oid            rangecoll = get_range_collation(typeTup->oid);

-        Assert(OidIsValid(rangeid));
+        if (OidIsValid(rangecoll))
+            result = list_append_unique_oid(result, rangecoll);
+        else
+        {
+            Oid            rangeid = get_range_subtype(typeTup->oid);

-        result = list_concat_unique_oid(result, GetTypeCollations(rangeid));
+            Assert(OidIsValid(rangeid));
+            result = list_concat_unique_oid(result, GetTypeCollations(rangeid));
+        }
     }
-    else if (OidIsValid(typeTup->typelem))
+    else if (IsTrueArrayType(typeTup))
         result = list_concat_unique_oid(result,
                                         GetTypeCollations(typeTup->typelem));

diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 779405ef32..faf99f76b5 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1958,7 +1958,7 @@ CREATE DOMAIN d_custom AS t_custom;
 CREATE COLLATION custom (
     LOCALE = 'fr-x-icu', PROVIDER = 'icu'
 );
-CREATE TYPE myrange AS range (subtype = text);
+CREATE TYPE myrange AS range (subtype = text, collation = "POSIX");
 CREATE TYPE myrange_en_fr_ga AS range(subtype = t_en_fr_ga);
 CREATE TABLE collate_test (
     id integer,
@@ -2054,36 +2054,17 @@ ORDER BY 1, 2, 3;
  icuidx05_d_en_fr_ga_arr           | "en-x-icu" | up to date
  icuidx05_d_en_fr_ga_arr           | "fr-x-icu" | up to date
  icuidx05_d_en_fr_ga_arr           | "ga-x-icu" | up to date
- icuidx06_d_en_fr_ga               | "default"  | XXX
- icuidx06_d_en_fr_ga               | "en-x-icu" | up to date
  icuidx06_d_en_fr_ga               | "fr-x-icu" | up to date
- icuidx06_d_en_fr_ga               | "ga-x-icu" | up to date
- icuidx07_d_en_fr_ga               | "default"  | XXX
- icuidx07_d_en_fr_ga               | "en-x-icu" | up to date
- icuidx07_d_en_fr_ga               | "fr-x-icu" | up to date
  icuidx07_d_en_fr_ga               | "ga-x-icu" | up to date
- icuidx08_d_en_fr_ga               | "en-x-icu" | up to date
- icuidx08_d_en_fr_ga               | "fr-x-icu" | up to date
- icuidx08_d_en_fr_ga               | "ga-x-icu" | up to date
- icuidx09_d_en_fr_ga               | "en-x-icu" | up to date
- icuidx09_d_en_fr_ga               | "fr-x-icu" | up to date
- icuidx09_d_en_fr_ga               | "ga-x-icu" | up to date
- icuidx10_d_en_fr_ga_es            | "en-x-icu" | up to date
  icuidx10_d_en_fr_ga_es            | "es-x-icu" | up to date
- icuidx10_d_en_fr_ga_es            | "fr-x-icu" | up to date
- icuidx10_d_en_fr_ga_es            | "ga-x-icu" | up to date
- icuidx11_d_es                     | "default"  | XXX
  icuidx11_d_es                     | "es-x-icu" | up to date
- icuidx12_custom                   | "default"  | XXX
  icuidx12_custom                   | custom     | up to date
- icuidx13_custom                   | "default"  | XXX
  icuidx13_custom                   | custom     | up to date
- icuidx14_myrange                  | "default"  | XXX
  icuidx15_myrange_en_fr_ga         | "en-x-icu" | up to date
  icuidx15_myrange_en_fr_ga         | "fr-x-icu" | up to date
  icuidx15_myrange_en_fr_ga         | "ga-x-icu" | up to date
  icuidx17_part                     | "en-x-icu" | up to date
-(57 rows)
+(38 rows)

 -- Validate that REINDEX will update the stored version.
 UPDATE pg_depend SET refobjversion = 'not a version'
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 7f40c56039..4c71f4d249 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -755,7 +755,7 @@ CREATE COLLATION custom (
     LOCALE = 'fr-x-icu', PROVIDER = 'icu'
 );

-CREATE TYPE myrange AS range (subtype = text);
+CREATE TYPE myrange AS range (subtype = text, collation = "POSIX");
 CREATE TYPE myrange_en_fr_ga AS range(subtype = t_en_fr_ga);

 CREATE TABLE collate_test (

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: default_tablespace doc and partitioned rels
Next
From: Andrew Dunstan
Date:
Subject: Re: Error when defining a set returning function