Thread: pg_upgrade fails to detect unsupported arrays and ranges

pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
While composing the release note entry for commits 8d48e6a72 et al
(handle recursive type dependencies while checking for unsupported
types in pg_upgrade), I realized that there's a huge hole in
pg_upgrade's test for such cases.  It looks for domains containing
the unsupported type, and for composites containing it, but not
for arrays or ranges containing it.  It's definitely possible to
create tables containing arrays of lines, or arrays of composites
containing line, etc etc.  A range over line is harder for lack of
a btree opclass, but a range over sql_identifier is possible.

The attached patches fix this.  0001 refactors the code in question
so that we have only one copy not three-and-growing.  The only
difference between the three copies was that one case didn't bother
to search indexes, but I judged that that wasn't an optimization we
need to preserve.  (Note: this patch is shown with --ignore-space-change
to make it more reviewable, but I did re-pgindent the code.)  Then
0002 actually adds the array and range cases.

Although this is a really straightforward patch and I've tested it
against appropriate old versions (9.1 and 9.2), I'm very hesitant
to shove it in so soon before a release wrap.  Should I do that, or
let it wait till after the wrap?

            regards, tom lane

diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 9deb53c..0dff96b 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -97,27 +97,27 @@ new_9_0_populate_pg_largeobject_metadata(ClusterInfo *cluster, bool check_mode)


 /*
- * old_9_3_check_for_line_data_type_usage()
- *    9.3 -> 9.4
- *    Fully implement the 'line' data type in 9.4, which previously returned
- *    "not enabled" by default and was only functionally enabled with a
- *    compile-time switch;  9.4 "line" has different binary and text
- *    representation formats;  checks tables and indexes.
+ * check_for_data_type_usage
+ *    Detect whether there are any stored columns depending on the given type
+ *
+ * If so, write a report to the given file name, and return true.
+ *
+ * We check for the type in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
  */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+static bool
+check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
+                          char *output_path)
 {
-    int            dbnum;
-    FILE       *script = NULL;
     bool        found = false;
-    char        output_path[MAXPGPATH];
-
-    prep_status("Checking for incompatible \"line\" data type");
-
-    snprintf(output_path, sizeof(output_path), "tables_using_line.txt");
+    FILE       *script = NULL;
+    int            dbnum;

     for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
     {
+        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
+        PGconn       *conn = connectToServer(cluster, active_db->db_name);
+        PQExpBufferData querybuf;
         PGresult   *res;
         bool        db_used = false;
         int            ntups;
@@ -125,26 +125,26 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
         int            i_nspname,
                     i_relname,
                     i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);

         /*
-         * The pg_catalog.line type may be wrapped in a domain or composite
-         * type, or both (9.3 did not allow domains on composite types, but
-         * there may be multi-level composite type). To detect these cases
-         * we need a recursive CTE.
+         * The type of interest might be wrapped in a domain, array,
+         * composite, or range, and these container types can be nested (to
+         * varying extents depending on server version, but that's not of
+         * concern here).  To handle all these cases we need a recursive CTE.
          */
-        res = executeQueryOrDie(conn,
+        initPQExpBuffer(&querybuf);
+        appendPQExpBuffer(&querybuf,
                           "WITH RECURSIVE oids AS ( "
-        /* the pg_catalog.line type itself */
-                                "    SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid "
+        /* the target type itself */
+                          "    SELECT '%s'::pg_catalog.regtype AS oid "
                           "    UNION ALL "
                           "    SELECT * FROM ( "
-        /* domains on the type */
+        /* inner WITH because we can only reference the CTE once */
                           "        WITH x AS (SELECT oid FROM oids) "
+        /* domains on any type selected so far */
                           "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype
='d' " 
-                                "            UNION "
-        /* composite types containing the type */
+                          "            UNION ALL "
+        /* composite types containing any type selected so far */
                           "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
                           "            WHERE t.typtype = 'c' AND "
                           "                  t.oid = c.reltype AND "
@@ -153,6 +153,7 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
                           "                  a.atttypid = x.oid "
                           "    ) foo "
                           ") "
+        /* now look for stored columns of any such type */
                           "SELECT n.nspname, c.relname, a.attname "
                           "FROM    pg_catalog.pg_class c, "
                           "        pg_catalog.pg_namespace n, "
@@ -168,7 +169,10 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
         /* exclude possible orphaned temp tables */
                           "        n.nspname !~ '^pg_temp_' AND "
                           "        n.nspname !~ '^pg_toast_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
+                          "        n.nspname NOT IN ('pg_catalog', 'information_schema')",
+                          typename);
+
+        res = executeQueryOrDie(conn, "%s", querybuf.data);

         ntups = PQntuples(res);
         i_nspname = PQfnumber(res, "nspname");
@@ -193,13 +197,36 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)

         PQclear(res);

+        termPQExpBuffer(&querybuf);
+
         PQfinish(conn);
     }

     if (script)
         fclose(script);

-    if (found)
+    return found;
+}
+
+
+/*
+ * old_9_3_check_for_line_data_type_usage()
+ *    9.3 -> 9.4
+ *    Fully implement the 'line' data type in 9.4, which previously returned
+ *    "not enabled" by default and was only functionally enabled with a
+ *    compile-time switch; as of 9.4 "line" has a different on-disk
+ *    representation format.
+ */
+void
+old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+{
+    char        output_path[MAXPGPATH];
+
+    prep_status("Checking for incompatible \"line\" data type");
+
+    snprintf(output_path, sizeof(output_path), "tables_using_line.txt");
+
+    if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"line\" data type in user tables.  This\n"
@@ -226,105 +253,17 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
  *    mid-upgrade.  Worse, if there's a matview with such a column, the
  *    DDL reload will silently change it to "text" which won't match the
  *    on-disk storage (which is like "cstring").  So we *must* reject that.
- *    Also check composite types and domains on the "unknwown" type (even
- *    combinations of both), in case they are used for table columns.
- *    We needn't check indexes, because "unknown" has no opclasses.
  */
 void
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-    int            dbnum;
-    FILE       *script = NULL;
-    bool        found = false;
     char        output_path[MAXPGPATH];

     prep_status("Checking for invalid \"unknown\" user columns");

     snprintf(output_path, sizeof(output_path), "tables_using_unknown.txt");

-    for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-    {
-        PGresult   *res;
-        bool        db_used = false;
-        int            ntups;
-        int            rowno;
-        int            i_nspname,
-                    i_relname,
-                    i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);
-
-        /*
-         * The pg_catalog.unknown type may be wrapped in a domain or composite
-         * type, or both (9.3 did not allow domains on composite types, but
-         * there may be multi-level composite type). To detect these cases
-         * we need a recursive CTE.
-         */
-        res = executeQueryOrDie(conn,
-                                "WITH RECURSIVE oids AS ( "
-        /* the pg_catalog.unknown type itself */
-                                "    SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid "
-                                "    UNION ALL "
-                                "    SELECT * FROM ( "
-        /* domains on the type */
-                                "        WITH x AS (SELECT oid FROM oids) "
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND
typtype= 'd' " 
-                                "            UNION "
-        /* composite types containing the type */
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
-                                "            WHERE t.typtype = 'c' AND "
-                                "                  t.oid = c.reltype AND "
-                                "                  c.oid = a.attrelid AND "
-                                "                  NOT a.attisdropped AND "
-                                "                  a.atttypid = x.oid "
-                                "    ) foo "
-                                ") "
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "        a.atttypid IN (SELECT oid FROM oids) AND "
-                                "        c.relkind IN ("
-                                CppAsString2(RELKIND_RELATION) ", "
-                                CppAsString2(RELKIND_MATVIEW) ") AND "
-                                "        c.relnamespace = n.oid AND "
-        /* exclude possible orphaned temp tables */
-                                "        n.nspname !~ '^pg_temp_' AND "
-                                "        n.nspname !~ '^pg_toast_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
-
-        ntups = PQntuples(res);
-        i_nspname = PQfnumber(res, "nspname");
-        i_relname = PQfnumber(res, "relname");
-        i_attname = PQfnumber(res, "attname");
-        for (rowno = 0; rowno < ntups; rowno++)
-        {
-            found = true;
-            if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-                pg_fatal("could not open file \"%s\": %s\n", output_path,
-                         strerror(errno));
-            if (!db_used)
-            {
-                fprintf(script, "In database: %s\n", active_db->db_name);
-                db_used = true;
-            }
-            fprintf(script, "  %s.%s.%s\n",
-                    PQgetvalue(res, rowno, i_nspname),
-                    PQgetvalue(res, rowno, i_relname),
-                    PQgetvalue(res, rowno, i_attname));
-        }
-
-        PQclear(res);
-
-        PQfinish(conn);
-    }
-
-    if (script)
-        fclose(script);
-
-    if (found)
+    if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"unknown\" data type in user tables.  This\n"
@@ -456,105 +395,18 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
  *    which does affect the storage (name is by-ref, but not varlena). This
  *    means user tables using sql_identifier for columns are broken because
  *    the on-disk format is different.
- *
- *    We need to check all objects that might store sql_identifier on disk,
- *    i.e. tables, matviews and indexes. Also check composite types in case
- *    they are used in this context.
  */
 void
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-    int            dbnum;
-    FILE       *script = NULL;
-    bool        found = false;
     char        output_path[MAXPGPATH];

     prep_status("Checking for invalid \"sql_identifier\" user columns");

     snprintf(output_path, sizeof(output_path), "tables_using_sql_identifier.txt");

-    for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-    {
-        PGresult   *res;
-        bool        db_used = false;
-        int            ntups;
-        int            rowno;
-        int            i_nspname,
-                    i_relname,
-                    i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);
-
-        /*
-         * We need the recursive CTE because the sql_identifier may be wrapped
-         * either in a domain or composite type, or both (in arbitrary order).
-         */
-        res = executeQueryOrDie(conn,
-                                "WITH RECURSIVE oids AS ( "
-        /* the sql_identifier type itself */
-                                "    SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid "
-                                "    UNION ALL "
-                                "    SELECT * FROM ( "
-        /* domains on the type */
-                                "        WITH x AS (SELECT oid FROM oids) "
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND
typtype= 'd' " 
-                                "            UNION "
-        /* composite types containing the type */
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
-                                "            WHERE t.typtype = 'c' AND "
-                                "                  t.oid = c.reltype AND "
-                                "                  c.oid = a.attrelid AND "
-                                "                  NOT a.attisdropped AND "
-                                "                  a.atttypid = x.oid "
-                                "    ) foo "
-                                ") "
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "        a.atttypid IN (SELECT oid FROM oids) AND "
-                                "        c.relkind IN ("
-                                CppAsString2(RELKIND_RELATION) ", "
-                                CppAsString2(RELKIND_MATVIEW) ", "
-                                CppAsString2(RELKIND_INDEX) ") AND "
-                                "        c.relnamespace = n.oid AND "
-        /* exclude possible orphaned temp tables */
-                                "        n.nspname !~ '^pg_temp_' AND "
-                                "        n.nspname !~ '^pg_toast_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
-
-        ntups = PQntuples(res);
-        i_nspname = PQfnumber(res, "nspname");
-        i_relname = PQfnumber(res, "relname");
-        i_attname = PQfnumber(res, "attname");
-        for (rowno = 0; rowno < ntups; rowno++)
-        {
-            found = true;
-            if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-                pg_fatal("could not open file \"%s\": %s\n", output_path,
-                         strerror(errno));
-            if (!db_used)
-            {
-                fprintf(script, "In database: %s\n", active_db->db_name);
-                db_used = true;
-            }
-            fprintf(script, "  %s.%s.%s\n",
-                    PQgetvalue(res, rowno, i_nspname),
-                    PQgetvalue(res, rowno, i_relname),
-                    PQgetvalue(res, rowno, i_attname));
-        }
-
-        PQclear(res);
-
-        PQfinish(conn);
-    }
-
-    if (script)
-        fclose(script);
-
-    if (found)
+    if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
+                                  output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables\n"
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 0dff96b..a4d9f7b 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -144,13 +144,27 @@ check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
         /* domains on any type selected so far */
                           "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype
='d' " 
                           "            UNION ALL "
+        /* arrays over any type selected so far */
+                          "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype =
'b'" 
+                          "            UNION ALL "
         /* composite types containing any type selected so far */
                           "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
                           "            WHERE t.typtype = 'c' AND "
                           "                  t.oid = c.reltype AND "
                           "                  c.oid = a.attrelid AND "
                           "                  NOT a.attisdropped AND "
-                          "                  a.atttypid = x.oid "
+                          "                  a.atttypid = x.oid ",
+                          typename);
+
+        /* Ranges came in in 9.2 */
+        if (GET_MAJOR_VERSION(cluster->major_version) >= 902)
+            appendPQExpBuffer(&querybuf,
+                              "            UNION ALL "
+            /* ranges containing any type selected so far */
+                              "            SELECT t.oid FROM pg_catalog.pg_type t, pg_range r, x "
+                              "            WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid");
+
+        appendPQExpBuffer(&querybuf,
                           "    ) foo "
                           ") "
         /* now look for stored columns of any such type */
@@ -169,8 +183,7 @@ check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
         /* exclude possible orphaned temp tables */
                           "        n.nspname !~ '^pg_temp_' AND "
                           "        n.nspname !~ '^pg_toast_temp_' AND "
-                          "        n.nspname NOT IN ('pg_catalog', 'information_schema')",
-                          typename);
+                          "        n.nspname NOT IN ('pg_catalog', 'information_schema')");

         res = executeQueryOrDie(conn, "%s", querybuf.data);


Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Daniel Gustafsson
Date:
> On 10 Nov 2019, at 20:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> 0001 refactors the code in question
> so that we have only one copy not three-and-growing.  The only
> difference between the three copies was that one case didn't bother
> to search indexes, but I judged that that wasn't an optimization we
> need to preserve.

A big +1 on this refactoring.

> (Note: this patch is shown with --ignore-space-change
> to make it more reviewable, but I did re-pgindent the code.)  Then
> 0002 actually adds the array and range cases.

Was the source pgindented, but not committed, before generating the patches?  I
fail to apply them on master (or REL_12_STABLE) on what seems to be only
whitespace changes.

> Although this is a really straightforward patch and I've tested it
> against appropriate old versions (9.1 and 9.2), I'm very hesitant
> to shove it in so soon before a release wrap.  Should I do that, or
> let it wait till after the wrap?

Having read the patch I agree that it's trivial enough that I wouldn't be
worried to let it slip through.  However, given that we've lacked the check for
a few releases, is it worth rushing with the potential for a last-minute
"oh-shit"?

> +        /* arrays over any type selected so far */
> +                          "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype =
'b'" 

No need to check typlen?

cheers ./daniel


Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 10 Nov 2019, at 20:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Although this is a really straightforward patch and I've tested it
>> against appropriate old versions (9.1 and 9.2), I'm very hesitant
>> to shove it in so soon before a release wrap.  Should I do that, or
>> let it wait till after the wrap?

> Having read the patch I agree that it's trivial enough that I wouldn't be
> worried to let it slip through.  However, given that we've lacked the check for
> a few releases, is it worth rushing with the potential for a last-minute
> "oh-shit"?

Probably not, really --- the main argument for that is just that it'd fit
well with the fixes Tomas already made.

>> +        /* arrays over any type selected so far */
>> +                          "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype
='b' " 

> No need to check typlen?

Yeah, that's intentional.  A fixed-length array type over a problematic
type would be just as much of a problem as a varlena array type.
The case shouldn't apply to any of the existing problematic types,
but I was striving for generality.

            regards, tom lane



Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 10 Nov 2019, at 20:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (Note: this patch is shown with --ignore-space-change
>> to make it more reviewable, but I did re-pgindent the code.)  Then
>> 0002 actually adds the array and range cases.

> Was the source pgindented, but not committed, before generating the patches?  I
> fail to apply them on master (or REL_12_STABLE) on what seems to be only
> whitespace changes.

Hm, I suppose it might be hard to apply the combination of the patches
(maybe something involving patch -l would work).  For simplicity, here's
the complete patch for HEAD.  I fixed a missing schema qualification.

            regards, tom lane

diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index 9deb53c..37cb853 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -97,27 +97,27 @@ new_9_0_populate_pg_largeobject_metadata(ClusterInfo *cluster, bool check_mode)


 /*
- * old_9_3_check_for_line_data_type_usage()
- *    9.3 -> 9.4
- *    Fully implement the 'line' data type in 9.4, which previously returned
- *    "not enabled" by default and was only functionally enabled with a
- *    compile-time switch;  9.4 "line" has different binary and text
- *    representation formats;  checks tables and indexes.
+ * check_for_data_type_usage
+ *    Detect whether there are any stored columns depending on the given type
+ *
+ * If so, write a report to the given file name, and return true.
+ *
+ * We check for the type in tables, matviews, and indexes, but not views;
+ * there's no storage involved in a view.
  */
-void
-old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+static bool
+check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
+                          char *output_path)
 {
-    int            dbnum;
-    FILE       *script = NULL;
     bool        found = false;
-    char        output_path[MAXPGPATH];
-
-    prep_status("Checking for incompatible \"line\" data type");
-
-    snprintf(output_path, sizeof(output_path), "tables_using_line.txt");
+    FILE       *script = NULL;
+    int            dbnum;

     for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
     {
+        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
+        PGconn       *conn = connectToServer(cluster, active_db->db_name);
+        PQExpBufferData querybuf;
         PGresult   *res;
         bool        db_used = false;
         int            ntups;
@@ -125,50 +125,68 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
         int            i_nspname,
                     i_relname,
                     i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);

         /*
-         * The pg_catalog.line type may be wrapped in a domain or composite
-         * type, or both (9.3 did not allow domains on composite types, but
-         * there may be multi-level composite type). To detect these cases
-         * we need a recursive CTE.
+         * The type of interest might be wrapped in a domain, array,
+         * composite, or range, and these container types can be nested (to
+         * varying extents depending on server version, but that's not of
+         * concern here).  To handle all these cases we need a recursive CTE.
          */
-        res = executeQueryOrDie(conn,
-                                "WITH RECURSIVE oids AS ( "
-        /* the pg_catalog.line type itself */
-                                "    SELECT 'pg_catalog.line'::pg_catalog.regtype AS oid "
-                                "    UNION ALL "
-                                "    SELECT * FROM ( "
-        /* domains on the type */
-                                "        WITH x AS (SELECT oid FROM oids) "
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND
typtype= 'd' " 
-                                "            UNION "
-        /* composite types containing the type */
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
-                                "            WHERE t.typtype = 'c' AND "
-                                "                  t.oid = c.reltype AND "
-                                "                  c.oid = a.attrelid AND "
-                                "                  NOT a.attisdropped AND "
-                                "                  a.atttypid = x.oid "
-                                "    ) foo "
-                                ") "
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "        a.atttypid IN (SELECT oid FROM oids) AND "
-                                "        c.relkind IN ("
-                                CppAsString2(RELKIND_RELATION) ", "
-                                CppAsString2(RELKIND_MATVIEW) ", "
-                                CppAsString2(RELKIND_INDEX) ") AND "
-                                "        c.relnamespace = n.oid AND "
+        initPQExpBuffer(&querybuf);
+        appendPQExpBuffer(&querybuf,
+                          "WITH RECURSIVE oids AS ( "
+        /* the target type itself */
+                          "    SELECT '%s'::pg_catalog.regtype AS oid "
+                          "    UNION ALL "
+                          "    SELECT * FROM ( "
+        /* inner WITH because we can only reference the CTE once */
+                          "        WITH x AS (SELECT oid FROM oids) "
+        /* domains on any type selected so far */
+                          "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype
='d' " 
+                          "            UNION ALL "
+        /* arrays over any type selected so far */
+                          "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype =
'b'" 
+                          "            UNION ALL "
+        /* composite types containing any type selected so far */
+                          "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
+                          "            WHERE t.typtype = 'c' AND "
+                          "                  t.oid = c.reltype AND "
+                          "                  c.oid = a.attrelid AND "
+                          "                  NOT a.attisdropped AND "
+                          "                  a.atttypid = x.oid ",
+                          typename);
+
+        /* Ranges came in in 9.2 */
+        if (GET_MAJOR_VERSION(cluster->major_version) >= 902)
+            appendPQExpBuffer(&querybuf,
+                              "            UNION ALL "
+            /* ranges containing any type selected so far */
+                              "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_range r, x "
+                              "            WHERE t.typtype = 'r' AND r.rngtypid = t.oid AND r.rngsubtype = x.oid");
+
+        appendPQExpBuffer(&querybuf,
+                          "    ) foo "
+                          ") "
+        /* now look for stored columns of any such type */
+                          "SELECT n.nspname, c.relname, a.attname "
+                          "FROM    pg_catalog.pg_class c, "
+                          "        pg_catalog.pg_namespace n, "
+                          "        pg_catalog.pg_attribute a "
+                          "WHERE    c.oid = a.attrelid AND "
+                          "        NOT a.attisdropped AND "
+                          "        a.atttypid IN (SELECT oid FROM oids) AND "
+                          "        c.relkind IN ("
+                          CppAsString2(RELKIND_RELATION) ", "
+                          CppAsString2(RELKIND_MATVIEW) ", "
+                          CppAsString2(RELKIND_INDEX) ") AND "
+                          "        c.relnamespace = n.oid AND "
         /* exclude possible orphaned temp tables */
-                                "        n.nspname !~ '^pg_temp_' AND "
-                                "        n.nspname !~ '^pg_toast_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
+                          "        n.nspname !~ '^pg_temp_' AND "
+                          "        n.nspname !~ '^pg_toast_temp_' AND "
+        /* exclude system catalogs, too */
+                          "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
+
+        res = executeQueryOrDie(conn, "%s", querybuf.data);

         ntups = PQntuples(res);
         i_nspname = PQfnumber(res, "nspname");
@@ -193,13 +211,36 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)

         PQclear(res);

+        termPQExpBuffer(&querybuf);
+
         PQfinish(conn);
     }

     if (script)
         fclose(script);

-    if (found)
+    return found;
+}
+
+
+/*
+ * old_9_3_check_for_line_data_type_usage()
+ *    9.3 -> 9.4
+ *    Fully implement the 'line' data type in 9.4, which previously returned
+ *    "not enabled" by default and was only functionally enabled with a
+ *    compile-time switch; as of 9.4 "line" has a different on-disk
+ *    representation format.
+ */
+void
+old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
+{
+    char        output_path[MAXPGPATH];
+
+    prep_status("Checking for incompatible \"line\" data type");
+
+    snprintf(output_path, sizeof(output_path), "tables_using_line.txt");
+
+    if (check_for_data_type_usage(cluster, "pg_catalog.line", output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"line\" data type in user tables.  This\n"
@@ -226,105 +267,17 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
  *    mid-upgrade.  Worse, if there's a matview with such a column, the
  *    DDL reload will silently change it to "text" which won't match the
  *    on-disk storage (which is like "cstring").  So we *must* reject that.
- *    Also check composite types and domains on the "unknwown" type (even
- *    combinations of both), in case they are used for table columns.
- *    We needn't check indexes, because "unknown" has no opclasses.
  */
 void
 old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
 {
-    int            dbnum;
-    FILE       *script = NULL;
-    bool        found = false;
     char        output_path[MAXPGPATH];

     prep_status("Checking for invalid \"unknown\" user columns");

     snprintf(output_path, sizeof(output_path), "tables_using_unknown.txt");

-    for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-    {
-        PGresult   *res;
-        bool        db_used = false;
-        int            ntups;
-        int            rowno;
-        int            i_nspname,
-                    i_relname,
-                    i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);
-
-        /*
-         * The pg_catalog.unknown type may be wrapped in a domain or composite
-         * type, or both (9.3 did not allow domains on composite types, but
-         * there may be multi-level composite type). To detect these cases
-         * we need a recursive CTE.
-         */
-        res = executeQueryOrDie(conn,
-                                "WITH RECURSIVE oids AS ( "
-        /* the pg_catalog.unknown type itself */
-                                "    SELECT 'pg_catalog.unknown'::pg_catalog.regtype AS oid "
-                                "    UNION ALL "
-                                "    SELECT * FROM ( "
-        /* domains on the type */
-                                "        WITH x AS (SELECT oid FROM oids) "
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND
typtype= 'd' " 
-                                "            UNION "
-        /* composite types containing the type */
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
-                                "            WHERE t.typtype = 'c' AND "
-                                "                  t.oid = c.reltype AND "
-                                "                  c.oid = a.attrelid AND "
-                                "                  NOT a.attisdropped AND "
-                                "                  a.atttypid = x.oid "
-                                "    ) foo "
-                                ") "
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "        a.atttypid IN (SELECT oid FROM oids) AND "
-                                "        c.relkind IN ("
-                                CppAsString2(RELKIND_RELATION) ", "
-                                CppAsString2(RELKIND_MATVIEW) ") AND "
-                                "        c.relnamespace = n.oid AND "
-        /* exclude possible orphaned temp tables */
-                                "        n.nspname !~ '^pg_temp_' AND "
-                                "        n.nspname !~ '^pg_toast_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
-
-        ntups = PQntuples(res);
-        i_nspname = PQfnumber(res, "nspname");
-        i_relname = PQfnumber(res, "relname");
-        i_attname = PQfnumber(res, "attname");
-        for (rowno = 0; rowno < ntups; rowno++)
-        {
-            found = true;
-            if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-                pg_fatal("could not open file \"%s\": %s\n", output_path,
-                         strerror(errno));
-            if (!db_used)
-            {
-                fprintf(script, "In database: %s\n", active_db->db_name);
-                db_used = true;
-            }
-            fprintf(script, "  %s.%s.%s\n",
-                    PQgetvalue(res, rowno, i_nspname),
-                    PQgetvalue(res, rowno, i_relname),
-                    PQgetvalue(res, rowno, i_attname));
-        }
-
-        PQclear(res);
-
-        PQfinish(conn);
-    }
-
-    if (script)
-        fclose(script);
-
-    if (found)
+    if (check_for_data_type_usage(cluster, "pg_catalog.unknown", output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"unknown\" data type in user tables.  This\n"
@@ -456,105 +409,18 @@ old_9_6_invalidate_hash_indexes(ClusterInfo *cluster, bool check_mode)
  *    which does affect the storage (name is by-ref, but not varlena). This
  *    means user tables using sql_identifier for columns are broken because
  *    the on-disk format is different.
- *
- *    We need to check all objects that might store sql_identifier on disk,
- *    i.e. tables, matviews and indexes. Also check composite types in case
- *    they are used in this context.
  */
 void
 old_11_check_for_sql_identifier_data_type_usage(ClusterInfo *cluster)
 {
-    int            dbnum;
-    FILE       *script = NULL;
-    bool        found = false;
     char        output_path[MAXPGPATH];

     prep_status("Checking for invalid \"sql_identifier\" user columns");

     snprintf(output_path, sizeof(output_path), "tables_using_sql_identifier.txt");

-    for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-    {
-        PGresult   *res;
-        bool        db_used = false;
-        int            ntups;
-        int            rowno;
-        int            i_nspname,
-                    i_relname,
-                    i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);
-
-        /*
-         * We need the recursive CTE because the sql_identifier may be wrapped
-         * either in a domain or composite type, or both (in arbitrary order).
-         */
-        res = executeQueryOrDie(conn,
-                                "WITH RECURSIVE oids AS ( "
-        /* the sql_identifier type itself */
-                                "    SELECT 'information_schema.sql_identifier'::pg_catalog.regtype AS oid "
-                                "    UNION ALL "
-                                "    SELECT * FROM ( "
-        /* domains on the type */
-                                "        WITH x AS (SELECT oid FROM oids) "
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND
typtype= 'd' " 
-                                "            UNION "
-        /* composite types containing the type */
-                                "            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c,
pg_catalog.pg_attributea, x " 
-                                "            WHERE t.typtype = 'c' AND "
-                                "                  t.oid = c.reltype AND "
-                                "                  c.oid = a.attrelid AND "
-                                "                  NOT a.attisdropped AND "
-                                "                  a.atttypid = x.oid "
-                                "    ) foo "
-                                ") "
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "        a.atttypid IN (SELECT oid FROM oids) AND "
-                                "        c.relkind IN ("
-                                CppAsString2(RELKIND_RELATION) ", "
-                                CppAsString2(RELKIND_MATVIEW) ", "
-                                CppAsString2(RELKIND_INDEX) ") AND "
-                                "        c.relnamespace = n.oid AND "
-        /* exclude possible orphaned temp tables */
-                                "        n.nspname !~ '^pg_temp_' AND "
-                                "        n.nspname !~ '^pg_toast_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
-
-        ntups = PQntuples(res);
-        i_nspname = PQfnumber(res, "nspname");
-        i_relname = PQfnumber(res, "relname");
-        i_attname = PQfnumber(res, "attname");
-        for (rowno = 0; rowno < ntups; rowno++)
-        {
-            found = true;
-            if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-                pg_fatal("could not open file \"%s\": %s\n", output_path,
-                         strerror(errno));
-            if (!db_used)
-            {
-                fprintf(script, "In database: %s\n", active_db->db_name);
-                db_used = true;
-            }
-            fprintf(script, "  %s.%s.%s\n",
-                    PQgetvalue(res, rowno, i_nspname),
-                    PQgetvalue(res, rowno, i_relname),
-                    PQgetvalue(res, rowno, i_attname));
-        }
-
-        PQclear(res);
-
-        PQfinish(conn);
-    }
-
-    if (script)
-        fclose(script);
-
-    if (found)
+    if (check_for_data_type_usage(cluster, "information_schema.sql_identifier",
+                                  output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"sql_identifier\" data type in user tables\n"

Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Daniel Gustafsson
Date:
> On 10 Nov 2019, at 22:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:

>>> +        /* arrays over any type selected so far */
>>> +                          "            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typelem = x.oid AND typtype
='b' " 
>
>> No need to check typlen?
>
> Yeah, that's intentional.  A fixed-length array type over a problematic
> type would be just as much of a problem as a varlena array type.
> The case shouldn't apply to any of the existing problematic types,
> but I was striving for generality.

That makes a lot of sense, thanks for the explanation.

cheers ./daniel


Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Daniel Gustafsson
Date:
> On 10 Nov 2019, at 22:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> On 10 Nov 2019, at 20:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> (Note: this patch is shown with --ignore-space-change
>>> to make it more reviewable, but I did re-pgindent the code.)  Then
>>> 0002 actually adds the array and range cases.
>
>> Was the source pgindented, but not committed, before generating the patches?  I
>> fail to apply them on master (or REL_12_STABLE) on what seems to be only
>> whitespace changes.
>
> Hm, I suppose it might be hard to apply the combination of the patches
> (maybe something involving patch -l would work).  For simplicity, here's
> the complete patch for HEAD.  I fixed a missing schema qualification.

Applies, builds clean and passes light testing.  I can see the appeal of
including it before the wrap, even though I personally would've held off.

cheers ./daniel


Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> Applies, builds clean and passes light testing.

Thanks for checking!

> I can see the appeal of
> including it before the wrap, even though I personally would've held off.

Nah, I'm not gonna risk it at this stage.  I concur with your point
that this is an ancient bug, and one that is unlikely to bite many
people.  I'll push it Wednesday or so.

            regards, tom lane



Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
[ blast-from-the-past department ]

I wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I can see the appeal of
>> including it before the wrap, even though I personally would've held off.

> Nah, I'm not gonna risk it at this stage.  I concur with your point
> that this is an ancient bug, and one that is unlikely to bite many
> people.  I'll push it Wednesday or so.

I happened across a couple of further pg_upgrade oversights in the
same vein as 29aeda6e4 et al:

* Those commits fixed the bugs in pg_upgrade/version.c about not
checking the contents of arrays/ranges/etc, but there are two
similar functions in pg_upgrade/check.c that I failed to notice
(probably due to the haste with which that patch was prepared).

* We really need to also reject user tables that contain instances
of system-defined composite types (i.e. catalog rowtypes), because
except for a few bootstrap catalogs, those type OIDs are assigned by
genbki.pl not by hand, so they aren't stable across major versions.
For example, in HEAD I get

regression=# select 'pg_enum'::regtype::oid;
  oid  
-------
 13045
(1 row)

but the same OID was 12022 in v13, 11551 in v11, etc.  So if you
had a column of type pg_enum, you'd likely get no-such-type-OID
failures when reading the values after an upgrade.  I don't see
much use-case for doing such a thing, so it seems saner to just
block off the possibility rather than trying to support it.
(We'd have little choice in the back branches anyway, as their
OID values are locked down now.)

The attached proposed patch fixes these cases too.  I generalized
the recursive query a little more so that it could start from an
arbitrary query yielding pg_type OIDs, rather than just one type
name; otherwise it's pretty straightforward.

Barring objections I'll apply and back-patch this soon.

            regards, tom lane

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1c1c908664..ce821a46b3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -24,6 +24,7 @@ static void check_for_prepared_transactions(ClusterInfo *cluster);
 static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
 static void check_for_user_defined_postfix_ops(ClusterInfo *cluster);
 static void check_for_tables_with_oids(ClusterInfo *cluster);
+static void check_for_composite_data_type_usage(ClusterInfo *cluster);
 static void check_for_reg_data_type_usage(ClusterInfo *cluster);
 static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
@@ -100,6 +101,7 @@ check_and_dump_old_cluster(bool live_check)
     check_is_install_user(&old_cluster);
     check_proper_datallowconn(&old_cluster);
     check_for_prepared_transactions(&old_cluster);
+    check_for_composite_data_type_usage(&old_cluster);
     check_for_reg_data_type_usage(&old_cluster);
     check_for_isn_and_int8_passing_mismatch(&old_cluster);

@@ -1043,6 +1045,63 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 }


+/*
+ * check_for_composite_data_type_usage()
+ *    Check for system-defined composite types used in user tables.
+ *
+ *    The OIDs of rowtypes of system catalogs and information_schema views
+ *    can change across major versions; unlike user-defined types, we have
+ *    no mechanism for forcing them to be the same in the new cluster.
+ *    Hence, if any user table uses one, that's problematic for pg_upgrade.
+ */
+static void
+check_for_composite_data_type_usage(ClusterInfo *cluster)
+{
+    bool        found;
+    Oid            firstUserOid;
+    char        output_path[MAXPGPATH];
+    char       *base_query;
+
+    prep_status("Checking for system-defined composite types in user tables");
+
+    snprintf(output_path, sizeof(output_path), "tables_using_composite.txt");
+
+    /*
+     * Look for composite types that were made during initdb *or* belong to
+     * information_schema; that's important in case information_schema was
+     * dropped and reloaded.
+     *
+     * The cutoff OID here should match the source cluster's value of
+     * FirstNormalObjectId.  We hardcode it rather than using that C #define
+     * because, if that #define is ever changed, our own version's value is
+     * NOT what to use.  Eventually we may need a test on the source cluster's
+     * version to select the correct value.
+     */
+    firstUserOid = 16384;
+
+    base_query = psprintf("SELECT t.oid FROM pg_catalog.pg_type t "
+                          "LEFT JOIN pg_catalog.pg_namespace n ON t.typnamespace = n.oid "
+                          " WHERE typtype = 'c' AND (t.oid < %u OR nspname = 'information_schema')",
+                          firstUserOid);
+
+    found = check_for_data_types_usage(cluster, base_query, output_path);
+
+    free(base_query);
+
+    if (found)
+    {
+        pg_log(PG_REPORT, "fatal\n");
+        pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
+                 "These type OIDs are not stable across PostgreSQL versions,\n"
+                 "so this cluster cannot currently be upgraded.  You can\n"
+                 "remove the problem tables and restart the upgrade.\n"
+                 "A list of the problem columns is in the file:\n"
+                 "    %s\n\n", output_path);
+    }
+    else
+        check_ok();
+}
+
 /*
  * check_for_reg_data_type_usage()
  *    pg_upgrade only preserves these system values:
@@ -1057,88 +1116,29 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 static void
 check_for_reg_data_type_usage(ClusterInfo *cluster)
 {
-    int            dbnum;
-    FILE       *script = NULL;
-    bool        found = false;
+    bool        found;
     char        output_path[MAXPGPATH];

     prep_status("Checking for reg* data types in user tables");

     snprintf(output_path, sizeof(output_path), "tables_using_reg.txt");

-    for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-    {
-        PGresult   *res;
-        bool        db_used = false;
-        int            ntups;
-        int            rowno;
-        int            i_nspname,
-                    i_relname,
-                    i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);
-
-        /*
-         * While several relkinds don't store any data, e.g. views, they can
-         * be used to define data types of other columns, so we check all
-         * relkinds.
-         */
-        res = executeQueryOrDie(conn,
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a, "
-                                "        pg_catalog.pg_type t "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "       a.atttypid = t.oid AND "
-                                "       t.typnamespace = "
-                                "           (SELECT oid FROM pg_namespace "
-                                "            WHERE nspname = 'pg_catalog') AND"
-                                "        t.typname IN ( "
-        /* regclass.oid is preserved, so 'regclass' is OK */
-                                "           'regcollation', "
-                                "           'regconfig', "
-                                "           'regdictionary', "
-                                "           'regnamespace', "
-                                "           'regoper', "
-                                "           'regoperator', "
-                                "           'regproc', "
-                                "           'regprocedure' "
-        /* regrole.oid is preserved, so 'regrole' is OK */
-        /* regtype.oid is preserved, so 'regtype' is OK */
-                                "            ) AND "
-                                "        c.relnamespace = n.oid AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
-
-        ntups = PQntuples(res);
-        i_nspname = PQfnumber(res, "nspname");
-        i_relname = PQfnumber(res, "relname");
-        i_attname = PQfnumber(res, "attname");
-        for (rowno = 0; rowno < ntups; rowno++)
-        {
-            found = true;
-            if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-                pg_fatal("could not open file \"%s\": %s\n",
-                         output_path, strerror(errno));
-            if (!db_used)
-            {
-                fprintf(script, "In database: %s\n", active_db->db_name);
-                db_used = true;
-            }
-            fprintf(script, "  %s.%s.%s\n",
-                    PQgetvalue(res, rowno, i_nspname),
-                    PQgetvalue(res, rowno, i_relname),
-                    PQgetvalue(res, rowno, i_attname));
-        }
-
-        PQclear(res);
-
-        PQfinish(conn);
-    }
-
-    if (script)
-        fclose(script);
+    found = check_for_data_types_usage(cluster,
+                                       "SELECT tname::pg_catalog.regtype AS oid"
+                                       " FROM (VALUES "
+    /* pg_class.oid is preserved, so 'regclass' is OK */
+                                       "('pg_catalog.regcollation'), "
+                                       "('pg_catalog.regconfig'), "
+                                       "('pg_catalog.regdictionary'), "
+                                       "('pg_catalog.regnamespace'), "
+                                       "('pg_catalog.regoper'), "
+                                       "('pg_catalog.regoperator'), "
+                                       "('pg_catalog.regproc'), "
+                                       "('pg_catalog.regprocedure') "
+    /* pg_authid.oid is preserved, so 'regrole' is OK */
+    /* pg_type.oid is preserved, so 'regtype' is OK */
+                                       ") v(tname)",
+                                       output_path);

     if (found)
     {
@@ -1163,75 +1163,13 @@ check_for_reg_data_type_usage(ClusterInfo *cluster)
 static void
 check_for_jsonb_9_4_usage(ClusterInfo *cluster)
 {
-    int            dbnum;
-    FILE       *script = NULL;
-    bool        found = false;
     char        output_path[MAXPGPATH];

     prep_status("Checking for incompatible \"jsonb\" data type");

     snprintf(output_path, sizeof(output_path), "tables_using_jsonb.txt");

-    for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
-    {
-        PGresult   *res;
-        bool        db_used = false;
-        int            ntups;
-        int            rowno;
-        int            i_nspname,
-                    i_relname,
-                    i_attname;
-        DbInfo       *active_db = &cluster->dbarr.dbs[dbnum];
-        PGconn       *conn = connectToServer(cluster, active_db->db_name);
-
-        /*
-         * While several relkinds don't store any data, e.g. views, they can
-         * be used to define data types of other columns, so we check all
-         * relkinds.
-         */
-        res = executeQueryOrDie(conn,
-                                "SELECT n.nspname, c.relname, a.attname "
-                                "FROM    pg_catalog.pg_class c, "
-                                "        pg_catalog.pg_namespace n, "
-                                "        pg_catalog.pg_attribute a "
-                                "WHERE    c.oid = a.attrelid AND "
-                                "        NOT a.attisdropped AND "
-                                "        a.atttypid = 'pg_catalog.jsonb'::pg_catalog.regtype AND "
-                                "        c.relnamespace = n.oid AND "
-        /* exclude possible orphaned temp tables */
-                                "        n.nspname !~ '^pg_temp_' AND "
-                                "        n.nspname NOT IN ('pg_catalog', 'information_schema')");
-
-        ntups = PQntuples(res);
-        i_nspname = PQfnumber(res, "nspname");
-        i_relname = PQfnumber(res, "relname");
-        i_attname = PQfnumber(res, "attname");
-        for (rowno = 0; rowno < ntups; rowno++)
-        {
-            found = true;
-            if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
-                pg_fatal("could not open file \"%s\": %s\n",
-                         output_path, strerror(errno));
-            if (!db_used)
-            {
-                fprintf(script, "In database: %s\n", active_db->db_name);
-                db_used = true;
-            }
-            fprintf(script, "  %s.%s.%s\n",
-                    PQgetvalue(res, rowno, i_nspname),
-                    PQgetvalue(res, rowno, i_relname),
-                    PQgetvalue(res, rowno, i_attname));
-        }
-
-        PQclear(res);
-
-        PQfinish(conn);
-    }
-
-    if (script)
-        fclose(script);
-
-    if (found)
+    if (check_for_data_type_usage(cluster, "pg_catalog.jsonb", output_path))
     {
         pg_log(PG_REPORT, "fatal\n");
         pg_fatal("Your installation contains the \"jsonb\" data type in user tables.\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..d7666da3f2 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -440,6 +440,12 @@ unsigned int str2uint(const char *str);

 /* version.c */

+bool        check_for_data_types_usage(ClusterInfo *cluster,
+                                       const char *base_query,
+                                       const char *output_path);
+bool        check_for_data_type_usage(ClusterInfo *cluster,
+                                      const char *typename,
+                                      const char *output_path);
 void        new_9_0_populate_pg_largeobject_metadata(ClusterInfo *cluster,
                                                      bool check_mode);
 void        old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
index a41247b33d..e7935ce359 100644
--- a/src/bin/pg_upgrade/version.c
+++ b/src/bin/pg_upgrade/version.c
@@ -97,17 +97,22 @@ new_9_0_populate_pg_largeobject_metadata(ClusterInfo *cluster, bool check_mode)


 /*
- * check_for_data_type_usage
- *    Detect whether there are any stored columns depending on the given type
+ * check_for_data_types_usage()
+ *    Detect whether there are any stored columns depending on given type(s)
  *
  * If so, write a report to the given file name, and return true.
  *
- * We check for the type in tables, matviews, and indexes, but not views;
+ * base_query should be a SELECT yielding a single column named "oid",
+ * containing the pg_type OIDs of one or more types that are known to have
+ * inconsistent on-disk representations across server versions.
+ *
+ * We check for the type(s) in tables, matviews, and indexes, but not views;
  * there's no storage involved in a view.
  */
-static bool
-check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
-                          char *output_path)
+bool
+check_for_data_types_usage(ClusterInfo *cluster,
+                           const char *base_query,
+                           const char *output_path)
 {
     bool        found = false;
     FILE       *script = NULL;
@@ -127,7 +132,7 @@ check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
                     i_attname;

         /*
-         * The type of interest might be wrapped in a domain, array,
+         * The types of interest might be wrapped in a domain, array,
          * composite, or range, and these container types can be nested (to
          * varying extents depending on server version, but that's not of
          * concern here).  To handle all these cases we need a recursive CTE.
@@ -135,8 +140,8 @@ check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
         initPQExpBuffer(&querybuf);
         appendPQExpBuffer(&querybuf,
                           "WITH RECURSIVE oids AS ( "
-        /* the target type itself */
-                          "    SELECT '%s'::pg_catalog.regtype AS oid "
+        /* start with the type(s) returned by base_query */
+                          "    %s "
                           "    UNION ALL "
                           "    SELECT * FROM ( "
         /* inner WITH because we can only reference the CTE once */
@@ -154,7 +159,7 @@ check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
                           "                  c.oid = a.attrelid AND "
                           "                  NOT a.attisdropped AND "
                           "                  a.atttypid = x.oid ",
-                          typename);
+                          base_query);

         /* Ranges were introduced in 9.2 */
         if (GET_MAJOR_VERSION(cluster->major_version) >= 902)
@@ -222,6 +227,34 @@ check_for_data_type_usage(ClusterInfo *cluster, const char *typename,
     return found;
 }

+/*
+ * check_for_data_type_usage()
+ *    Detect whether there are any stored columns depending on the given type
+ *
+ * If so, write a report to the given file name, and return true.
+ *
+ * typename should be a fully qualified type name.  This is just a
+ * trivial wrapper around check_for_data_types_usage() to convert a
+ * type name into a base query.
+ */
+bool
+check_for_data_type_usage(ClusterInfo *cluster,
+                          const char *typename,
+                          const char *output_path)
+{
+    bool        found;
+    char       *base_query;
+
+    base_query = psprintf("SELECT '%s'::pg_catalog.regtype AS oid",
+                          typename);
+
+    found = check_for_data_types_usage(cluster, base_query, output_path);
+
+    free(base_query);
+
+    return found;
+}
+

 /*
  * old_9_3_check_for_line_data_type_usage()

Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Daniel Gustafsson
Date:
> On 28 Apr 2021, at 17:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ blast-from-the-past department ]
>
> I wrote:
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> I can see the appeal of
>>> including it before the wrap, even though I personally would've held off.
>
>> Nah, I'm not gonna risk it at this stage.  I concur with your point
>> that this is an ancient bug, and one that is unlikely to bite many
>> people.  I'll push it Wednesday or so.
>
> I happened across a couple of further pg_upgrade oversights in the
> same vein as 29aeda6e4 et al:

Nice find, this makes a lot of sense.

> ..the same OID was 12022 in v13, 11551 in v11, etc.  So if you
> had a column of type pg_enum, you'd likely get no-such-type-OID
> failures when reading the values after an upgrade.  I don't see
> much use-case for doing such a thing, so it seems saner to just
> block off the possibility rather than trying to support it.

Agreed.  Having implemented basically this for Greenplum I think it’s wise to
avoid it unless we really have to, it gets very complicated once the layers of
worms are peeled back.

> The attached proposed patch fixes these cases too.  I generalized
> the recursive query a little more so that it could start from an
> arbitrary query yielding pg_type OIDs, rather than just one type
> name; otherwise it's pretty straightforward.
>
> Barring objections I'll apply and back-patch this soon.

Patch LGTM on reading, +1 on applying.  Being on parental leave I don’t have my
dev env ready to go so I didn’t perform testing; sorry about that.

> +        pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
> +                 "These type OIDs are not stable across PostgreSQL versions,\n"
> +                 "so this cluster cannot currently be upgraded.  You can\n"
> +                 "remove the problem tables and restart the upgrade.\n"
> +                 "A list of the problem columns is in the file:\n"

Would it be helpful to inform the user that they can alter/drop just the
problematic columns as a potentially less scary alternative to dropping the
entire table?

> -         * The type of interest might be wrapped in a domain, array,
> +         * The types of interest might be wrapped in a domain, array,

Shouldn't this be "type(s)” as in the other changes here?

--
Daniel Gustafsson        https://vmware.com/




Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 28 Apr 2021, at 17:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +        pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
>> +                 "These type OIDs are not stable across PostgreSQL versions,\n"
>> +                 "so this cluster cannot currently be upgraded.  You can\n"
>> +                 "remove the problem tables and restart the upgrade.\n"
>> +                 "A list of the problem columns is in the file:\n"

> Would it be helpful to inform the user that they can alter/drop just the
> problematic columns as a potentially less scary alternative to dropping the
> entire table?

This wording is copied-and-pasted from the other similar tests.  I agree
that it's advocating a solution that might be overkill, but if we change
it we should also change the existing messages.  I don't mind doing
that in HEAD; less sure about the back branches, as (I think) these
are translatable strings.

Thoughts?

>> -         * The type of interest might be wrapped in a domain, array,
>> +         * The types of interest might be wrapped in a domain, array,

> Shouldn't this be "type(s)” as in the other changes here?

Fair enough.

            regards, tom lane



Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Daniel Gustafsson
Date:
> On 28 Apr 2021, at 22:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 28 Apr 2021, at 17:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> +        pg_fatal("Your installation contains system-defined composite type(s) in user tables.\n"
>>> +                 "These type OIDs are not stable across PostgreSQL versions,\n"
>>> +                 "so this cluster cannot currently be upgraded.  You can\n"
>>> +                 "remove the problem tables and restart the upgrade.\n"
>>> +                 "A list of the problem columns is in the file:\n"
>
>> Would it be helpful to inform the user that they can alter/drop just the
>> problematic columns as a potentially less scary alternative to dropping the
>> entire table?
>
> This wording is copied-and-pasted from the other similar tests.  I agree
> that it's advocating a solution that might be overkill, but if we change
> it we should also change the existing messages.

Good point.

> I don't mind doing that in HEAD; less sure about the back branches, as

I think it would be helpful for users to try and give slightly more expanded
advice while (obviously) still always being safe.  I’m happy to take a crack at
that once back unless someone beats me to it.

> (I think) these are translatable strings.


If they aren't I think we should try and make them so to as far as we can
reduce language barrier problems in such important messages.

--
Daniel Gustafsson        https://vmware.com/




Re: pg_upgrade fails to detect unsupported arrays and ranges

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 28 Apr 2021, at 22:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This wording is copied-and-pasted from the other similar tests.  I agree
>> that it's advocating a solution that might be overkill, but if we change
>> it we should also change the existing messages.

> Good point.

>> I don't mind doing that in HEAD; less sure about the back branches, as

> I think it would be helpful for users to try and give slightly more expanded
> advice while (obviously) still always being safe.  I’m happy to take a crack at
> that once back unless someone beats me to it.

Seems like s/remove the problem tables/drop the problem columns/
is easy and sufficient.

>> (I think) these are translatable strings.

> If they aren't I think we should try and make them so to as far as we can
> reduce language barrier problems in such important messages.

Checking, I see they do appear in pg_upgrade's po files.  So I propose
that we change the existing messages in HEAD but not the back branches.

            regards, tom lane