Thread: [PATCH] Allow anonymous rowtypes in function return column definition

[PATCH] Allow anonymous rowtypes in function return column definition

From

Elvis Pranskevichus

Date:

06 December 2018, 22:27:32

Currently, the following query

    SELECT q.b = row(2)
    FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record);

would fail with

    ERROR:  column "b" has pseudo-type record

This is due to CheckAttributeNamesTypes() being used on a function
coldeflist as if it was a real relation definition.  But in the context
of a query there seems to be no harm in allowing this, as other ways of
manipulating anonymous rowtypes work well, e.g.:

    SELECT (ARRAY[ROW(1, ROW(2))])[1];



                                Elvis

Attachment

0001-Allow-anonymous-rowtypes-in-function-return-column-d.patch

Re: [PATCH] Allow anonymous rowtypes in function return column definition

From

Tom Lane

Date:

13 January 2019, 21:57:48

Elvis Pranskevichus <el@prans.net> writes:
> Currently, the following query
>     SELECT q.b = row(2)
>     FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record);
> would fail with
>     ERROR:  column "b" has pseudo-type record
> This is due to CheckAttributeNamesTypes() being used on a function
> coldeflist as if it was a real relation definition.  But in the context
> of a query there seems to be no harm in allowing this,

Hmm, I'm not entirely convinced of that.  Looking at the conditions
checked by CheckAttributeType, the first question that pops to mind
is whether allowing "record" doesn't break our defenses against
a rowtype recursively containing itself.  Perhaps not --- allowing
an anonymous record to contain another one isn't really recursion,
because since they're both anonymous they can't be the "same" type.
But it could do with more thought than I've given it just now,
and with a comment explaining the reasoning.

(Speaking of which, you did not bother updating the comment immediately
adjacent to the code you changed in CheckAttributeType, even though
this change makes it incomplete/not very sensible.)

I also wonder why we'd allow RECORD but not RECORDARRAY.  More
generally, why not any polymorphic type?  There's probably a
good reason why not, but having a clear explanation why we're
treating RECORD differently from other polymorphics would go
a long way towards convincing me that this is safe.  Again
this is just handwaving, but such an argument might rest on the
fact that a record value is self-identifying as long as you know
it's a record, whereas a random Datum is not a self-identifying
member of the type class "anyelement", for instance.

I feel a bit uncomfortable about defining the new flag to
CheckAttributeNamesTypes as "allow_anonymous_records"; that
seems pretty short-sighted and single-purpose, especially in
view of there being no obvious reason why it shouldn't accept
RECORDARRAY too.  Maybe with a clearer explanation of the
issues above, we could define that flag in a more on-point way.

BTW, it strikes me that maybe the reason ANYARRAY isn't insane
to allow in pg_statistic has to do with arrays also being
self-identifying members of that type class, and so possibly
we could get to a place where we're unifying that hack with
this feature addition.  I don't insist on doing that as part of
this patch, but as long as we're trying to think through these
issues, it's something to think about.

            regards, tom lane

Re: [PATCH] Allow anonymous rowtypes in function return column definition

From

Elvis Pranskevichus

Date:

28 January 2019, 17:52:57

On Sunday, January 13, 2019 4:57:48 PM EST Tom Lane wrote:

> I also wonder why we'd allow RECORD but not RECORDARRAY.  

That's by omission.  There's no reason to refuse RECORDARRAY here for 
the same reason why RECORD is accepted.

> More generally, why not any polymorphic type? [...] the
> fact that a record value is self-identifying as long as you know
> it's a record, whereas a random Datum is not a self-identifying
> member of the type class "anyelement", for instance.

Yes that's the reason.  We allow "record" in coldeflist because it 
only happens near a RangeFunction, and set-returning functions always do 
"BlessTupleDesc", which means that RECORDOID data would always have an 
associated TupleDesc and can be processed as a regular composite type.  
Recursion is not an issue, since every instance would have a separate 
TupleDesc even if the shape is the same.

> I feel a bit uncomfortable about defining the new flag to
> CheckAttributeNamesTypes as "allow_anonymous_records"; that
> seems pretty short-sighted and single-purpose, especially in
> view of there being no obvious reason why it shouldn't accept
> RECORDARRAY too.  Maybe with a clearer explanation of the
> issues above, we could define that flag in a more on-point way.

I renamed it to "in_coldeflist", which seems like a clearer indication 
that we are validating that, and not a regular table definition.

> BTW, it strikes me that maybe the reason ANYARRAY isn't insane
> to allow in pg_statistic has to do with arrays also being
> self-identifying members of that type class

That's true.  Array data encode the OID of their element type, but that 
only allows sending the data out, as subscripting or casting anyarray is 
not allowed.  There also seems to be no guarantee that the actual type 
of the array doesn't change from row to row in such a scenario.  Given 
that, I think it would be best to keep anyarray restricted to the system 
catalogs.

I attached the updated patch.

                               Elvis

Attachment

Allow-anonymous-rowtypes-in-coldeflist-v2.patch

Re: [PATCH] Allow anonymous rowtypes in function return column definition

From

Tom Lane

Date:

30 January 2019, 22:59:41

Elvis Pranskevichus <el@prans.net> writes:
> On Sunday, January 13, 2019 4:57:48 PM EST Tom Lane wrote:
>> I feel a bit uncomfortable about defining the new flag to
>> CheckAttributeNamesTypes as "allow_anonymous_records"; that
>> seems pretty short-sighted and single-purpose, especially in
>> view of there being no obvious reason why it shouldn't accept
>> RECORDARRAY too.  Maybe with a clearer explanation of the
>> issues above, we could define that flag in a more on-point way.

> I renamed it to "in_coldeflist", which seems like a clearer indication 
> that we are validating that, and not a regular table definition.

I still found this pretty disjointed.  After a bit of thought I propose
the attached reformulation, which has the callers just tell the routines
which types to allow explicitly, with the justification comments living
at the call sites instead of within the routines.

One point that we hadn't really talked about is what happens when
CheckArrayTypes recurses.  The existing logic is that it just passes
down the same restrictions it was told at the top level, and I kept
that here.  Right now it hardly matters what we pass down, because
the base type of a domain or array can't be a pseudotype, and we
don't really expect to find one in a named composite type either.
The one case where it could matter is if someone tries to use
"pg_statistic" as a field's composite type: that would be allowed if
allow_system_table_mods and otherwise not.  But it's not really
hard to imagine future additions where it would matter and it'd
be important to restrict things as we recurse down.  I think this
formulation makes it easier to see what to do in such cases.

            regards, tom lane

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 910f651..06d18a1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -445,11 +445,14 @@ heap_create(const char *relname,
  *        this is used to make certain the tuple descriptor contains a
  *        valid set of attribute names and datatypes.  a problem simply
  *        generates ereport(ERROR) which aborts the current transaction.
+ *
+ *        relkind is the relkind of the relation to be created.
+ *        flags controls which datatypes are allowed, cf CheckAttributeType.
  * --------------------------------
  */
 void
 CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
-                         bool allow_system_table_mods)
+                         int flags)
 {
     int            i;
     int            j;
@@ -507,7 +510,7 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
                            TupleDescAttr(tupdesc, i)->atttypid,
                            TupleDescAttr(tupdesc, i)->attcollation,
                            NIL, /* assume we're creating a new rowtype */
-                           allow_system_table_mods);
+                           flags);
     }
 }

@@ -524,13 +527,22 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
  * containing_rowtypes.  When checking a to-be-created rowtype, it's
  * sufficient to pass NIL, because there could not be any recursive reference
  * to a not-yet-existing rowtype.
+ *
+ * flags is a bitmask controlling which datatypes we allow.  For the most
+ * part, pseudo-types are disallowed as attribute types, but there are some
+ * exceptions: ANYARRAYOID, RECORDOID, and RECORDARRAYOID can be allowed
+ * in some cases.  (This works because values of those type classes are
+ * self-identifying to some extent.  However, RECORDOID and RECORDARRAYOID
+ * are reliably identifiable only within a session, since the identity info
+ * may use a typmod that is only locally assigned.  The caller is expected
+ * to know whether these cases are safe.)
  * --------------------------------
  */
 void
 CheckAttributeType(const char *attname,
                    Oid atttypid, Oid attcollation,
                    List *containing_rowtypes,
-                   bool allow_system_table_mods)
+                   int flags)
 {
     char        att_typtype = get_typtype(atttypid);
     Oid            att_typelem;
@@ -538,12 +550,18 @@ CheckAttributeType(const char *attname,
     if (att_typtype == TYPTYPE_PSEUDO)
     {
         /*
-         * Refuse any attempt to create a pseudo-type column, except for a
-         * special hack for pg_statistic: allow ANYARRAY when modifying system
-         * catalogs (this allows creating pg_statistic and cloning it during
-         * VACUUM FULL)
+         * We disallow pseudo-type columns, with the exception of ANYARRAY,
+         * RECORD, and RECORD[] when the caller says that those are OK.
+         *
+         * We don't need to worry about recursive containment for RECORD and
+         * RECORD[] because (a) no named composite type should be allowed to
+         * contain those, and (b) two "anonymous" record types couldn't be
+         * considered to be the same type, so infinite recursion isn't
+         * possible.
          */
-        if (atttypid != ANYARRAYOID || !allow_system_table_mods)
+        if (!((atttypid == ANYARRAYOID && (flags & CHKATYPE_ANYARRAY)) ||
+              (atttypid == RECORDOID && (flags & CHKATYPE_ANYRECORD)) ||
+              (atttypid == RECORDARRAYOID && (flags & CHKATYPE_ANYRECORD))))
             ereport(ERROR,
                     (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
                      errmsg("column \"%s\" has pseudo-type %s",
@@ -556,7 +574,7 @@ CheckAttributeType(const char *attname,
          */
         CheckAttributeType(attname, getBaseType(atttypid), attcollation,
                            containing_rowtypes,
-                           allow_system_table_mods);
+                           flags);
     }
     else if (att_typtype == TYPTYPE_COMPOSITE)
     {
@@ -594,7 +612,7 @@ CheckAttributeType(const char *attname,
             CheckAttributeType(NameStr(attr->attname),
                                attr->atttypid, attr->attcollation,
                                containing_rowtypes,
-                               allow_system_table_mods);
+                               flags);
         }

         relation_close(relation, AccessShareLock);
@@ -608,7 +626,7 @@ CheckAttributeType(const char *attname,
          */
         CheckAttributeType(attname, att_typelem, attcollation,
                            containing_rowtypes,
-                           allow_system_table_mods);
+                           flags);
     }

     /*
@@ -1074,7 +1092,13 @@ heap_create_with_catalog(const char *relname,
      */
     Assert(IsNormalProcessingMode() || IsBootstrapProcessingMode());

-    CheckAttributeNamesTypes(tupdesc, relkind, allow_system_table_mods);
+    /*
+     * Validate proposed tupdesc for the desired relkind.  If
+     * allow_system_table_mods is on, allow ANYARRAY to be used; this is a
+     * hack to allow creating pg_statistic and cloning it during VACUUM FULL.
+     */
+    CheckAttributeNamesTypes(tupdesc, relkind,
+                             allow_system_table_mods ? CHKATYPE_ANYARRAY : 0);

     /*
      * This would fail later on anyway, if the relation already exists.  But
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 169b2de..faf6956 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -410,7 +410,7 @@ ConstructTupleDescriptor(Relation heapRelation,
              */
             CheckAttributeType(NameStr(to->attname),
                                to->atttypid, to->attcollation,
-                               NIL, false);
+                               NIL, 0);
         }

         /*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 434be40..35a9ade 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5505,7 +5505,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
     /* make sure datatype is legal for a column */
     CheckAttributeType(colDef->colname, typeOid, collOid,
                        list_make1_oid(rel->rd_rel->reltype),
-                       false);
+                       0);

     /* construct new attribute's pg_attribute entry */
     attribute.attrelid = myrelid;
@@ -9445,7 +9445,7 @@ ATPrepAlterColumnType(List **wqueue,
     /* make sure datatype is legal for a column */
     CheckAttributeType(colName, targettype, targetcollid,
                        list_make1_oid(rel->rd_rel->reltype),
-                       false);
+                       0);

     if (tab->relkind == RELKIND_RELATION ||
         tab->relkind == RELKIND_PARTITIONED_TABLE)
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 3ff799f..f3b6d19 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1590,9 +1590,15 @@ addRangeTableEntryForFunction(ParseState *pstate,

             /*
              * Ensure that the coldeflist defines a legal set of names (no
-             * duplicates) and datatypes (no pseudo-types, for instance).
+             * duplicates, but we needn't worry about system column names) and
+             * datatypes.  Although we mostly can't allow pseudo-types, it
+             * seems safe to allow RECORD and RECORD[], since values within
+             * those type classes are self-identifying at runtime, and the
+             * coldeflist doesn't represent anything that will be visible to
+             * other sessions.
              */
-            CheckAttributeNamesTypes(tupdesc, RELKIND_COMPOSITE_TYPE, false);
+            CheckAttributeNamesTypes(tupdesc, RELKIND_COMPOSITE_TYPE,
+                                     CHKATYPE_ANYRECORD);
         }
         else
             ereport(ERROR,
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 625b7e5..50fb62b 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -19,6 +19,10 @@
 #include "parser/parse_node.h"


+/* flag bits for CheckAttributeType/CheckAttributeNamesTypes */
+#define CHKATYPE_ANYARRAY        0x01    /* allow ANYARRAY */
+#define CHKATYPE_ANYRECORD        0x02    /* allow RECORD and RECORD[] */
+
 typedef struct RawColumnDefault
 {
     AttrNumber    attnum;            /* attribute to attach default to */
@@ -130,12 +134,12 @@ extern const FormData_pg_attribute *SystemAttributeDefinition(AttrNumber attno);
 extern const FormData_pg_attribute *SystemAttributeByName(const char *attname);

 extern void CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
-                         bool allow_system_table_mods);
+                         int flags);

 extern void CheckAttributeType(const char *attname,
                    Oid atttypid, Oid attcollation,
                    List *containing_rowtypes,
-                   bool allow_system_table_mods);
+                   int flags);

 /* pg_partitioned_table catalog manipulation functions */
 extern void StorePartitionKey(Relation rel,
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index d6a1a33..054faabb 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -667,6 +667,17 @@ select row(1, '(1,2)')::testtype6 *<> row(1, '(1,3)')::testtype6;
  t
 (1 row)

+-- anonymous rowtypes in coldeflists
+select q.a, q.b = row(2), q.c = array[row(3)], q.d = row(row(4)) from
+    unnest(array[row(1, row(2), array[row(3)], row(row(4))),
+                 row(2, row(3), array[row(4)], row(row(5)))])
+      as q(a int, b record, c record[], d record);
+ a | ?column? | ?column? | ?column?
+---+----------+----------+----------
+ 1 | t        | t        | t
+ 2 | f        | f        | f
+(2 rows)
+
 drop type testtype1, testtype2, testtype3, testtype4, testtype5, testtype6;
 --
 -- Test case derived from bug #5716: check multiple uses of a rowtype result
diff --git a/src/test/regress/sql/rowtypes.sql b/src/test/regress/sql/rowtypes.sql
index e6d3898..454d462 100644
--- a/src/test/regress/sql/rowtypes.sql
+++ b/src/test/regress/sql/rowtypes.sql
@@ -259,6 +259,12 @@ select row(1, '(1,2)')::testtype6 *< row(1, '(1,3)')::testtype6;
 select row(1, '(1,2)')::testtype6 *>= row(1, '(1,3)')::testtype6;
 select row(1, '(1,2)')::testtype6 *<> row(1, '(1,3)')::testtype6;

+-- anonymous rowtypes in coldeflists
+select q.a, q.b = row(2), q.c = array[row(3)], q.d = row(row(4)) from
+    unnest(array[row(1, row(2), array[row(3)], row(row(4))),
+                 row(2, row(3), array[row(4)], row(row(5)))])
+      as q(a int, b record, c record[], d record);
+
 drop type testtype1, testtype2, testtype3, testtype4, testtype5, testtype6;

Re: [PATCH] Allow anonymous rowtypes in function return column definition

From

Elvis Pranskevichus

Date:

30 January 2019, 23:35:25

On Wednesday, January 30, 2019 5:59:41 PM EST Tom Lane wrote:
> I still found this pretty disjointed.  After a bit of thought I
> propose the attached reformulation, which has the callers just tell
> the routines which types to allow explicitly, with the justification
> comments living at the call sites instead of within the routines.

This is a much better formulation, thank you!

                                  Elvis

Re: [PATCH] Allow anonymous rowtypes in function return column definition

From

Tom Lane

Date:

31 January 2019, 00:26:27

Elvis Pranskevichus <el@prans.net> writes:
> On Wednesday, January 30, 2019 5:59:41 PM EST Tom Lane wrote:
>> I still found this pretty disjointed.  After a bit of thought I
>> propose the attached reformulation, which has the callers just tell
>> the routines which types to allow explicitly, with the justification
>> comments living at the call sites instead of within the routines.

> This is a much better formulation, thank you!

OK, pushed.

            regards, tom lane