ExecTypeSetColNames is fundamentally broken - Mailing list pgsql-hackers

From Tom Lane
Subject ExecTypeSetColNames is fundamentally broken
Date
Msg-id 2950001.1638729947@sss.pgh.pa.us
Whole thread Raw
Responses Re: ExecTypeSetColNames is fundamentally broken
List pgsql-hackers
Changing
Fcc: +inbox
--------
I looked into the failure reported at [1].  Basically what's happening
there is that we're allowing a composite datum of type RECORD to get
stored into a table, whereupon other backends can't make sense of it
since they lack the appropriate typcache entry.  The reason the datum
is marked as type RECORD is that ExecTypeSetColNames set things up
that way, after observing that the tuple descriptor obtained from
the current table definition didn't have the column names it thought
it should have.

Now, in the test case as given,    ExecTypeSetColNames is in error to
think that, because it fails to account for the possibility that the
tupdesc contains dropped columns that weren't dropped when the relevant
RTE was made.  However, if the test case is modified so that we just
rename rather than drop some pre-existing column, then even with a
fix for that problem ExecTypeSetColNames would do the wrong thing.

I made several attempts to work around this problem, but I've now
concluded that changing the type OID in ExecTypeSetColNames is just
fundamentally broken.  It can't be okay to decide that a Var that
claims to emit type A actually emits type B, and especially not to
do so as late as executor startup.  I speculated in the other thread
that we could do so during planning instead, but that turns out to
just move the problems around.  I think this must be so, because the
whole idea is bogus.  For example, if we have a function that is
declared to take type "ct", it can't be okay in general to pass it
type "record" instead.  We've mistakenly thought that we could fuzz
this as long as the two types are physically compatible --- but how
can we think that the column names of a composite type aren't a
basic part of its definition?

So 0001 attached fixes this by revoking the decision to apply
ExecTypeSetColNames in cases where a Var or RowExpr is declared
to return a named composite type.  This causes a couple of regression
test results to change, but they are ones that were specifically
added to exercise this behavior that we now see to be invalid.
(In passing, this also adjusts ExecEvalWholeRowVar to fail if the
Var claims to be of a domain-over-composite.  I'm not sure what
I was thinking when I changed it to allow that; the case should
not arise, and if it did, it'd be questionable because we're not
checking domain constraints here.)

0002 is some inessential mop-up that avoids storing useless column
name lists in RowExprs where they won't be used.

I'm not really thrilled about back-patching a behavioral change
of this sort, but I don't see any good alternative.  Corrupting
people's tables is not okay.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAOFAq3BeawPiw9pc3bVGZ%3DRint2txWEBCeDC2wNAhtCZoo2ZqA%40mail.gmail.com

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 892b4e17e0..f0f3b4ffb0 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1913,16 +1913,16 @@ ExecInitExprRec(Expr *node, ExprState *state,
                 {
                     /* generic record, use types of given expressions */
                     tupdesc = ExecTypeFromExprList(rowexpr->args);
+                    /* ... but adopt RowExpr's column aliases */
+                    ExecTypeSetColNames(tupdesc, rowexpr->colnames);
+                    /* Bless the tupdesc so it can be looked up later */
+                    BlessTupleDesc(tupdesc);
                 }
                 else
                 {
                     /* it's been cast to a named type, use that */
                     tupdesc = lookup_rowtype_tupdesc_copy(rowexpr->row_typeid, -1);
                 }
-                /* In either case, adopt RowExpr's column aliases */
-                ExecTypeSetColNames(tupdesc, rowexpr->colnames);
-                /* Bless the tupdesc in case it's now of type RECORD */
-                BlessTupleDesc(tupdesc);

                 /*
                  * In the named-type case, the tupdesc could have more columns
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index eb49817cee..b7b79e0b75 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -4021,12 +4021,8 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
              * generates an INT4 NULL regardless of the dropped column type).
              * If we find a dropped column and cannot verify that case (1)
              * holds, we have to use the slow path to check (2) for each row.
-             *
-             * If vartype is a domain over composite, just look through that
-             * to the base composite type.
              */
-            var_tupdesc = lookup_rowtype_tupdesc_domain(variable->vartype,
-                                                        -1, false);
+            var_tupdesc = lookup_rowtype_tupdesc(variable->vartype, -1);

             slot_tupdesc = slot->tts_tupleDescriptor;

@@ -4063,9 +4059,8 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext)

             /*
              * Use the variable's declared rowtype as the descriptor for the
-             * output values, modulo possibly assigning new column names
-             * below. In particular, we *must* absorb any attisdropped
-             * markings.
+             * output values.  In particular, we *must* absorb any
+             * attisdropped markings.
              */
             oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
             output_tupdesc = CreateTupleDescCopy(var_tupdesc);
@@ -4083,39 +4078,38 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
             oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
             output_tupdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
             MemoryContextSwitchTo(oldcontext);
-        }

-        /*
-         * Construct a tuple descriptor for the composite values we'll
-         * produce, and make sure its record type is "blessed".  The main
-         * reason to do this is to be sure that operations such as
-         * row_to_json() will see the desired column names when they look up
-         * the descriptor from the type information embedded in the composite
-         * values.
-         *
-         * We already got the correct physical datatype info above, but now we
-         * should try to find the source RTE and adopt its column aliases, in
-         * case they are different from the original rowtype's names.  For
-         * example, in "SELECT foo(t) FROM tab t(x,y)", the first two columns
-         * in the composite output should be named "x" and "y" regardless of
-         * tab's column names.
-         *
-         * If we can't locate the RTE, assume the column names we've got are
-         * OK.  (As of this writing, the only cases where we can't locate the
-         * RTE are in execution of trigger WHEN clauses, and then the Var will
-         * have the trigger's relation's rowtype, so its names are fine.)
-         * Also, if the creator of the RTE didn't bother to fill in an eref
-         * field, assume our column names are OK.  (This happens in COPY, and
-         * perhaps other places.)
-         */
-        if (econtext->ecxt_estate &&
-            variable->varno <= econtext->ecxt_estate->es_range_table_size)
-        {
-            RangeTblEntry *rte = exec_rt_fetch(variable->varno,
-                                               econtext->ecxt_estate);
+            /*
+             * It's possible that the input slot is a relation scan slot and
+             * so is marked with that relation's rowtype.  But we're supposed
+             * to be returning RECORD, so reset to that.
+             */
+            output_tupdesc->tdtypeid = RECORDOID;
+            output_tupdesc->tdtypmod = -1;

-            if (rte->eref)
-                ExecTypeSetColNames(output_tupdesc, rte->eref->colnames);
+            /*
+             * We already got the correct physical datatype info above, but
+             * now we should try to find the source RTE and adopt its column
+             * aliases, since it's unlikely that the input slot has the
+             * desired names.
+             *
+             * If we can't locate the RTE, assume the column names we've got
+             * are OK.  (As of this writing, the only cases where we can't
+             * locate the RTE are in execution of trigger WHEN clauses, and
+             * then the Var will have the trigger's relation's rowtype, so its
+             * names are fine.)  Also, if the creator of the RTE didn't bother
+             * to fill in an eref field, assume our column names are OK. (This
+             * happens in COPY, and perhaps other places.)
+             */
+            if (econtext->ecxt_estate &&
+                variable->varno <= econtext->ecxt_estate->es_range_table_size)
+            {
+                RangeTblEntry *rte = exec_rt_fetch(variable->varno,
+                                                   econtext->ecxt_estate);
+
+                if (rte->eref)
+                    ExecTypeSetColNames(output_tupdesc, rte->eref->colnames);
+            }
         }

         /* Bless the tupdesc if needed, and save it in the execution state */
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index f447802843..5004b3b165 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -2022,51 +2022,40 @@ ExecTypeFromExprList(List *exprList)
 }

 /*
- * ExecTypeSetColNames - set column names in a TupleDesc
+ * ExecTypeSetColNames - set column names in a RECORD TupleDesc
  *
  * Column names must be provided as an alias list (list of String nodes).
- *
- * For some callers, the supplied tupdesc has a named rowtype (not RECORD)
- * and it is moderately likely that the alias list matches the column names
- * already present in the tupdesc.  If we do change any column names then
- * we must reset the tupdesc's type to anonymous RECORD; but we avoid doing
- * so if no names change.
  */
 void
 ExecTypeSetColNames(TupleDesc typeInfo, List *namesList)
 {
-    bool        modified = false;
     int            colno = 0;
     ListCell   *lc;

+    /* It's only OK to change col names in a not-yet-blessed RECORD type */
+    Assert(typeInfo->tdtypeid == RECORDOID);
+    Assert(typeInfo->tdtypmod < 0);
+
     foreach(lc, namesList)
     {
         char       *cname = strVal(lfirst(lc));
         Form_pg_attribute attr;

-        /* Guard against too-long names list */
+        /* Guard against too-long names list (probably can't happen) */
         if (colno >= typeInfo->natts)
             break;
         attr = TupleDescAttr(typeInfo, colno);
         colno++;

-        /* Ignore empty aliases (these must be for dropped columns) */
-        if (cname[0] == '\0')
+        /*
+         * Do nothing for empty aliases or dropped columns (these cases
+         * probably can't arise in RECORD types, either)
+         */
+        if (cname[0] == '\0' || attr->attisdropped)
             continue;

-        /* Change tupdesc only if alias is actually different */
-        if (strcmp(cname, NameStr(attr->attname)) != 0)
-        {
-            namestrcpy(&(attr->attname), cname);
-            modified = true;
-        }
-    }
-
-    /* If we modified the tupdesc, it's now a new record type */
-    if (modified)
-    {
-        typeInfo->tdtypeid = RECORDOID;
-        typeInfo->tdtypmod = -1;
+        /* OK, assign the column name */
+        namestrcpy(&(attr->attname), cname);
     }
 }

diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index 0e71baf8fb..3245fb6a35 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -1010,14 +1010,15 @@ select row_to_json(i) from int8_tbl i;
  {"q1":4567890123456789,"q2":-4567890123456789}
 (5 rows)

+-- since "i" is of type "int8_tbl", attaching aliases doesn't change anything:
 select row_to_json(i) from int8_tbl i(x,y);
-                 row_to_json
-----------------------------------------------
- {"x":123,"y":456}
- {"x":123,"y":4567890123456789}
- {"x":4567890123456789,"y":123}
- {"x":4567890123456789,"y":4567890123456789}
- {"x":4567890123456789,"y":-4567890123456789}
+                  row_to_json
+------------------------------------------------
+ {"q1":123,"q2":456}
+ {"q1":123,"q2":4567890123456789}
+ {"q1":4567890123456789,"q2":123}
+ {"q1":4567890123456789,"q2":4567890123456789}
+ {"q1":4567890123456789,"q2":-4567890123456789}
 (5 rows)

 create temp view vv1 as select * from int8_tbl;
@@ -1032,15 +1033,16 @@ select row_to_json(i) from vv1 i;
 (5 rows)

 select row_to_json(i) from vv1 i(x,y);
-                 row_to_json
-----------------------------------------------
- {"x":123,"y":456}
- {"x":123,"y":4567890123456789}
- {"x":4567890123456789,"y":123}
- {"x":4567890123456789,"y":4567890123456789}
- {"x":4567890123456789,"y":-4567890123456789}
+                  row_to_json
+------------------------------------------------
+ {"q1":123,"q2":456}
+ {"q1":123,"q2":4567890123456789}
+ {"q1":4567890123456789,"q2":123}
+ {"q1":4567890123456789,"q2":4567890123456789}
+ {"q1":4567890123456789,"q2":-4567890123456789}
 (5 rows)

+-- in these examples, we'll report the exposed column names of the subselect:
 select row_to_json(ss) from
   (select q1, q2 from int8_tbl) as ss;
                   row_to_json
diff --git a/src/test/regress/sql/rowtypes.sql b/src/test/regress/sql/rowtypes.sql
index 845e3305f3..b8786603d9 100644
--- a/src/test/regress/sql/rowtypes.sql
+++ b/src/test/regress/sql/rowtypes.sql
@@ -417,12 +417,14 @@ select longname(f) from fullname f;
 --

 select row_to_json(i) from int8_tbl i;
+-- since "i" is of type "int8_tbl", attaching aliases doesn't change anything:
 select row_to_json(i) from int8_tbl i(x,y);

 create temp view vv1 as select * from int8_tbl;
 select row_to_json(i) from vv1 i;
 select row_to_json(i) from vv1 i(x,y);

+-- in these examples, we'll report the exposed column names of the subselect:
 select row_to_json(ss) from
   (select q1, q2 from int8_tbl) as ss;
 select row_to_json(ss) from
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 387a35e112..43674aace4 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -2279,8 +2279,8 @@ pullup_replace_vars_callback(Var *var,
          * If generating an expansion for a var of a named rowtype (ie, this
          * is a plain relation RTE), then we must include dummy items for
          * dropped columns.  If the var is RECORD (ie, this is a JOIN), then
-         * omit dropped columns. Either way, attach column names to the
-         * RowExpr for use of ruleutils.c.
+         * omit dropped columns.  In the latter case, attach column names to
+         * the RowExpr for use of the executor and ruleutils.c.
          *
          * In order to be able to cache the results, we always generate the
          * expansion with varlevelsup = 0, and then adjust if needed.
@@ -2301,7 +2301,7 @@ pullup_replace_vars_callback(Var *var,
         rowexpr->args = fields;
         rowexpr->row_typeid = var->vartype;
         rowexpr->row_format = COERCE_IMPLICIT_CAST;
-        rowexpr->colnames = colnames;
+        rowexpr->colnames = (var->vartype == RECORDOID) ? colnames : NIL;
         rowexpr->location = var->location;
         newnode = (Node *) rowexpr;

diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index e57c3aa124..b30e8c9be6 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -802,6 +802,7 @@ flatten_join_alias_vars_mutator(Node *node,
             rowexpr->args = fields;
             rowexpr->row_typeid = var->vartype;
             rowexpr->row_format = COERCE_IMPLICIT_CAST;
+            /* vartype will always be RECORDOID, so we always need colnames */
             rowexpr->colnames = colnames;
             rowexpr->location = var->location;

diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index d4e0b8b4de..30922a006a 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -1424,8 +1424,8 @@ ReplaceVarsFromTargetList_callback(Var *var,
          * If generating an expansion for a var of a named rowtype (ie, this
          * is a plain relation RTE), then we must include dummy items for
          * dropped columns.  If the var is RECORD (ie, this is a JOIN), then
-         * omit dropped columns.  Either way, attach column names to the
-         * RowExpr for use of ruleutils.c.
+         * omit dropped columns.  In the latter case, attach column names to
+         * the RowExpr for use of the executor and ruleutils.c.
          */
         expandRTE(rcon->target_rte,
                   var->varno, var->varlevelsup, var->location,
@@ -1438,7 +1438,7 @@ ReplaceVarsFromTargetList_callback(Var *var,
         rowexpr->args = fields;
         rowexpr->row_typeid = var->vartype;
         rowexpr->row_format = COERCE_IMPLICIT_CAST;
-        rowexpr->colnames = colnames;
+        rowexpr->colnames = (var->vartype == RECORDOID) ? colnames : NIL;
         rowexpr->location = var->location;

         return (Node *) rowexpr;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 433437643e..b039ea305a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1052,15 +1052,13 @@ typedef struct ArrayExpr
  * than vice versa.)  It is important not to assume that length(args) is
  * the same as the number of columns logically present in the rowtype.
  *
- * colnames provides field names in cases where the names can't easily be
- * obtained otherwise.  Names *must* be provided if row_typeid is RECORDOID.
- * If row_typeid identifies a known composite type, colnames can be NIL to
- * indicate the type's cataloged field names apply.  Note that colnames can
- * be non-NIL even for a composite type, and typically is when the RowExpr
- * was created by expanding a whole-row Var.  This is so that we can retain
- * the column alias names of the RTE that the Var referenced (which would
- * otherwise be very difficult to extract from the parsetree).  Like the
- * args list, colnames is one-for-one with physical fields of the rowtype.
+ * colnames provides field names if the ROW() result is of type RECORD.
+ * Names *must* be provided if row_typeid is RECORDOID; but if it is a
+ * named composite type, colnames will be ignored in favor of using the
+ * type's cataloged field names, so colnames should be NIL.  Like the
+ * args list, colnames is defined to be one-for-one with physical fields
+ * of the rowtype (although dropped columns shouldn't appear in the
+ * RECORD case, so this fine point is currently moot).
  */
 typedef struct RowExpr
 {

pgsql-hackers by date:

Previous
From: Chapman Flack
Date:
Subject: Re: The "char" type versus non-ASCII characters
Next
From: Noah Misch
Date:
Subject: Re: enable certain TAP tests for MSVC builds