Re: problem with RETURNING and update row movement - Mailing list pgsql-hackers

From Tom Lane
Subject Re: problem with RETURNING and update row movement
Date
Msg-id 890077.1619051831@sss.pgh.pa.us
Whole thread Raw
In response to Re: problem with RETURNING and update row movement  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: problem with RETURNING and update row movement  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
Amit Langote <amitlangote09@gmail.com> writes:
> FWIW, I think we should go ahead and apply the patches for the bug
> reported here.  Anyone who tries to project an updated tuple's system
> columns using RETURNING are likely to face problems one way or
> another, especially if they have partitioned tables containing
> partitions of varying table AMs, but at least they won't face the bug
> discussed here.

Agreed, we should get this fixed in time for the next minor releases.
The issue no longer exists on HEAD, thanks to 86dc90056 having got
rid of per-target-relation variance in the contents of planSlot.
But we still need a fix for the back branches.

So I looked over the v13 patch, and found a couple of things
I didn't like:

* I think what you did in ExecProcessReturning is buggy.  It's
not a great idea to have completely different processes for
getting tableoid set in normal-relation vs foreign-relation
cases, and in this case the foreign-relation case was simply
wrong.  Maybe the bug isn't reachable for lack of support of
cross-partition motion with FDWs, but I'm not sure about that.
We really need to decouple the RETURNING expressions (which
will belong to the source relation) from the value injected
for tableoid (which will belong to the destination).

* I really disliked the API change that ExecInsert is responsible
for computing RETURNING except when it isn't.  That's confusing
and there's no good reason for it, since it's not really any
easier to deal with the case at the call site than inside ExecInsert.

In the attached revision I made ExecInsert handle RETURNING
calculations by asking the callers to pass in the ResultRelInfo
that should be used for the purpose.  We could alternatively
have taken the responsibility for RETURNING out of ExecInsert
altogether, making the callers call ExecProcessReturning.
I think that might have netted out slightly simpler than this.
But we're unlikely to apply such a change in HEAD, so it seemed
better to keep the division of responsibilities the same as it
is in other branches.

Thoughts?  (I've not looked at porting this to v12 or v11 yet.)

            regards, tom lane

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9572ef0690..ff4a2d0e58 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -149,34 +149,40 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 /*
  * ExecProcessReturning --- evaluate a RETURNING list
  *
- * resultRelInfo: current result rel
+ * projectReturning: the projection to evaluate
+ * resultRelOid: result relation's OID
  * tupleSlot: slot holding tuple actually inserted/updated/deleted
  * planSlot: slot holding tuple returned by top subplan node
  *
+ * In cross-partition UPDATE cases, projectReturning and planSlot are as
+ * for the source partition, and tupleSlot must conform to that.  But
+ * resultRelOid is for the destination partition.
+ *
  * Note: If tupleSlot is NULL, the FDW should have already provided econtext's
  * scan tuple.
  *
  * Returns a slot holding the result tuple
  */
 static TupleTableSlot *
-ExecProcessReturning(ResultRelInfo *resultRelInfo,
+ExecProcessReturning(ProjectionInfo *projectReturning,
+                     Oid resultRelOid,
                      TupleTableSlot *tupleSlot,
                      TupleTableSlot *planSlot)
 {
-    ProjectionInfo *projectReturning = resultRelInfo->ri_projectReturning;
     ExprContext *econtext = projectReturning->pi_exprContext;

     /* Make tuple and any needed join variables available to ExecProject */
     if (tupleSlot)
         econtext->ecxt_scantuple = tupleSlot;
+    else
+        Assert(econtext->ecxt_scantuple);
     econtext->ecxt_outertuple = planSlot;

     /*
-     * RETURNING expressions might reference the tableoid column, so
-     * reinitialize tts_tableOid before evaluating them.
+     * RETURNING expressions might reference the tableoid column, so be sure
+     * we expose the desired OID, ie that of the real target relation.
      */
-    econtext->ecxt_scantuple->tts_tableOid =
-        RelationGetRelid(resultRelInfo->ri_RelationDesc);
+    econtext->ecxt_scantuple->tts_tableOid = resultRelOid;

     /* Compute the RETURNING expressions */
     return ExecProject(projectReturning);
@@ -368,6 +374,16 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *        For INSERT, we have to insert the tuple into the target relation
  *        and insert appropriate tuples into the index relations.
  *
+ *        slot contains the new tuple value to be stored.
+ *        planSlot is the output of the ModifyTable's subplan; we use it
+ *        to access "junk" columns that are not going to be stored.
+ *        In a cross-partition UPDATE, srcSlot is the slot that held the
+ *        updated tuple for the source relation; otherwise it's NULL.
+ *
+ *        returningRelInfo is the resultRelInfo for the source relation of a
+ *        cross-partition UPDATE; otherwise it's the current result relation.
+ *        We use it to process RETURNING lists, for reasons explained below.
+ *
  *        Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
  */
@@ -375,6 +391,8 @@ static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
            TupleTableSlot *slot,
            TupleTableSlot *planSlot,
+           TupleTableSlot *srcSlot,
+           ResultRelInfo *returningRelInfo,
            EState *estate,
            bool canSetTag)
 {
@@ -677,8 +695,40 @@ ExecInsert(ModifyTableState *mtstate,
         ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);

     /* Process RETURNING if present */
-    if (resultRelInfo->ri_projectReturning)
-        result = ExecProcessReturning(resultRelInfo, slot, planSlot);
+    if (returningRelInfo->ri_projectReturning)
+    {
+        /*
+         * In a cross-partition UPDATE with RETURNING, we have to use the
+         * source partition's RETURNING list, because that matches the output
+         * of the planSlot, while the destination partition might have
+         * different resjunk columns.  This means we have to map the
+         * destination slot back to the source's format so we can apply that
+         * RETURNING list.  This is expensive, but it should be an uncommon
+         * corner case, so we won't spend much effort on making it fast.
+         */
+        if (returningRelInfo != resultRelInfo)
+        {
+            Relation    srcRelationDesc = returningRelInfo->ri_RelationDesc;
+            AttrMap    *map;
+
+            map = build_attrmap_by_name_if_req(RelationGetDescr(resultRelationDesc),
+                                               RelationGetDescr(srcRelationDesc));
+            if (map)
+            {
+                /* We assume we can use srcSlot to hold the converted tuple */
+                TupleTableSlot *origSlot = slot;
+
+                slot = execute_attr_map_slot(map, slot, srcSlot);
+                slot->tts_tid = origSlot->tts_tid;
+                slot->tts_tableOid = origSlot->tts_tableOid;
+                free_attrmap(map);
+            }
+        }
+
+        result = ExecProcessReturning(returningRelInfo->ri_projectReturning,
+                                      RelationGetRelid(resultRelationDesc),
+                                      slot, planSlot);
+    }

     return result;
 }
@@ -1027,7 +1077,9 @@ ldelete:;
             }
         }

-        rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
+        rslot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
+                                     RelationGetRelid(resultRelationDesc),
+                                     slot, planSlot);

         /*
          * Before releasing the target tuple again, make sure rslot has a
@@ -1203,6 +1255,7 @@ lreplace:;
         {
             bool        tuple_deleted;
             TupleTableSlot *ret_slot;
+            TupleTableSlot *orig_slot = slot;
             TupleTableSlot *epqslot = NULL;
             PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
             int            map_index;
@@ -1309,6 +1362,7 @@ lreplace:;
                                            mtstate->rootResultRelInfo, slot);

             ret_slot = ExecInsert(mtstate, slot, planSlot,
+                                  orig_slot, resultRelInfo,
                                   estate, canSetTag);

             /* Revert ExecPrepareTupleRouting's node change. */
@@ -1505,7 +1559,9 @@ lreplace:;

     /* Process RETURNING if present */
     if (resultRelInfo->ri_projectReturning)
-        return ExecProcessReturning(resultRelInfo, slot, planSlot);
+        return ExecProcessReturning(resultRelInfo->ri_projectReturning,
+                                    RelationGetRelid(resultRelationDesc),
+                                    slot, planSlot);

     return NULL;
 }
@@ -2154,7 +2210,9 @@ ExecModifyTable(PlanState *pstate)
              * ExecProcessReturning by IterateDirectModify, so no need to
              * provide it here.
              */
-            slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
+            slot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
+                                        RelationGetRelid(resultRelInfo->ri_RelationDesc),
+                                        NULL, planSlot);

             estate->es_result_relation_info = saved_resultRelInfo;
             return slot;
@@ -2244,6 +2302,7 @@ ExecModifyTable(PlanState *pstate)
                     slot = ExecPrepareTupleRouting(node, estate, proute,
                                                    resultRelInfo, slot);
                 slot = ExecInsert(node, slot, planSlot,
+                                  NULL, estate->es_result_relation_info,
                                   estate, node->canSetTag);
                 /* Revert ExecPrepareTupleRouting's state change. */
                 if (proute)
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index bf939d79f6..807005c40a 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -445,6 +445,46 @@ UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (r
  part_c_1_100   | b | 17 |  95 | 19 |
 (6 rows)

+-- The following tests computing RETURNING when the source and the destination
+-- partitions of a UPDATE row movement operation have different tuple
+-- descriptors, which has been shown to be problematic in the cases where the
+-- RETURNING targetlist contains non-target relation attributes that are
+-- computed by referring to the source partition plan's output tuple.  Also,
+-- a trigger on the destination relation may change the tuple, which must be
+-- reflected in the RETURNING output, so we test that too.
+CREATE TABLE part_c_1_c_20 (LIKE range_parted);
+ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
+ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END;
$$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+   tableoid    | a | b  |  c  | d |       e       | x | y
+---------------+---+----+-----+---+---------------+---+----
+ part_c_1_c_20 | c |  1 |   1 | 1 | in trigfunc() | a |  1
+ part_c_1_c_20 | c | 10 | 200 | 1 | in trigfunc() | a | 10
+ part_c_1_c_20 | c | 12 |  96 | 1 | in trigfunc() | b | 12
+(3 rows)
+
+DROP TRIGGER trig ON part_c_1_c_20;
+DROP FUNCTION trigfunc;
+:init_range_parted;
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+ tableoid | a | b | c | d | e | x | y
+----------+---+---+---+---+---+---+---
+(0 rows)
+
+:show_data;
+   partname   | a | b  |  c  | d  | e
+--------------+---+----+-----+----+---
+ part_c_1_100 | b | 13 |  97 |  2 |
+ part_d_15_20 | b | 15 | 105 | 16 |
+ part_d_15_20 | b | 17 | 105 | 19 |
+(3 rows)
+
+DROP TABLE part_c_1_c_20;
+DROP FUNCTION trigfunc;
 -- Transition tables with update row movement
 :init_range_parted;
 CREATE FUNCTION trans_updatetrigfunc() RETURNS trigger LANGUAGE plpgsql AS
diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql
index 8c558a7bc7..dc7274a4bb 100644
--- a/src/test/regress/sql/update.sql
+++ b/src/test/regress/sql/update.sql
@@ -236,6 +236,31 @@ DROP VIEW upview;
 UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (range_parted), *;
 :show_data;

+-- The following tests computing RETURNING when the source and the destination
+-- partitions of a UPDATE row movement operation have different tuple
+-- descriptors, which has been shown to be problematic in the cases where the
+-- RETURNING targetlist contains non-target relation attributes that are
+-- computed by referring to the source partition plan's output tuple.  Also,
+-- a trigger on the destination relation may change the tuple, which must be
+-- reflected in the RETURNING output, so we test that too.
+CREATE TABLE part_c_1_c_20 (LIKE range_parted);
+ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
+ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END;
$$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+
+DROP TRIGGER trig ON part_c_1_c_20;
+DROP FUNCTION trigfunc;
+
+:init_range_parted;
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+:show_data;
+
+DROP TABLE part_c_1_c_20;
+DROP FUNCTION trigfunc;

 -- Transition tables with update row movement
 :init_range_parted;

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Stale description for pg_basebackup
Next
From: Bharath Rupireddy
Date:
Subject: Re: TRUNCATE on foreign table