Thread: making update/delete of inheritance trees scale better


From: Amit Langote

Here is a sketch for implementing the design that Tom described here:
https://www.postgresql.org/message-id/flat/357.1550612935%40sss.pgh.pa.us

In short, we would like to have only one plan from which ModifyTable
gets the tuples to update/delete, not N plans for N child result
relations as is done currently.

I suppose things are the way they are because creating a separate plan
for each result relation makes the ModifyTable node's job very simple;
currently it is this:

1. Take the plan's output tuple, extract the tupleid of the tuple to
update/delete in the currently active result relation,
2. If delete, go to 3, else if update, filter out the junk columns
from the above tuple
3. Call ExecUpdate()/ExecDelete() on the result relation with the new
tuple, if any

If we make ModifyTable do a little more work for the inheritance case,
we can create only one plan but without "expanding" the targetlist.
That is, it will contain entries only for attributes that are assigned
values in the SET clause.  This makes the plan reusable across result
relations, because all child relations must have those attributes,
even though the attribute numbers might be different.  Anyway, the
work that ModifyTable will now have to do is this:

1. Take the plan's output tuple, extract tupleid of the tuple to
update/delete and "tableoid"
2. Select the result relation to operate on using the tableoid
3. If delete, go to 4, else if update, fetch the tuple identified by
tupleid from the result relation and fill in the unassigned columns
using that "old" tuple, also filtering out the junk columns
4. Call ExecUpdate()/ExecDelete() on the result relation with the new
tuple, if any
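The four steps above can be modelled with a small, self-contained C sketch. It is not PostgreSQL code: ResultRel, PlanRow, select_result_rel() and process_row() are hypothetical stand-ins, tuples are plain int arrays, tupleid stands in for ctid, and a linear search stands in for whatever tableoid lookup the real executor would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COLS 8
#define MAX_ROWS 16
#define MAX_SET  4

/* Hypothetical, simplified child result relation. */
typedef struct ResultRel
{
    unsigned int oid;                 /* the child's tableoid */
    int          natts;               /* number of columns in this child */
    int          set_attno[MAX_SET];  /* child column (0-based) for each SET target */
    int          rows[MAX_ROWS][MAX_COLS];
    bool         live[MAX_ROWS];
} ResultRel;

/* One row produced by the single shared plan. */
typedef struct PlanRow
{
    unsigned int tableoid;            /* junk: which child the row came from */
    int          tupleid;             /* junk: stand-in for ctid */
    int          set_values[MAX_SET]; /* values for the SET targets only */
} PlanRow;

/* Step 2: pick the result relation using the tableoid junk column. */
static ResultRel *
select_result_rel(ResultRel *rels, int nrels, unsigned int tableoid)
{
    for (int i = 0; i < nrels; i++)
        if (rels[i].oid == tableoid)
            return &rels[i];
    return NULL;
}

/* Steps 1-4 for one plan row; nset is the number of SET targets. */
static void
process_row(ResultRel *rels, int nrels, const PlanRow *row, int nset,
            bool is_delete)
{
    ResultRel  *rel = select_result_rel(rels, nrels, row->tableoid);

    assert(rel != NULL && row->tupleid >= 0 && row->tupleid < MAX_ROWS);
    if (is_delete)
    {
        rel->live[row->tupleid] = false;    /* stand-in for ExecDelete() */
        return;
    }

    /* Step 3: start from the old tuple so unassigned columns keep their values... */
    int         newtup[MAX_COLS];

    for (int a = 0; a < rel->natts; a++)
        newtup[a] = rel->rows[row->tupleid][a];
    /* ...then overwrite only the assigned columns, at this child's positions. */
    for (int s = 0; s < nset; s++)
        newtup[rel->set_attno[s]] = row->set_values[s];

    /* Step 4: stand-in for ExecUpdate(); junk columns never reach newtup. */
    for (int a = 0; a < rel->natts; a++)
        rel->rows[row->tupleid][a] = newtup[a];
}
```

Note that the same SET value lands in a different column position per child, which is what lets one plan serve all result relations.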

I do think that doing this would be worthwhile even if we may be
increasing ModifyTable's per-row overhead slightly, because planning
overhead of the current approach is very significant, especially for
partition trees with more than a couple of thousand partitions.  As to
how bad the problem is, trying to create a generic plan for `update
foo set ... where key = $1`, where foo has over 2000 partitions,
causes OOM even on a machine with 6GB of memory.

The one plan shared by all result relations will be the same as the one
we would get if the query were a SELECT, except that it will contain
junk attributes such as ctid needed to identify tuples, plus a new
"tableoid" junk attribute if multiple result relations are present due
to inheritance.  One major way in which this targetlist differs from
the current per-result-relation plans is that it won't be passed
through expand_targetlist(), because the set of unassigned attributes
may differ among children.  As mentioned above, those missing
attributes will be filled in by ModifyTable doing some extra work,
whereas previously they would have come with the plan's output tuple.
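Why one unexpanded targetlist can serve every child: each SET target is matched to a child's columns by name, so the same target may land on a different attribute number in each child (for example, a partition created with a different column order and then attached), while any extra columns a child has simply stay unassigned. In this hypothetical C sketch, map_set_targets() and the string column lists are illustrative only; the planner works with Vars and translated attribute numbers, not names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Resolve each SET target (by column name) to this child's own 1-based
 * attribute number.  Returns false if a target is missing, which should not
 * happen for a real inheritance child, since it has every parent column.
 */
static bool
map_set_targets(const char *const *child_cols, int ncols,
                const char *const *set_cols, int nset, int *attnos)
{
    assert(ncols > 0);
    for (int s = 0; s < nset; s++)
    {
        attnos[s] = 0;
        for (int a = 0; a < ncols; a++)
        {
            if (strcmp(child_cols[a], set_cols[s]) == 0)
            {
                attnos[s] = a + 1;
                break;
            }
        }
        if (attnos[s] == 0)
            return false;
    }
    return true;
}
```

For `SET a = ...`, a parent with columns (a, b) gives attno 1, a child with columns (b, a) gives attno 2, and a child with (a, b, extra) gives attno 1 but leaves two columns unassigned instead of one.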

For child result relations that are foreign tables, their FDW adds
junk attribute(s) to the query’s targetlist by updating it in-place
(AddForeignUpdateTargets).  However, as the child tables will no
longer get their own parsetree, we must use some hack around this
interface to obtain the foreign table specific junk attributes and add
them to the original/parent query’s targetlist.  Assuming that all or
most of the children will belong to the same FDW, we will end up with
only a handful of such junk columns in the final targetlist.  I am not
sure if it's worthwhile, as part of this patch, to change the API of
AddForeignUpdateTargets to require FDWs not to scribble on the
passed-in parsetree.

As for how ModifyTable will create the new tuple for updates, I have
decided to use a ProjectionInfo for each result relation, which
projects a full, *clean* tuple ready to be put back into the relation.
When projecting, the plan’s output tuple serves as the OUTER tuple and
the old tuple fetched to fill unassigned attributes serves as the SCAN
tuple.  By having this ProjectionInfo also serve as the “junk filter”,
we don't need JunkFilters.  The targetlist that this projection
computes is the same as that of the result-relation-specific plan.
Initially, I thought to generate this "expanded" targetlist in
ExecInitModifyTable().  But as it can be somewhat expensive, doing it
only once in the planner seemed like a good idea.  These
per-result-relation targetlists are carried in the ModifyTable node.
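A rough model of such a per-result-relation projection, assuming plain int arrays for tuples: each output column is taken either from the plan's output row (OUTER) or from the old tuple (SCAN), and junk columns are dropped simply because no output column refers to them. Projection, build_projection() and project() are hypothetical names, not the executor's ProjectionInfo API.

```c
#include <assert.h>

#define MAX_COLS 16

typedef enum { FROM_OUTER, FROM_SCAN } ColSource;

typedef struct ProjCol
{
    ColSource   src;
    int         index;  /* position in the OUTER row, or 0-based column of the SCAN tuple */
} ProjCol;

typedef struct Projection
{
    int         natts;
    ProjCol     cols[MAX_COLS];
} Projection;

/*
 * Built once per result relation, before any rows are processed.
 * set_attno[s] is the child column (0-based) assigned by SET target s,
 * whose new value sits at position outer_pos[s] of the plan's output row.
 */
static void
build_projection(Projection *p, int natts, const int *set_attno,
                 const int *outer_pos, int nset)
{
    assert(natts <= MAX_COLS);
    p->natts = natts;
    for (int a = 0; a < natts; a++)
    {
        p->cols[a].src = FROM_SCAN;     /* default: keep the old value */
        p->cols[a].index = a;
    }
    for (int s = 0; s < nset; s++)
    {
        p->cols[set_attno[s]].src = FROM_OUTER;
        p->cols[set_attno[s]].index = outer_pos[s];
    }
}

/* Per row: build the clean new tuple; junk OUTER columns are never read. */
static void
project(const Projection *p, const int *outer, const int *scan, int *result)
{
    for (int a = 0; a < p->natts; a++)
        result[a] = (p->cols[a].src == FROM_OUTER)
            ? outer[p->cols[a].index]
            : scan[p->cols[a].index];
}
```

With an OUTER row of (ctid, tableoid, new value) and a three-column child whose last column is assigned, the result keeps the first two old values and takes the new one, with no separate junk-filtering pass.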

To identify the result relation from the tuple produced by the plan,
the “tableoid” junk column will be used.  As the tuples for different
result relations won’t necessarily come out in the order in which the
result relations are laid out in the ModifyTable node, we need a way
to map the tableoid value to a result relation index.  I have decided
to use a hash table here.
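The proposal only says that a hash table will be used. As one possible shape for it, here is a minimal open-addressing map from tableoid to result relation index in plain C; OidMap and its functions are hypothetical and do not use PostgreSQL's own hash table facilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OIDMAP_BITS 6
#define OIDMAP_SIZE (1 << OIDMAP_BITS)  /* must exceed the number of result relations */

typedef struct OidMapEntry
{
    uint32_t    oid;
    int         index;      /* result relation index in ModifyTable */
    bool        used;
} OidMapEntry;

typedef struct OidMap
{
    OidMapEntry slots[OIDMAP_SIZE];
    int         count;
} OidMap;

/* Multiplicative hashing; keep the high bits, which are the well-mixed ones. */
static uint32_t
hash_oid(uint32_t oid)
{
    return (uint32_t) (oid * 2654435761u) >> (32 - OIDMAP_BITS);
}

static void
oidmap_init(OidMap *m)
{
    for (int i = 0; i < OIDMAP_SIZE; i++)
        m->slots[i].used = false;
    m->count = 0;
}

/* Linear probing; always leave one empty slot so lookups terminate. */
static void
oidmap_insert(OidMap *m, uint32_t oid, int index)
{
    uint32_t    i = hash_oid(oid);

    assert(m->count < OIDMAP_SIZE - 1);
    while (m->slots[i].used && m->slots[i].oid != oid)
        i = (i + 1) & (OIDMAP_SIZE - 1);
    if (!m->slots[i].used)
        m->count++;
    m->slots[i] = (OidMapEntry) {oid, index, true};
}

/* Returns the result relation index, or -1 if the tableoid is unknown. */
static int
oidmap_lookup(const OidMap *m, uint32_t oid)
{
    uint32_t    i = hash_oid(oid);

    while (m->slots[i].used)
    {
        if (m->slots[i].oid == oid)
            return m->slots[i].index;
        i = (i + 1) & (OIDMAP_SIZE - 1);
    }
    return -1;
}
```

The map would be filled once at executor startup and then consulted once per row, so per-row cost stays constant regardless of how the plan orders its output.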

A couple of things that I haven't thought very hard about yet, but
may revisit later:

* We will no longer be able to use DirectModify APIs to push updates
to remote servers for foreign child result relations

* Over in [1], I said that we get run-time pruning for free for
ModifyTable because the plan we are using is the same as that for
SELECT, although now I think that I hadn't thought that through.
With the PoC patch that I have:

prepare q as update foo set a = 250001 where a = $1;
set plan_cache_mode to 'force_generic_plan';
explain execute q(1);
                             QUERY PLAN
--------------------------------------------------------------------
 Update on foo  (cost=0.00..142.20 rows=40 width=14)
   Update on foo_1
   Update on foo_2 foo
   Update on foo_3 foo
   Update on foo_4 foo
   ->  Append  (cost=0.00..142.20 rows=40 width=14)
         Subplans Removed: 3
         ->  Seq Scan on foo_1  (cost=0.00..35.50 rows=10 width=14)
               Filter: (a = $1)
(9 rows)

While it's true that we will never have to actually update foo_2,
foo_3, and foo_4, ModifyTable still sets up its ResultRelInfos, which
ideally it shouldn't.  Maybe we'll need to do something about that
after all.

I will post the patch shortly.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqGXmP3-S9y%3DOQHyJyeWnZSOmcxBGdgAMWcLUOsnPTL88w%40mail.gmail.com



Re: making update/delete of inheritance trees scale better

From: Ashutosh Bapat

On Fri, May 8, 2020 at 7:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Here is a sketch for implementing the design that Tom described here:
> https://www.postgresql.org/message-id/flat/357.1550612935%40sss.pgh.pa.us
>
> In short, we would like to have only one plan for ModifyTable to get
> tuples out of to update/delete, not N for N child result relations as
> is done currently.
>
> I suppose things are the way they are because creating a separate plan
> for each result relation makes the job of ModifyTable node very
> simple, which is currently this:
>
> 1. Take the plan's output tuple, extract the tupleid of the tuple to
> update/delete in the currently active result relation,
> 2. If delete, go to 3, else if update, filter out the junk columns
> from the above tuple
> 3. Call ExecUpdate()/ExecDelete() on the result relation with the new
> tuple, if any
>
> If we make ModifyTable do a little more work for the inheritance case,
> we can create only one plan but without "expanding" the targetlist.
> That is, it will contain entries only for attributes that are assigned
> values in the SET clause.  This makes the plan reusable across result
> relations, because all child relations must have those attributes,
> even though the attribute numbers might be different.  Anyway, the
> work that ModifyTable will now have to do is this:
>
> 1. Take the plan's output tuple, extract tupleid of the tuple to
> update/delete and "tableoid"
> 2. Select the result relation to operate on using the tableoid
> 3. If delete, go to 4, else if update, fetch the tuple identified by
> tupleid from the result relation and fill in the unassigned columns
> using that "old" tuple, also filtering out the junk columns
> 4. Call ExecUpdate()/ExecDelete() on the result relation with the new
> tuple, if any
>
> I do think that doing this would be worthwhile even if we may be
> increasing ModifyTable's per-row overhead slightly, because planning
> overhead of the current approach is very significant, especially for
> partition trees with more than a couple of thousand partitions.  As to
> how bad the problem is, trying to create a generic plan for `update
> foo set ... where key = $1`, where foo has over 2000 partitions,
> causes OOM even on a machine with 6GB of memory.

Per-row overhead would be incurred for every row, whereas the
planning-time overhead is one-time or, in the case of a prepared
statement, almost free.  So we need to compare the two, especially
when there are 2000 partitions and all of them are being updated.  But
generally I agree that this would be a better approach.  It might help
with using PWJ when the result relation joins with another partitioned
table.  I am not sure whether that effectively happens today through
partition pruning.  More on this later.

>
> The one plan shared by all result relations will be same as the one we
> would get if the query were SELECT, except it will contain junk
> attributes such as ctid needed to identify tuples and a new "tableoid"
> junk attribute if multiple result relations will be present due to
> inheritance.  One major way in which this targetlist differs from the
> current per-result-relation plans is that it won't be passed through
> expand_targetlist(), because the set of unassigned attributes may not
> be unique among children.  As mentioned above, those missing
> attributes will be filled by ModifyTable doing some extra work,
> whereas previously they would have come with the plan's output tuple.
>
> For child result relations that are foreign tables, their FDW adds
> junk attribute(s) to the query’s targetlist by updating it in-place
> (AddForeignUpdateTargets).  However, as the child tables will no
> longer get their own parsetree, we must use some hack around this
> interface to obtain the foreign table specific junk attributes and add
> them to the original/parent query’s targetlist.  Assuming that all or
> most of the children will belong to the same FDW, we will end up with
> only a handful such junk columns in the final targetlist.  I am not
> sure if it's worthwhile to change the API of AddForeignUpdateTargets
> to require FDWs to not scribble on the passed-in parsetree as part of
> this patch.

What happens if there's a mixture of foreign and local partitions or
mixture of FDWs? Injecting junk columns from all FDWs in the top level
target list will cause error because those attributes won't be
available everywhere.

>
> As for how ModifyTable will create the new tuple for updates, I have
> decided to use a ProjectionInfo for each result relation, which
> projects a full, *clean* tuple ready to be put back into the relation.
> When projecting, plan’s output tuple serves as OUTER tuple and the old
> tuple fetched to fill unassigned attributes serves as SCAN tuple.  By
> having this ProjectionInfo also serve as the “junk filter”, we don't
> need JunkFilters.  The targetlist that this projection computes is
> same as that of the result-relation-specific plan.  Initially, I
> thought to generate this "expanded" targetlist in
> ExecInitModifyTable().  But as it can be somewhat expensive, doing it
> only once in the planner seemed like a good idea.  These
> per-result-relations targetlists are carried in the ModifyTable node.
>
> To identify the result relation from the tuple produced by the plan,
> “tableoid” junk column will be used.  As the tuples for different
> result relations won’t necessarily come out in the order in which
> result relations are laid out in the ModifyTable node, we need a way
> to map the tableoid value to result relation indexes.  I have decided
> to use a hash table here.

Can we plan the scan query to add a sort node to order the rows by tableoid?

>
> A couple of things that I haven't thought very hard about yet, but
> may revisit later:
>
> * We will no longer be able to use DirectModify APIs to push updates
> to remote servers for foreign child result relations

If we convert a whole DML into partitionwise DML (just as happens
today, unintentionally), we should be able to use DirectModify.  PWJ
will help there.  But even if we can detect that the scan underlying a
particular partition can be evaluated completely on the same node
where the partition resides, we should be able to use DirectModify.
But if we are not able to support this optimization, the queries which
benefit from it today won't perform well.  I think we need to think
about this now instead of leaving it for later.  Otherwise, make it so
that we use the old way when there are foreign partitions and the new
way otherwise.

>
> * Over in [1], I have said that we get run-time pruning for free for
> ModifyTable because the plan we are using is same as that for SELECT,
> although now I think that I hadn't thought that through.  With the PoC
> patch that I have:
>
> prepare q as update foo set a = 250001 where a = $1;
> set plan_cache_mode to 'force_generic_plan';
> explain execute q(1);
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Update on foo  (cost=0.00..142.20 rows=40 width=14)
>    Update on foo_1
>    Update on foo_2 foo
>    Update on foo_3 foo
>    Update on foo_4 foo
>    ->  Append  (cost=0.00..142.20 rows=40 width=14)
>          Subplans Removed: 3
>          ->  Seq Scan on foo_1  (cost=0.00..35.50 rows=10 width=14)
>                Filter: (a = $1)
> (9 rows)
>
> While it's true that we will never have to actually update foo_2,
> foo_3, and foo_4, ModifyTable still sets up its ResultRelInfos, which
> ideally it shouldn't.  Maybe we'll need to do something about that
> after all.

* Tuple re-routing during UPDATE.  For now it's disabled, so your
design should work.  But we shouldn't design this feature in such a
way that it gets in the way of enabling tuple re-routing in future :).

--
Best Wishes,
Ashutosh Bapat



Re: making update/delete of inheritance trees scale better

From: Amit Langote

Hi Ashutosh,

Thanks for chiming in.

On Mon, May 11, 2020 at 9:58 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Fri, May 8, 2020 at 7:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > I do think that doing this would be worthwhile even if we may be
> > increasing ModifyTable's per-row overhead slightly, because planning
> > overhead of the current approach is very significant, especially for
> > partition trees with more than a couple of thousand partitions.  As to
> > how bad the problem is, trying to create a generic plan for `update
> > foo set ... where key = $1`, where foo has over 2000 partitions,
> > causes OOM even on a machine with 6GB of memory.
>
> Per row overhead would be incurred for every row whereas the plan time
> overhead is one-time or in case of a prepared statement almost free.
> So we need to compare it esp. when there are 2000 partitions and all
> of them are being updated.

I assume that such UPDATEs would be uncommon.

> But generally I agree that this would be a
> better approach. It might help using PWJ when the result relation
> joins with other partitioned table.

It does, because the plan below ModifyTable is the same as if the
query were a SELECT instead of an UPDATE; with my PoC:

explain (costs off) update foo set a = foo2.a + 1 from foo foo2 where
foo.a = foo2.a;
                    QUERY PLAN
--------------------------------------------------
 Update on foo
   Update on foo_1
   Update on foo_2
   ->  Append
         ->  Merge Join
               Merge Cond: (foo_1.a = foo2_1.a)
               ->  Sort
                     Sort Key: foo_1.a
                     ->  Seq Scan on foo_1
               ->  Sort
                     Sort Key: foo2_1.a
                     ->  Seq Scan on foo_1 foo2_1
         ->  Merge Join
               Merge Cond: (foo_2.a = foo2_2.a)
               ->  Sort
                     Sort Key: foo_2.a
                     ->  Seq Scan on foo_2
               ->  Sort
                     Sort Key: foo2_2.a
                     ->  Seq Scan on foo_2 foo2_2
(20 rows)

as opposed to what you get today:

explain (costs off) update foo set a = foo2.a + 1 from foo foo2 where
foo.a = foo2.a;
                    QUERY PLAN
--------------------------------------------------
 Update on foo
   Update on foo_1
   Update on foo_2
   ->  Merge Join
         Merge Cond: (foo_1.a = foo2.a)
         ->  Sort
               Sort Key: foo_1.a
               ->  Seq Scan on foo_1
         ->  Sort
               Sort Key: foo2.a
               ->  Append
                     ->  Seq Scan on foo_1 foo2
                     ->  Seq Scan on foo_2 foo2_1
   ->  Merge Join
         Merge Cond: (foo_2.a = foo2.a)
         ->  Sort
               Sort Key: foo_2.a
               ->  Seq Scan on foo_2
         ->  Sort
               Sort Key: foo2.a
               ->  Append
                     ->  Seq Scan on foo_1 foo2
                     ->  Seq Scan on foo_2 foo2_1
(23 rows)

> > For child result relations that are foreign tables, their FDW adds
> > junk attribute(s) to the query’s targetlist by updating it in-place
> > (AddForeignUpdateTargets).  However, as the child tables will no
> > longer get their own parsetree, we must use some hack around this
> > interface to obtain the foreign table specific junk attributes and add
> > them to the original/parent query’s targetlist.  Assuming that all or
> > most of the children will belong to the same FDW, we will end up with
> > only a handful such junk columns in the final targetlist.  I am not
> > sure if it's worthwhile to change the API of AddForeignUpdateTargets
> > to require FDWs to not scribble on the passed-in parsetree as part of
> > this patch.
>
> What happens if there's a mixture of foreign and local partitions or
> mixture of FDWs? Injecting junk columns from all FDWs in the top level
> target list will cause error because those attributes won't be
> available everywhere.

That is a good question and something I have struggled with ever since
I started thinking about implementing this.

For the problem that FDWs may inject junk columns that could neither
be present in local tables (root parent and other local children) nor
other FDWs, I couldn't think of any solution other than to restrict
what those junk columns can be -- to require them to be either "ctid",
"wholerow", or a set of only *inherited* user columns.  I think that's
what Tom was getting at when he said the following in the email I
cited in my first email:

"...It gets  a bit harder if the tree contains some foreign tables,
because they might have different concepts of row identity, but I'd
think in most cases you could still combine those into a small number
of output columns."

Maybe I misunderstood what Tom said, but I can't imagine how to let
these junk columns be anything that *all* tables contained in an
inheritance tree, especially the root parent, cannot emit, if they are
to be emitted out of a single plan.
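The restriction described above reduces to a simple rule: a junk column requested by an FDW is usable in the shared targetlist only if every table in the tree, including the root parent, can emit it. A hypothetical check, with column names as plain strings (junk_column_is_shareable() is an illustrative name, not an existing function):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: can every table in the tree, including the root
 * parent, emit this FDW-requested junk column?  Per the restriction above,
 * that holds for "ctid", "wholerow", and columns inherited from the root.
 */
static bool
junk_column_is_shareable(const char *name, const char *const *inherited_cols,
                         int ninherited)
{
    assert(name != NULL);
    if (strcmp(name, "ctid") == 0 || strcmp(name, "wholerow") == 0)
        return true;
    for (int i = 0; i < ninherited; i++)
        if (strcmp(name, inherited_cols[i]) == 0)
            return true;
    return false;
}
```

Anything that fails this check, such as an FDW-specific remote row id, is exactly the case that needs some other representation.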

> > As for how ModifyTable will create the new tuple for updates, I have
> > decided to use a ProjectionInfo for each result relation, which
> > projects a full, *clean* tuple ready to be put back into the relation.
> > When projecting, plan’s output tuple serves as OUTER tuple and the old
> > tuple fetched to fill unassigned attributes serves as SCAN tuple.  By
> > having this ProjectionInfo also serve as the “junk filter”, we don't
> > need JunkFilters.  The targetlist that this projection computes is
> > same as that of the result-relation-specific plan.  Initially, I
> > thought to generate this "expanded" targetlist in
> > ExecInitModifyTable().  But as it can be somewhat expensive, doing it
> > only once in the planner seemed like a good idea.  These
> > per-result-relations targetlists are carried in the ModifyTable node.
> >
> > To identify the result relation from the tuple produced by the plan,
> > “tableoid” junk column will be used.  As the tuples for different
> > result relations won’t necessarily come out in the order in which
> > result relations are laid out in the ModifyTable node, we need a way
> > to map the tableoid value to result relation indexes.  I have decided
> > to use a hash table here.
>
> Can we plan the scan query to add a sort node to order the rows by tableoid?

Hmm, I am afraid that some pieces of partitioning code assume a
certain order of result relations, and that order is not based on
sorting tableoids.

> > A couple of things that I haven't thought very hard about yet, but
> > may revisit later:
> >
> > * We will no longer be able to use DirectModify APIs to push updates
> > to remote servers for foreign child result relations
>
> If we convert a whole DML into partitionwise DML (just as it happens
> today unintentionally), we should be able to use DirectModify. PWJ
> will help there. But even we can detect that the scan underlying a
> particular partition can be evaluated completely on the node same as
> where the partition resides, we should be able to use DirectModify.

I remember Fujita-san mentioned something like this, but I haven't
looked into how feasible it would be given the current DirectModify
interface.

> But if we are not able to support this optimization, the queries which
> benefit from it for today won't perform well. I think we need to think
> about this now instead of leave for later. Otherwise, make it so that
> we use the old way when there are foreign partitions and new way
> otherwise.

I would very much like to find a solution for this, which hopefully
isn't to fall back to using the old way.

> * Tuple re-routing during UPDATE. For now it's disabled so your design
> should work. But we shouldn't design this feature in such a way that
> it comes in the way to enable tuple re-routing in future :).

Sorry, what is tuple re-routing and why does this new approach get in its way?

--
Amit Langote



Re: making update/delete of inheritance trees scale better

From: Robert Haas

On Mon, May 11, 2020 at 8:58 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> What happens if there's a mixture of foreign and local partitions or
> mixture of FDWs? Injecting junk columns from all FDWs in the top level
> target list will cause error because those attributes won't be
> available everywhere.

I think that we're talking about a plan like this:

Update
-> Append
  -> a bunch of children

I believe what you'd want to have happen here is for each child to
emit the row identity columns that it knows about, and emit NULL for
the others. Then when you do the Append you end up with a row format
that includes all the individual identity columns, but for any
particular tuple, only one set of such columns is populated and the
others are all NULL. There doesn't seem to be any execution-time
problem with such a representation, but there might be a planning-time
problem with building it, because when you're writing a tlist for the
Append node, what varattno are you going to use for the columns that
exist only in one particular child and not the others? The fact that
setrefs processing happens so late seems like an annoyance in this
case.
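The row format described above can be sketched as follows, assuming just two hypothetical kinds of row identity (a ctid-like one and an FDW-specific row id). Each child fills in only its own identity slot and marks the others NULL, so exactly one slot is populated per row. UnionRow, child_emit() and populated_kind() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identity kinds present somewhere in the inheritance tree. */
typedef enum
{
    ID_CTID = 0,        /* local tables, or an FDW that also uses ctid */
    ID_REMOTE_ROWID,    /* some other FDW's notion of row identity */
    N_ID_KINDS
} IdKind;

/* Row format coming out of the Append: one slot per identity kind. */
typedef struct UnionRow
{
    unsigned int tableoid;
    long         id_value[N_ID_KINDS];
    bool         id_isnull[N_ID_KINDS];
} UnionRow;

/* What a child emits: its own identity column, NULL for all the others. */
static UnionRow
child_emit(unsigned int tableoid, IdKind kind, long value)
{
    UnionRow    r;

    r.tableoid = tableoid;
    for (int k = 0; k < N_ID_KINDS; k++)
    {
        r.id_value[k] = 0;
        r.id_isnull[k] = true;
    }
    r.id_value[kind] = value;
    r.id_isnull[kind] = false;
    return r;
}

/* Which identity slot is populated; exactly one should be. */
static int
populated_kind(const UnionRow *r)
{
    int         found = -1;

    for (int k = 0; k < N_ID_KINDS; k++)
    {
        if (!r->id_isnull[k])
        {
            assert(found == -1);
            found = k;
        }
    }
    return found;
}
```

At execution time this is straightforward; the open question in the thread is the planning-time one of what varno/varattno the Append's tlist uses for such slots.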

Maybe it would be easier to have one Update node per kind of row
identity, i.e. if there's more than one such notion then...

Placeholder
-> Update
 -> Append
  -> all children with one notion of row identity
-> Update
 -> Append
  -> all children with another notion of row identity

...and so forth.
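The grouping this alternative implies is easy to state: bucket the children by their notion of row identity, one bucket per Update node under the placeholder. A hypothetical C sketch, with identity kinds encoded as small integers (group_by_identity() is an illustrative name):

```c
#include <assert.h>

#define MAX_KINDS 8

/*
 * Assign each child to a group by its row-identity kind, numbering groups
 * in first-seen order.  Each group would get its own Update node; the
 * return value is the number of such Update nodes.
 */
static int
group_by_identity(const int *kind, int nchildren, int *group_of)
{
    int         group_kind[MAX_KINDS];
    int         ngroups = 0;

    for (int c = 0; c < nchildren; c++)
    {
        int         g = 0;

        while (g < ngroups && group_kind[g] != kind[c])
            g++;
        if (g == ngroups)
        {
            assert(ngroups < MAX_KINDS);
            group_kind[ngroups++] = kind[c];
        }
        group_of[c] = g;
    }
    return ngroups;
}
```

When every child shares one notion of row identity, this collapses to a single Update over a single Append, i.e. the common case keeps the simple plan shape.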

But I'm not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: making update/delete of inheritance trees scale better

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> I believe that you'd want to have happen here is for each child to
> emit the row identity columns that it knows about, and emit NULL for
> the others. Then when you do the Append you end up with a row format
> that includes all the individual identity columns, but for any
> particular tuple, only one set of such columns is populated and the
> others are all NULL.

Yeah, that was what I'd imagined in my earlier thinking about this.

> There doesn't seem to be any execution-time
> problem with such a representation, but there might be a planning-time
> problem with building it,

Possibly.  We manage to cope with not-all-alike children now, of course,
but I think it might be true that no one plan node has Vars from
dissimilar children.  Even so, the Vars are self-identifying, so it
seems like this ought to be soluble.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From: Robert Haas

On Mon, May 11, 2020 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > There doesn't seem to be any execution-time
> > problem with such a representation, but there might be a planning-time
> > problem with building it,
>
> Possibly.  We manage to cope with not-all-alike children now, of course,
> but I think it might be true that no one plan node has Vars from
> dissimilar children.  Even so, the Vars are self-identifying, so it
> seems like this ought to be soluble.

If the parent is RTI 1, and the children are RTIs 2..6, what
varno/varattno will we use in RTI 1's tlist to represent a column that
exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6?

I suppose the answer is 2 - or 3, but I guess we'd pick the first
child as the representative of the class. We surely can't use varno 1,
because then there's no varattno that makes any sense. But if we use
2, now we have the tlist for RTI 1 containing expressions with a
child's RTI as the varno. I could be wrong, but I think that's going
to make setrefs.c throw up and die, and I wouldn't be very surprised
if there were a bunch of other things that crashed and burned, too. I
think we have quite a bit of code that expects to be able to translate
between parent-rel expressions and child-rel expressions, and that's
going to be pretty problematic here.

Maybe your answer is - let's just fix all that stuff. That could well
be right, but my first reaction is to think that it sounds hard.

-- 
Robert Haas



Re: making update/delete of inheritance trees scale better

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> If the parent is RTI 1, and the children are RTIs 2..6, what
> varno/varattno will we use in RTI 1's tlist to represent a column that
> exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6?

Fair question.  We don't have any problem representing the column
as it exists in any one of those children, but we lack a notation
for the "union" or whatever you want to call it, except in the case
where the parent relation has a corresponding column.  Still, this
doesn't seem that hard to fix.  My inclination would be to invent
dummy parent-rel columns (possibly with negative attnums? not sure if
that'd be easier or harder than adding them in the positive direction)
to represent such "union" columns.  This concept would only need to
exist within the planner I think, since after setrefs.c there'd be no
trace of those dummy columns.
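A minimal sketch of the bookkeeping this would need. It uses the positive direction (attnos just past the parent's real columns) rather than negative ones, since the email leaves that choice open. The registry records, for each invented "union" column, which child attribute it maps to, or 0 when a child lacks it and would emit NULL. Everything here (DummyRegistry, dummy_attno(), record_child(), to_child_attno()) is hypothetical, not planner code, and child indices play the role of RTIs.

```c
#include <assert.h>
#include <string.h>

#define MAX_DUMMY    8
#define MAX_CHILDREN 8

typedef struct DummyCol
{
    const char *name;                       /* e.g. an FDW's row-identity junk column */
    int         child_attno[MAX_CHILDREN];  /* 0 = this child lacks the column */
} DummyCol;

typedef struct DummyRegistry
{
    int         parent_natts;   /* parent's real columns are 1..parent_natts */
    int         ndummy;
    DummyCol    cols[MAX_DUMMY];
} DummyRegistry;

/* Find or invent the parent-level attno representing a "union" column. */
static int
dummy_attno(DummyRegistry *reg, const char *name)
{
    for (int i = 0; i < reg->ndummy; i++)
        if (strcmp(reg->cols[i].name, name) == 0)
            return reg->parent_natts + 1 + i;

    assert(reg->ndummy < MAX_DUMMY);
    reg->cols[reg->ndummy].name = name;
    for (int k = 0; k < MAX_CHILDREN; k++)
        reg->cols[reg->ndummy].child_attno[k] = 0;
    return reg->parent_natts + 1 + reg->ndummy++;
}

/* Remember which child attribute the dummy parent column stands for. */
static void
record_child(DummyRegistry *reg, int parent_attno, int child, int child_attno)
{
    int         i = parent_attno - reg->parent_natts - 1;

    assert(i >= 0 && i < reg->ndummy && child >= 0 && child < MAX_CHILDREN);
    reg->cols[i].child_attno[child] = child_attno;
}

/* Parent-to-child translation for a dummy column; 0 means "emit NULL". */
static int
to_child_attno(const DummyRegistry *reg, int parent_attno, int child)
{
    int         i = parent_attno - reg->parent_natts - 1;

    assert(i >= 0 && i < reg->ndummy && child >= 0 && child < MAX_CHILDREN);
    return reg->cols[i].child_attno[child];
}
```

In Robert's example, a column present only in RTIs 2 and 3 gets one dummy parent attno; translating it for RTI 2 or 3 yields that child's real attno, and for RTIs 4 to 6 yields 0, so those children emit NULL there.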

> I think we have quite a bit of code that expects to be able to translate
> between parent-rel expressions and child-rel expressions, and that's
> going to be pretty problematic here.

... shrug.  Sure, we'll need to be able to do that mapping.  Why will
it be any harder than any other parent <-> child mapping?  The planner
would know darn well what the mapping is while it's inventing the
dummy columns, so it just has to keep that info around for use later.

> Maybe your answer is - let's just fix all that stuff. That could well
> be right, but my first reaction is to think that it sounds hard.

I have to think that it'll net out as less code, and certainly less
complicated code, than trying to extend inheritance_planner in its
current form to do what we wish it'd do.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From: Robert Haas

On Mon, May 11, 2020 at 4:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > If the parent is RTI 1, and the children are RTIs 2..6, what
> > varno/varattno will we use in RTI 1's tlist to represent a column that
> > exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6?
>
> Fair question.  We don't have any problem representing the column
> as it exists in any one of those children, but we lack a notation
> for the "union" or whatever you want to call it, except in the case
> where the parent relation has a corresponding column.  Still, this
> doesn't seem that hard to fix.  My inclination would be to invent
> dummy parent-rel columns (possibly with negative attnums? not sure if
> that'd be easier or harder than adding them in the positive direction)
> to represent such "union" columns.

Ah, that makes sense. If we can invent dummy columns on the parent
rel, then most of what I was worrying about no longer seems very
worrying.

I'm not sure what's involved in inventing such dummy columns, though.

-- 
Robert Haas



Re: making update/delete of inheritance trees scale better

From: Ashutosh Bapat

On Mon, May 11, 2020 at 8:11 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Per row overhead would be incurred for every row whereas the plan time
> > overhead is one-time or in case of a prepared statement almost free.
> > So we need to compare it esp. when there are 2000 partitions and all
> > of them are being updated.
>
> I assume that such UPDATEs would be uncommon.

Yes, 2000 partitions being updated would be rare.  But many rows from
the same partition being updated may not be that common.  We have to
know how large that per-row overhead is, and how many rows must be
updated before it exceeds the planning-time overhead saved.  If that
number of rows is very large, we are good.
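The comparison amounts to a break-even calculation: the new approach stays ahead as long as the number of updated rows, multiplied by the extra per-row cost, is below the planning time saved. A small sketch in integer nanoseconds; break_even_rows() is a hypothetical helper, and its inputs would have to come from real measurements, which this thread does not give.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest number of updated rows at which the extra per-row work in
 * ModifyTable costs at least as much as the planning time saved.
 * Both inputs are (hypothetical) measurements in nanoseconds.
 */
static uint64_t
break_even_rows(uint64_t planning_saved_ns, uint64_t per_row_extra_ns)
{
    assert(per_row_extra_ns > 0);
    /* ceiling division */
    return (planning_saved_ns + per_row_extra_ns - 1) / per_row_extra_ns;
}
```

For example, with a made-up 50 ms of planning saved and 1 µs of extra work per row, break_even_rows(50000000, 1000) is 50000 rows. For a prepared statement using a generic plan, the saving applies only once, which is the case Ashutosh flags.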

>
> > But generally I agree that this would be a
> > better approach. It might help using PWJ when the result relation
> > joins with other partitioned table.
>
> It does, because the plan below ModifyTable is the same as if the
> query were a SELECT instead of an UPDATE; with my PoC:
>
> explain (costs off) update foo set a = foo2.a + 1 from foo foo2 where
> foo.a = foo2.a;
>                     QUERY PLAN
> --------------------------------------------------
>  Update on foo
>    Update on foo_1
>    Update on foo_2
>    ->  Append
>          ->  Merge Join
>                Merge Cond: (foo_1.a = foo2_1.a)
>                ->  Sort
>                      Sort Key: foo_1.a
>                      ->  Seq Scan on foo_1
>                ->  Sort
>                      Sort Key: foo2_1.a
>                      ->  Seq Scan on foo_1 foo2_1
>          ->  Merge Join
>                Merge Cond: (foo_2.a = foo2_2.a)
>                ->  Sort
>                      Sort Key: foo_2.a
>                      ->  Seq Scan on foo_2
>                ->  Sort
>                      Sort Key: foo2_2.a
>                      ->  Seq Scan on foo_2 foo2_2
> (20 rows)

Wonderful. That looks good.


> > Can we plan the scan query to add a sort node to order the rows by tableoid?
>
> Hmm, I am afraid that some piece of partitioning code that assumes a
> certain order of result relations, and that order is not based on
> sorting tableoids.

I am suggesting that we override that order (if any) in
create_modifytable_path() or create_modifytable_plan() by explicitly
ordering the incoming paths on tableoid, maybe using MergeAppend.
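One consequence of such an ordering, sketched under the assumption that rows really do arrive grouped by tableoid: ModifyTable would only need to look up a result relation when the tableoid changes, rather than once per row. RelCursor and cursor_get() are hypothetical names, and the linear search stands in for any lookup.

```c
#include <assert.h>
#include <stdint.h>

typedef struct RelCursor
{
    uint32_t    cur_oid;    /* tableoid of the previous row */
    int         cur_index;  /* its result relation index; -1 before the first row */
    int         lookups;    /* how many times a search was needed */
} RelCursor;

static int
find_rel_index(const uint32_t *rel_oids, int nrels, uint32_t oid)
{
    for (int i = 0; i < nrels; i++)
        if (rel_oids[i] == oid)
            return i;
    return -1;
}

/* Search only when the tableoid differs from the previous row's. */
static int
cursor_get(RelCursor *c, const uint32_t *rel_oids, int nrels, uint32_t oid)
{
    if (c->cur_index < 0 || c->cur_oid != oid)
    {
        c->cur_index = find_rel_index(rel_oids, nrels, oid);
        assert(c->cur_index >= 0);
        c->cur_oid = oid;
        c->lookups++;
    }
    return c->cur_index;
}
```

With rows sorted by tableoid, the number of lookups equals the number of distinct result relations; with interleaved rows it degrades to one per row. The sort or MergeAppend itself has a cost, which a hash lookup avoids.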


>
> > * Tuple re-routing during UPDATE. For now it's disabled so your design
> > should work. But we shouldn't design this feature in such a way that
> > it comes in the way to enable tuple re-routing in future :).
>
> Sorry, what is tuple re-routing and why does this new approach get in its way?

That is, an UPDATE causing a tuple to move to a different partition.
The new approach could get in its way, since the tuple will be located
based on tableoid, which will be the OID of the old partition. But I
think this approach has a higher chance than the current one of
eventually solving that problem.
-- 
Best Wishes,
Ashutosh Bapat



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Tue, May 12, 2020 at 5:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 11, 2020 at 4:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > If the parent is RTI 1, and the children are RTIs 2..6, what
> > > varno/varattno will we use in RTI 1's tlist to represent a column that
> > > exists in both RTI 2 and RTI 3 but not in RTI 1, 4, 5, or 6?
> >
> > Fair question.  We don't have any problem representing the column
> > as it exists in any one of those children, but we lack a notation
> > for the "union" or whatever you want to call it, except in the case
> > where the parent relation has a corresponding column.  Still, this
> > doesn't seem that hard to fix.  My inclination would be to invent
> > dummy parent-rel columns (possibly with negative attnums? not sure if
> > that'd be easier or harder than adding them in the positive direction)
> > to represent such "union" columns.
>
> Ah, that makes sense. If we can invent dummy columns on the parent
> rel, then most of what I was worrying about no longer seems very
> worrying.

IIUC, the idea is to have "dummy" columns in the top parent's
reltarget for every junk TLE added to the top-level targetlist by
child tables' FDWs that the top parent itself can't emit. But we allow
these FDW junk TLEs to contain arbitrary expressions, not just
plain Vars [1], so what node type would these dummy parent columns be?
I can see from add_vars_to_targetlist() that we allow only Vars and
PlaceHolderVars to be present in a relation's reltarget->exprs, but
neither of those seems suitable for the task.

Once we get something in the parent's reltarget->exprs representing
these child expressions, from there they go back into child
reltargets, so it would appear that our appendrel transformation code
must somehow be taught to deal with these dummy columns.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/docs/current/fdw-callbacks.html#FDW-CALLBACKS-UPDATE

"...If the extra expressions are more complex than simple Vars, they
must be run through eval_const_expressions before adding them to the
targetlist."



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Tue, May 12, 2020 at 5:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> Ah, that makes sense. If we can invent dummy columns on the parent
>> rel, then most of what I was worrying about no longer seems very
>> worrying.

> IIUC, the idea is to have "dummy" columns in the top parent's
> reltarget for every junk TLE added to the top-level targetlist by
> child tables' FDWs that the top parent itself can't emit. But we allow
> these FDW junk TLEs to contain any arbitrary expression, not just
> plain Vars [1], so what node type are these dummy parent columns?

We'd have to group the children into groups that share the same
row-identity column type.  This is why I noted way-back-when that
it'd be a good idea to discourage FDWs from being too wild about
what they use for row identity.

(Also, just to be totally clear: I am *not* envisioning this as a
mechanism for FDWs to inject whatever computations they darn please
into query trees.  It's for the row identity needed by UPDATE/DELETE,
and nothing else.  That being the case, it's hard to understand why
the bottom-level Vars wouldn't be just plain Vars --- maybe "system
column" Vars or something like that, but still just Vars, not
expressions.)

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
David Rowley
Date:
On Wed, 13 May 2020 at 00:54, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, May 11, 2020 at 8:11 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Per row overhead would be incurred for every row whereas the plan time
> > > overhead is one-time or in case of a prepared statement almost free.
> > > So we need to compare it esp. when there are 2000 partitions and all
> > > of them are being updated.
> >
> > I assume that such UPDATEs would be uncommon.
>
> Yes, 2000 partitions being updated would be rare. But many rows from
> the same partition being updated may not be that common. We have to
> know how much is that per row overhead and updating how many rows it
> takes to beat the planning time overhead. If the number of rows is
> very large, we are good.

Rows from a non-parallel Append should arrive in subplan order, one
child after another. If you were worried about the performance of
finding the correct ResultRelInfo for the tuple that we just got, then
we could just cache the tableoid and ResultRelInfo for the last row,
and if the tableoid on this row matches, just use the same
ResultRelInfo as last time. That'll save doing the hash table lookup
in all cases, apart from when the Append moves on to the next child
subplan. Not sure exactly how that'll fit in with the foreign table
discussion that's going on here though. Another option would be to not
use tableoid and instead inject an INT4 Const (0 to nsubplans - 1)
into each subplan's targetlist that serves as the index into an array
of ResultRelInfos.
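A minimal C sketch of the caching idea above, assuming simplified stand-in types: ResultRelSketch, ResultRelLookup, slow_lookup() and lookup_result_rel() are hypothetical names, not PostgreSQL code, and the linear search merely stands in for the hash table lookup mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real structs. */
typedef unsigned int Oid;

typedef struct ResultRelSketch
{
    Oid relid;          /* OID of the child result relation */
    int nprocessed;     /* rows handed to this relation so far */
} ResultRelSketch;

typedef struct ResultRelLookup
{
    ResultRelSketch *rels;      /* all result relations */
    int nrels;
    Oid last_oid;               /* tableoid seen on the previous row */
    ResultRelSketch *last_rel;  /* relation chosen for the previous row */
    int slow_lookups;           /* number of cache misses */
} ResultRelLookup;

/* Stands in for the hash table lookup; a linear scan keeps the sketch short. */
static ResultRelSketch *
slow_lookup(const ResultRelLookup *lk, Oid tableoid)
{
    for (int i = 0; i < lk->nrels; i++)
        if (lk->rels[i].relid == tableoid)
            return &lk->rels[i];
    return NULL;
}

/*
 * Return the result relation for a row's tableoid, reusing the previous
 * answer when the tableoid has not changed, which is the common case when
 * a non-parallel Append emits one child's rows after another.
 */
static ResultRelSketch *
lookup_result_rel(ResultRelLookup *lk, Oid tableoid)
{
    assert(lk->nrels >= 0);
    if (lk->last_rel == NULL || lk->last_oid != tableoid)
    {
        lk->slow_lookups++;
        lk->last_rel = slow_lookup(lk, tableoid);
        lk->last_oid = tableoid;
    }
    return lk->last_rel;
}
```

With rows arriving one child at a time, only one slow lookup per child is needed. Rows whose tableoids interleave, as can happen below a join, still get the right relation but miss the cache more often.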

As for which ResultRelInfos to initialize, couldn't we just have the
planner generate an OidList of all the ones that we could need?
Basically, all the non-pruned partitions. Perhaps we could even be
pretty lazy about building those ResultRelInfos during execution too.
We'd need to grab the locks first, but, without staring at the code, I
doubt there's a reason we'd need to build them all upfront.  That
would help in cases where pruning didn't prune much, but due to
something else in the WHERE clause, the results only come from some
small subset of partitions.
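A rough sketch of lazy ResultRelInfo building, under stated assumptions: the LazyResultRel type, init_result_rel() and get_result_rel() are hypothetical simplifications, the expensive setup is reduced to setting a flag, and each relation is assumed to be locked already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;   /* simplified stand-in */

/* Hypothetical, heavily simplified per-relation executor state. */
typedef struct LazyResultRel
{
    Oid  relid;
    bool initialized;       /* has the expensive setup run yet? */
} LazyResultRel;

static int n_initialized = 0;   /* counts setups, for illustration only */

/* Placeholder for the costly part (opening indexes, trigger setup, ...). */
static void
init_result_rel(LazyResultRel *rel)
{
    assert(!rel->initialized);  /* setup must run at most once */
    rel->initialized = true;
    n_initialized++;
}

/*
 * Find the relation for tableoid among the non-pruned ones, doing its
 * setup only on first use.  Assumes the relation is already locked.
 */
static LazyResultRel *
get_result_rel(LazyResultRel *rels, int nrels, Oid tableoid)
{
    for (int i = 0; i < nrels; i++)
    {
        if (rels[i].relid != tableoid)
            continue;
        if (!rels[i].initialized)
            init_result_rel(&rels[i]);
        return &rels[i];
    }
    return NULL;
}
```

If pruning leaves many partitions but the WHERE clause makes rows come from only one of them, only that one pays the setup cost.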

David



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Tue, May 12, 2020 at 9:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Mon, May 11, 2020 at 8:11 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Per row overhead would be incurred for every row whereas the plan time
> > > overhead is one-time or in case of a prepared statement almost free.
> > > So we need to compare it esp. when there are 2000 partitions and all
> > > of them are being updated.
> >
> > I assume that such UPDATEs would be uncommon.
>
> Yes, 2000 partitions being updated would be rare. But many rows from
> the same partition being updated may not be that common. We have to
> know how much is that per row overhead and updating how many rows it
> takes to beat the planning time overhead. If the number of rows is
> very large, we are good.

Maybe I am misunderstanding you, but the more rows there are to
update, the more overhead we will be paying with the new approach.

> > > Can we plan the scan query to add a sort node to order the rows by tableoid?
> >
> > Hmm, I am afraid that some piece of partitioning code that assumes a
> > certain order of result relations, and that order is not based on
> > sorting tableoids.
>
> I am suggesting that we override that order (if any) in
> create_modifytable_path() or create_modifytable_plan() by explicitly
> ordering the incoming paths on tableoid. May be using MergeAppend.

So, we will need to do 2 things:

1. Implicitly apply an ORDER BY tableoid clause.
2. Add result relation RTIs to ModifyTable.resultRelations in the
order of their RTEs' relids.

Maybe we can do that as a separate patch.  Also, I am not sure if it
will get in the way of someone wanting to have ORDER BY LIMIT for
updates.
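A hypothetical sketch of why those two steps pair well: if plan rows arrive sorted by tableoid and resultRelations is sorted by relid in the same direction, ModifyTable could find each row's result relation by advancing a cursor that never moves backwards. SortedResultRels and advance_to() are illustrative names, not existing code, and OIDs are treated as unsigned integers compared numerically.

```c
#include <assert.h>

typedef unsigned int Oid;   /* simplified stand-in for PostgreSQL's Oid */

/* Hypothetical: result relation OIDs in ascending order, plus a cursor. */
typedef struct SortedResultRels
{
    const Oid *relids;
    int        nrels;
    int        cur;         /* index used for the previous row */
} SortedResultRels;

/*
 * Return the index of the result relation whose OID equals tableoid.
 * Because input rows are assumed sorted by tableoid, the cursor only moves
 * forward, so a whole scan costs O(rows + relations).  Returns -1 if the
 * OID is unknown or the input was not actually sorted.
 */
static int
advance_to(SortedResultRels *s, Oid tableoid)
{
    assert(s->cur >= 0 && s->cur <= s->nrels);
    while (s->cur < s->nrels && s->relids[s->cur] < tableoid)
        s->cur++;
    if (s->cur < s->nrels && s->relids[s->cur] == tableoid)
        return s->cur;
    return -1;
}
```

The -1 case for out-of-order input shows the cost of this design: it only works if the sort is guaranteed, which is why the hash lookup is the more robust default.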

> > > * Tuple re-routing during UPDATE. For now it's disabled so your design
> > > should work. But we shouldn't design this feature in such a way that
> > > it comes in the way to enable tuple re-routing in future :).
> >
> > Sorry, what is tuple re-routing and why does this new approach get in its way?
>
> An UPDATE causing a tuple to move to a different partition. It would
> get in its way since the tuple will be located based on tableoid,
> which will be the oid of the old partition. But I think this approach
> has higher chance of being able to solve that problem eventually
> rather than the current approach.

Again, I don't think I understand. We do currently (as of v11)
re-route tuples when UPDATE causes them to move to a different
partition, which, happily, continues to work with my patch.

So how it works is like this: for a given "new" tuple, ExecUpdate()
checks if the tuple would violate the partition constraint of the
result relation that was passed along with the tuple.  If it does, the
new tuple will be moved, by calling ExecDelete() to delete it from the
current relation, followed by ExecInsert() to find the new home for
the tuple.  The only thing that changes with the new approach is how
ExecModifyTable() chooses the result relation to pass to ExecUpdate()
for a given "new" tuple it has fetched from the plan, which is quite
independent of the tuple re-routing mechanism proper.
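The control flow described above can be sketched in plain C with a toy range-partitioned table. This is an illustration under assumptions: update_row() and route_key() are hypothetical stand-ins for ExecUpdate() and tuple routing, partitions are small integers, and the error PostgreSQL raises when no partition accepts the row is reduced to returning -1.

```c
#include <assert.h>
#include <stdio.h>

/* Toy range partitioning: partition i accepts keys in [bounds[i], bounds[i+1]). */
#define NPARTS 3
static const int bounds[NPARTS + 1] = {0, 100, 200, 300};

/* Tuple routing: find the partition whose range contains key, or -1. */
static int
route_key(int key)
{
    for (int i = 0; i < NPARTS; i++)
        if (key >= bounds[i] && key < bounds[i + 1])
            return i;
    return -1;
}

/*
 * Stand-in for ExecUpdate().  'part' is whatever result relation
 * ModifyTable picked for the row (for example via its tableoid); the
 * re-routing logic below does not care how that choice was made.
 * Returns the partition that ends up holding the row, or -1 if none
 * accepts it.
 */
static int
update_row(int part, int new_key)
{
    assert(part >= 0 && part < NPARTS);
    if (new_key >= bounds[part] && new_key < bounds[part + 1])
        return part;                    /* constraint holds: update in place */

    int dest = route_key(new_key);
    if (dest < 0)
        return -1;
    printf("move: delete from partition %d, insert into partition %d\n",
           part, dest);                 /* ExecDelete(), then ExecInsert() */
    return dest;
}
```

The only input from ModifyTable is `part`, which is the point made above: changing how that relation is selected leaves the move logic untouched.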

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, May 13, 2020 at 8:52 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 13 May 2020 at 00:54, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Mon, May 11, 2020 at 8:11 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Per row overhead would be incurred for every row whereas the plan time
> > > > overhead is one-time or in case of a prepared statement almost free.
> > > > So we need to compare it esp. when there are 2000 partitions and all
> > > > of them are being updated.
> > >
> > > I assume that such UPDATEs would be uncommon.
> >
> > Yes, 2000 partitions being updated would be rare. But many rows from
> > the same partition being updated may not be that common. We have to
> > know how much is that per row overhead and updating how many rows it
> > takes to beat the planning time overhead. If the number of rows is
> > very large, we are good.
>
> Rows from a non-parallel Append should arrive in order. If you were
> worried about the performance of finding the correct ResultRelInfo for
> the tuple that we just got, then we could just cache the tableOid and
> ResultRelInfo for the last row, and if that tableoid matches on this
> row, just use the same ResultRelInfo as last time.   That'll save
> doing the hash table lookup in all cases, apart from when the Append
> changes to the next child subplan.

That would be the common case, yes, though not when a join is involved.

>  Not sure exactly how that'll fit
> in with the foreign table discussion that's going on here though.

The foreign table discussion is about what the single top-level
targetlist should look like, given that different result relations may
require different row-identifying junk columns because they may belong
to different FDWs.  Currently that's not a thing to worry about,
because each result relation has its own plan and hence its own
targetlist.

> Another option would be to not use tableoid and instead inject an INT4
> Const (0 to nsubplans) into each subplan's targetlist that serves as
> the index into an array of ResultRelInfos.

That may be a bit fragile, considering how volatile that number
(result relation index) can be once you factor in run-time pruning,
but maybe worth considering.

> As for which ResultRelInfos to initialize, couldn't we just have the
> planner generate an OidList of all the ones that we could need.
> Basically, all the non-pruned partitions.

Why would replacing the list of RT indexes with OIDs be better?

> Perhaps we could even be
> pretty lazy about building those ResultRelInfos during execution too.
> We'd need to grab the locks first, but, without staring at the code, I
> doubt there's a reason we'd need to build them all upfront.  That
> would help in cases where pruning didn't prune much, but due to
> something else in the WHERE clause, the results only come from some

Late ResultRelInfo initialization is worth considering, given that
doing it for tuple-routing target relations works.  I don't know why
we are still initializing them all in InitPlan(), because the only
justification I know of for doing so is that it prevents lock
upgrades.  I think we discussed somewhat recently that that is not
really a hazard.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Tue, May 12, 2020 at 10:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Tue, May 12, 2020 at 5:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >> Ah, that makes sense. If we can invent dummy columns on the parent
> >> rel, then most of what I was worrying about no longer seems very
> >> worrying.
>
> > IIUC, the idea is to have "dummy" columns in the top parent's
> > reltarget for every junk TLE added to the top-level targetlist by
> > child tables' FDWs that the top parent itself can't emit. But we allow
> > these FDW junk TLEs to contain any arbitrary expression, not just
> > plain Vars [1], so what node type are these dummy parent columns?
>
> We'd have to group the children into groups that share the same
> row-identity column type.  This is why I noted way-back-when that
> it'd be a good idea to discourage FDWs from being too wild about
> what they use for row identity.

I understood the part about having a dummy parent column for each
group of children that use the same junk attribute.  I think we must
group them by resname + row-identity Var type though, not just the
latter, because during execution the FDWs look up the junk columns by
name.  If two FDWs add junk Vars of the same type, say, 'tid', but use
different resnames, say, "ctid" and "rowid", respectively, we must add
two dummy parent columns.
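A small hypothetical sketch of that grouping rule: each distinct (resname, type) pair gets its own dummy parent column, so "ctid" and "rowid" junk columns of type tid end up as two columns. JunkColRequest, DummyParentCols and dummy_col_for() are illustrative names only; 27 is the pg_type OID of tid.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;          /* simplified stand-in */

#define TID_TYPE_OID 27            /* pg_type OID of "tid" */
#define MAX_DUMMY_COLS 16

/* A row-identity junk column requested by some child's FDW (hypothetical). */
typedef struct JunkColRequest
{
    const char *resname;           /* name the FDW looks the column up by */
    Oid         vartype;           /* type of the row-identity Var */
} JunkColRequest;

/* The dummy parent columns invented so far. */
typedef struct DummyParentCols
{
    JunkColRequest cols[MAX_DUMMY_COLS];
    int            ncols;
} DummyParentCols;

/*
 * Return the dummy parent column to use for req, adding a new one unless a
 * column with the same resname AND the same type already exists.
 * Returns -1 if the fixed-size array is full.
 */
static int
dummy_col_for(DummyParentCols *d, JunkColRequest req)
{
    assert(req.resname != NULL);
    for (int i = 0; i < d->ncols; i++)
        if (d->cols[i].vartype == req.vartype &&
            strcmp(d->cols[i].resname, req.resname) == 0)
            return i;
    if (d->ncols == MAX_DUMMY_COLS)
        return -1;
    d->cols[d->ncols] = req;
    return d->ncols++;
}
```

Grouping on the type alone would merge "ctid" and "rowid" into one column, and whichever FDW looked up the other name at execution time would not find it.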

> (Also, just to be totally clear: I am *not* envisioning this as a
> mechanism for FDWs to inject whatever computations they darn please
> into query trees.  It's for the row identity needed by UPDATE/DELETE,
> and nothing else.  That being the case, it's hard to understand why
> the bottom-level Vars wouldn't be just plain Vars --- maybe "system
> column" Vars or something like that, but still just Vars, not
> expressions.)

I suppose we would need to explicitly check that and cause an error if
the contained expression is not a plain Var.  Neither the interface
we've got nor the documentation discourages them from putting just
about any expression into the junk TLE.
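Such a check could look roughly like the following. It is sketched with a hypothetical, simplified expression-kind enum instead of PostgreSQL's Node/IsA() machinery; real planner code would presumably raise the error with ereport(ERROR, ...) rather than print and return.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of expression kinds; real code would use IsA(expr, Var). */
typedef enum ExprKind
{
    EXPR_VAR,
    EXPR_FUNCEXPR,
    EXPR_OPEXPR
} ExprKind;

/* A junk target list entry reduced to its name and expression kind. */
typedef struct JunkTle
{
    const char *resname;
    ExprKind    kind;
} JunkTle;

/*
 * Return 0 if every row-identity junk TLE is a plain Var; otherwise report
 * the first offender and return -1.
 */
static int
check_row_identity_tles(const JunkTle *tles, int ntles)
{
    for (int i = 0; i < ntles; i++)
    {
        assert(tles[i].resname != NULL);
        if (tles[i].kind != EXPR_VAR)
        {
            fprintf(stderr, "row-identity column \"%s\" is not a plain Var\n",
                    tles[i].resname);
            return -1;
        }
    }
    return 0;
}
```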

Based on an off-list chat with Robert, I started looking into whether
it would make sense to drop the middleman Append (or MergeAppend)
altogether, if only to avoid having to invent a representation for
the parent targetlist that is never actually computed.  However, it's
not hard to imagine that any new bookkeeping code to manage child
plans, even if perhaps cheaper in terms of cycles spent than
inheritance_planner(), would add complexity to the main planner.  It
would also be a shame to lose useful functionality that we get by
having an Append present, such as run-time pruning and partitionwise
joins.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
David Rowley
Date:
On Wed, 13 May 2020 at 19:02, Amit Langote <amitlangote09@gmail.com> wrote:
> > As for which ResultRelInfos to initialize, couldn't we just have the
> > planner generate an OidList of all the ones that we could need.
> > Basically, all the non-pruned partitions.
>
> Why would replacing list of RT indexes by OIDs be better?

TBH, I didn't refresh my memory of the code before saying that.
However, if we have a list of RT indexes of the range table entries we
must build ResultRelInfos for, then why is it a problem that plan-time
pruning is not allowing you to eliminate the excess ResultRelInfos,
like you mentioned in:

On Sat, 9 May 2020 at 01:33, Amit Langote <amitlangote09@gmail.com> wrote:
> prepare q as update foo set a = 250001 where a = $1;
> set plan_cache_mode to 'force_generic_plan';
> explain execute q(1);
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Update on foo  (cost=0.00..142.20 rows=40 width=14)
>    Update on foo_1
>    Update on foo_2 foo
>    Update on foo_3 foo
>    Update on foo_4 foo
>    ->  Append  (cost=0.00..142.20 rows=40 width=14)
>          Subplans Removed: 3
>          ->  Seq Scan on foo_1  (cost=0.00..35.50 rows=10 width=14)
>                Filter: (a = $1)
> (9 rows)

Shouldn't you just be setting the ModifyTablePath.resultRelations to
the non-pruned RT indexes?

> > Perhaps we could even be
> > pretty lazy about building those ResultRelInfos during execution too.
> > We'd need to grab the locks first, but, without staring at the code, I
> > doubt there's a reason we'd need to build them all upfront.  That
> > would help in cases where pruning didn't prune much, but due to
> > something else in the WHERE clause, the results only come from some
>
> Late ResultRelInfo initialization is worth considering, given that
> doing it for tuple-routing target relations works.  I don't know why
> we are still Initializing them all in InitPlan(), because the only
> justification given for doing so that I know of is that it prevents
> lock-upgrade.  I think we discussed somewhat recently that that is not
> really a hazard.

Looking more closely at ExecGetRangeTableRelation(), we'll already
have the lock by that time; there's an Assert to verify that too.
It'll have been acquired either during planning or during
AcquireExecutorLocks(). So it seems that delaying the building of
ResultRelInfos wouldn't require taking the lock at a different time.

David



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Thu, May 14, 2020 at 7:55 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 13 May 2020 at 19:02, Amit Langote <amitlangote09@gmail.com> wrote:
> > > As for which ResultRelInfos to initialize, couldn't we just have the
> > > planner generate an OidList of all the ones that we could need.
> > > Basically, all the non-pruned partitions.
> >
> > Why would replacing list of RT indexes by OIDs be better?
>
> TBH, I didn't refresh my memory of the code before saying that.
> However, if we have a list of RT index for which rangetable entries we
> must build ResultRelInfos for, then why is it a problem that plan-time
> pruning is not allowing you to eliminate the excess ResultRelInfos,
> like you mentioned in:
>
> On Sat, 9 May 2020 at 01:33, Amit Langote <amitlangote09@gmail.com> wrote:
> > prepare q as update foo set a = 250001 where a = $1;
> > set plan_cache_mode to 'force_generic_plan';
> > explain execute q(1);
> >                              QUERY PLAN
> > --------------------------------------------------------------------
> >  Update on foo  (cost=0.00..142.20 rows=40 width=14)
> >    Update on foo_1
> >    Update on foo_2 foo
> >    Update on foo_3 foo
> >    Update on foo_4 foo
> >    ->  Append  (cost=0.00..142.20 rows=40 width=14)
> >          Subplans Removed: 3
> >          ->  Seq Scan on foo_1  (cost=0.00..35.50 rows=10 width=14)
> >                Filter: (a = $1)
> > (9 rows)
>
> Shouldn't you just be setting the ModifyTablePath.resultRelations to
> the non-pruned RT indexes?

Oh, that example is showing run-time pruning for a generic plan.  If
the planner prunes partitions, of course, their result relation indexes
are not present in ModifyTablePath.resultRelations.

> > > Perhaps we could even be
> > > pretty lazy about building those ResultRelInfos during execution too.
> > > We'd need to grab the locks first, but, without staring at the code, I
> > > doubt there's a reason we'd need to build them all upfront.  That
> > > would help in cases where pruning didn't prune much, but due to
> > > something else in the WHERE clause, the results only come from some
> >
> > Late ResultRelInfo initialization is worth considering, given that
> > doing it for tuple-routing target relations works.  I don't know why
> > we are still Initializing them all in InitPlan(), because the only
> > justification given for doing so that I know of is that it prevents
> > lock-upgrade.  I think we discussed somewhat recently that that is not
> > really a hazard.
>
> Looking more closely at ExecGetRangeTableRelation(), we'll already
> have the lock by that time, there's an Assert to verify that too.
> It'll have been acquired either during planning or during
> AcquireExecutorLocks(). So it seems doing anything for delaying the
> building of ResultRelInfos wouldn't need to account for taking the
> lock at a different time.

Yep, I think it might be worthwhile to delay ResultRelInfo building
for UPDATE/DELETE too.  I would like to leave that for another patch
though.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Ashutosh Bapat
Date:
On Wed, May 13, 2020 at 9:21 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Maybe I am misunderstanding you, but the more the rows to update, the
> more overhead we will be paying with the new approach.

Yes, that's right. How much is that compared to the current planning
overhead? How many rows does it take for that overhead to become
comparable to the current planning overhead?

But let's not sweat that point much right now.

>
> So, we will need to do 2 things:
>
> 1. Implicitly apply an ORDER BY tableoid clause
> 2. Add result relation RTIs to ModifyTable.resultRelations in the
> order of their RTE's relid.
>
> Maybe we can do that as a separate patch.  Also, I am not sure if it
> will get in the way of someone wanting to have ORDER BY LIMIT for
> updates.

It won't. But maybe David's idea is better.

>
> > > > * Tuple re-routing during UPDATE. For now it's disabled so your design
> > > > should work. But we shouldn't design this feature in such a way that
> > > > it comes in the way to enable tuple re-routing in future :).
> > >
> > > Sorry, what is tuple re-routing and why does this new approach get in its way?
> >
> > An UPDATE causing a tuple to move to a different partition. It would
> > get in its way since the tuple will be located based on tableoid,
> > which will be the oid of the old partition. But I think this approach
> > has higher chance of being able to solve that problem eventually
> > rather than the current approach.
>
> Again, I don't think I understand.   We do currently (as of v11)
> re-route tuples when UPDATE causes them to move to a different
> partition, which, gladly, continues to work with my patch.

Ah! Ok. I missed that part then.

>
> So how it works is like this: for a given "new" tuple, ExecUpdate()
> checks if the tuple would violate the partition constraint of the
> result relation that was passed along with the tuple.  If it does, the
> new tuple will be moved, by calling ExecDelete() to delete it from the
> current relation, followed by ExecInsert() to find the new home for
> the tuple.  The only thing that changes with the new approach is how
> ExecModifyTable() chooses a result relation to pass to ExecUpdate()
> for a given "new" tuple it has fetched from the plan, which is quite
> independent from the tuple re-routing mechanism proper.
>

Thanks for the explanation.

-- 
Best Wishes,
Ashutosh Bapat



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
So, I think I have a patch that seems to work, but not all the way,
more on which below.

Here is the commit message in the attached patch.

===
Subject: [PATCH] Overhaul UPDATE's targetlist processing

Instead of emitting the full tuple matching the target table's tuple
descriptor, make the plan emit only the attributes that are assigned
values in the SET clause, plus row-identity junk attributes as before.
This allows us to avoid making a separate plan for each target
relation in the inheritance case, because the only reason we do so
currently is to account for the fact that each target relation may
have a set of attributes that is different from the others.  Having
only one plan suffices, because the set of assigned attributes must be
the same in all the result relations.

While the plan will now produce only the assigned attributes and
row-identity junk attributes, other columns' values are filled by
refetching the old tuple. To that end, there will be a targetlist for
each target relation to compute the full tuple, that is, by combining
the values from the plan tuple and the old tuple, but they are passed
separately in the ModifyTable node.

Implementation notes:

* In the inheritance case, as the same plan produces tuples to be
updated from multiple result relations, the tuples now need to also
identify which table they come from, so an additional junk attribute
"tableoid" is present in that case.

* Considering that the inheritance set may contain foreign tables that
require a different (set of) row-identity junk attribute(s), the plan
needs to emit multiple distinct junk attributes.  When transposed to a
child scan node, this targetlist emits a non-NULL value for the junk
attribute that's valid for the child relation and NULL for others.

* Executor and FDW execution APIs can no longer assume any specific
order in which the result relations will be processed. For each
tuple to be updated/deleted, the result relation is selected by looking
it up in a hash table using the "tableoid" value as the key.

* Since the plan does not emit values for all the attributes, FDW APIs
may not assume that the individual column values in the TupleTableSlot
containing the plan tuple are accessible by their attribute numbers.

TODO:

* Reconsider having only one plan!
* Update FDW handler docs to reflect the API changes
===
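To illustrate the commit message's idea of combining the plan tuple with the refetched old tuple, here is a toy sketch in which a tuple is just an array of ints and each result relation carries a map from the plan's SET-clause outputs to its own attribute numbers (the same column may be attribute 2 in one child and attribute 3 in another). ToyTuple, build_new_tuple() and the map are assumptions for illustration; per the commit message, the patch does this with a per-relation targetlist.

```c
#include <assert.h>

#define MAX_ATTS 8

/* Toy "tuple": an array of int column values (hypothetical simplification). */
typedef struct ToyTuple
{
    int natts;
    int values[MAX_ATTS];
} ToyTuple;

/*
 * Build the new tuple for one result relation.  The plan emits only the
 * SET-clause values (plan_vals[0..nupdated-1]); upd_attnos[k] is the
 * 1-based attribute number that plan_vals[k] is assigned to in THIS
 * relation.  All other columns are copied from the refetched old tuple.
 */
static ToyTuple
build_new_tuple(const ToyTuple *old, const int *plan_vals,
                const int *upd_attnos, int nupdated)
{
    ToyTuple newtup = *old;             /* start from the old row */

    for (int k = 0; k < nupdated; k++)
    {
        int attno = upd_attnos[k];

        assert(attno >= 1 && attno <= old->natts);
        newtup.values[attno - 1] = plan_vals[k];
    }
    return newtup;
}
```

The plan's output is identical for every child; only the small per-relation map differs, which is what lets a single plan serve all result relations.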

Regarding the first TODO, it is to address the limitation that FDWs
will no longer be able to push the *whole* child UPDATE/DELETE query
down to the remote server, including any joins, which is currently
allowed via the PlanDirectModify API.  The API seems to have been
designed with the assumption that the child scan/join node is the
top-level plan, but that's no longer the case.  If we consider
bypassing the Append and allowing ModifyTable to access the child
scan/join nodes directly, maybe we can allow that.  I haven't updated
the expected output of the postgres_fdw regression tests for now,
pending this.

A couple of things in the patch that I feel slightly uneasy about:

* Result relations are now appendrel children in the planner.
Normally, any wholerow Vars in the child relation's reltarget->exprs
get a ConvertRowtypeExpr added on top to convert them back to the
parent's reltype, because that's what the client expects in the SELECT
case.  In the result relation case, the executor expects to see the
child wholerow Vars themselves, not their parent versions.

* FDW's ExecForeignUpdate() API expects that the NEW tuple passed to it
match the target foreign table's reltype, so that it can access the
target attributes in the tuple by attribute numbers.  Considering that
the plan no longer builds the full tuple itself, I made the executor
satisfy that expectation by filling the missing attributes' values
using the target table's wholerow attribute.  That is, we now *always*
fetch the wholerow attributes for UPDATE, not just when there are
row-level triggers that need it.  I think that's unfortunate.  Maybe
the correct way is to ask the FDWs to translate (setrefs.c style) the
target attribute numbers appropriately to access the plan's output
tuple.

I will add the patch to the next CF.  I haven't yet fully checked the
performance considerations of the new approach, but will do so in the
coming days.


--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Tue, Jun 2, 2020 at 1:15 PM Amit Langote <amitlangote09@gmail.com> wrote:
> So, I think I have a patch that seems to work, but not all the way,
> more on which below.
>
> Here is the commit message in the attached patch.
>
> ===
> Subject: [PATCH] Overhaul UPDATE's targetlist processing
>
> Instead of emitting the full tuple matching the target table's tuple
> descriptor, make the plan emit only the attributes that are assigned
> values in the SET clause, plus row-identity junk attributes as before.
> This allows us to avoid making a separate plan for each target
> relation in the inheritance case, because the only reason we do so
> currently is to account for the fact that each target relation may
> have a set of attributes that is different from the others.  Having
> only one plan suffices, because the set of assigned attributes must be
> the same in all the result relations.
>
> While the plan will now produce only the assigned attributes and
> row-identity junk attributes, other columns' values are filled by
> refetching the old tuple. To that end, there will be a targetlist for
> each target relation to compute the full tuple, that is, by combining
> the values from the plan tuple and the old tuple, but they are passed
> separately in the ModifyTable node.
>
> Implementation notes:
>
> * In the inheritance case, as the same plan produces tuples to be
> updated from multiple result relations, the tuples now need to also
> identify which table they come from, so an additional junk attribute
> "tableoid" is present in that case.
>
> * Considering that the inheritance set may contain foreign tables that
> require a different (set of) row-identity junk attribute(s), the plan
> needs to emit multiple distinct junk attributes.  When transposed to a
> child scan node, this targetlist emits a non-NULL value for the junk
> attribute that's valid for the child relation and NULL for others.
>
> * Executor and FDW execution APIs can no longer assume any specific
> order in which the result relations will be processed. For each
> tuple to be updated/deleted, the result relation is selected by looking
> it up in a hash table using the "tableoid" value as the key.
>
> * Since the plan does not emit values for all the attributes, FDW APIs
> may not assume that the individual column values in the TupleTableSlot
> containing the plan tuple are accessible by their attribute numbers.
>
> TODO:
>
> * Reconsider having only one plan!
> * Update FDW handler docs to reflect the API changes
> ===

I divided that into two patches:

1. Make the plan producing tuples to be updated emit only the columns
that are actually updated.  The postgres_fdw test fails unless you also
apply the patch I posted at [1], because there is an unrelated bug in
the UPDATE tuple routing code that some changes of this patch expose.

2. Due to 1, inheritance_planner() is no longer needed, that is,
inherited update/delete can be handled by pulling the rows to
update/delete from only one plan, not one per child result relation.
This one makes that so.

There are some unsolved problems having to do with foreign tables in
both 1 and 2:

In 1, FDW update APIs still assume that the plan produces a "full"
tuple for update.  That needs to be fixed so that FDWs can cope with
getting only the updated columns in the plan's output targetlist.

In 2, I still haven't figured out a way to call PlanDirectModify() on
child foreign tables.  Lacking that, inherited updates on foreign
tables are now slower, because they are not pushed down.  I'd like to
figure something out to fix that situation.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA%40mail.gmail.com

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
Hello,

I have been working away at this and have updated the patches for many
cosmetic and some functional improvements.

On Fri, Jun 12, 2020 at 3:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I divided that into two patches:
>
> 1. Make the plan producing tuples to be updated emit only the columns
> that are actually updated.  postgres_fdw test fails unless you also
> apply the patch I posted at [1], because there is an unrelated bug in
> UPDATE tuple routing code that manifests due to some changes of this
> patch.
>
> 2. Due to 1, inheritance_planner() is no longer needed, that is,
> inherited update/delete can be handled by pulling the rows to
> update/delete from only one plan, not one per child result relation.
> This one makes that so.
>
> There are some unsolved problems having to do with foreign tables in
> both 1 and 2:
>
> In 1, FDW update APIs still assume that the plan produces "full" tuple
> for update.  That needs to be fixed so that FDWs deal with getting
> only the updated columns in the plan's output targetlist.
>
> In 2, still haven't figured out a way to call PlanDirectModify() on
> child foreign tables.  Lacking that, inherited updates on foreign
> tables are now slower, because they are not pushed down.  I'd like to
> figure something out to fix that situation.

In the updated patch, I have implemented a partial solution to this,
but I think it should be enough in most practically useful situations.
With the updated patch, PlanDirectModify is now called for child
result relations, but the FDWs will need to be revised to do useful
work in that call (as the patch does for postgres_fdw), because a
potentially pushable ForeignScan involving a given child result
relation will now be at the bottom of the source plan tree, whereas
before it would be the top-level plan.  Another disadvantage of this
new situation is that inherited update/delete involving joins that
were previously pushable cannot be pushed anymore.  If update/delete
would have been able to use partition-wise join, a child join
involving a given child result relation could in principle be pushed,
but some semi-related issues prevent the use of partition-wise joins
for update/delete, especially when there are foreign table partitions.

Another major change is that, instead of a "tableoid" junk attribute to
identify the target result relation for a given tuple to be
updated/deleted, the patch now makes the tuples to be updated/deleted
contain a junk attribute that gives the index of the result relation
in the query's list of result relations, which can be used to look up
the target result relation directly.  With "tableoid", we would need
to build a hash table to map the result relation OIDs to result
relation indexes, a step that could become a bottleneck with large
partition counts (I am talking about executing generic plans here and
have mentioned this problem on the thread to make generic plan
execution for update/delete faster [1]).

Here are the commit messages of the attached patches:

[PATCH v3 1/3] Overhaul how updates compute a new tuple

Currently, the planner rewrites the top-level targetlist of an update
statement's parsetree so that it contains entries for all attributes
of the target relation, including for those columns that have not
been changed.  This arrangement means that the executor can take a
tuple that the plan produces, remove any junk attributes in it and
pass it down to the table AM or FDW update API as the new tuple.
It also means that in an inherited update, where there are multiple
target relations, the planner must produce that many plans, because
the targetlists for different target relations may not all look the
same considering that child relations may have different sets of
columns with varying attribute numbers.

This commit revises things so that the planner no longer expands
the parsetree targetlist to include unchanged columns, so the plan
only produces values of the changed columns.  To make the new tuple
to pass to the table AM and FDW update APIs, the executor now
evaluates another targetlist matching the target table's TupleDesc,
which refers to the plan's output tuple to get values of the changed
columns and to the refetched old tuple for values of unchanged columns.

To get values for unchanged columns to use when forming the new tuple
to pass to ExecForeignUpdate(), we now require foreign scans to
always include the wholerow Var corresponding to the old tuple being
updated, because the unchanged columns are not present in the
plan's targetlist.

As a note to FDW authors, any FDW update planning APIs that look at
the plan's targetlist for checking if it is pushable to remote side
(e.g. PlanDirectModify) should now instead look at "update targetlist"
that is set by the planner in PlannerInfo.update_tlist, because resnos
in the plan's targetlist are no longer indexable by the target columns'
attribute numbers.

Note that even though the main goal of doing this is to avoid having
to make multiple plans in the inherited update case, this commit does
not touch that subject.  A subsequent commit will change things that
are necessary to make inherited updates work with a single plan.

[PATCH v3 2/3] Include result relation index if any in ForeignScan

FDWs that can perform an UPDATE/DELETE remotely using the "direct
modify" set of APIs sometimes need to access properties of the result
relation, for which they can currently look at
EState.es_result_relation_info.  However, that means the executor must
ensure that es_result_relation_info points to the correct result
relation at all times, especially during inherited updates.  This
requirement gets in the way of a number of projects related to changing
how ModifyTable operates.  For example, an upcoming patch will change
things such that there will be one source plan for all result
relations whereas currently there is one per result relation, an
arrangement which makes it convenient to switch the result relation
when the source plan changes.

This commit installs a new field 'resultRelIndex' in ForeignScan node
which must be set by an FDW if the node will be used to carry out an
UPDATE/DELETE operation on a given foreign table, which is the case
if the FDW manages to push that operation to the remote side.  This
commit also modifies postgres_fdw to implement that.

[PATCH v3 3/3] Revise how inherited update/delete are handled

Now that we have the ability to maintain and evaluate the targetlist
needed to generate an update's new tuples independently of the plan
which fetches the tuples to be updated, there is no need to make
separate plans for child result relations as inheritance_planner()
currently does.  We generated separate plans before such capability
was present, because that was the only way to generate new tuples of
child relations where each may have its own unique set of columns
(albeit all sharing the set of columns present in the root parent).

With this commit, an inherited update/delete query will now be planned
just as a non-inherited one, generating a single plan that goes under
ModifyTable.  The plan for the inherited case is essentially the one
that we get for a select query, although the targetlist additionally
contains junk attributes needed by update/delete.

By going from one plan per result relation to only one shared across
all result relations, the executor now needs a new way to identify the
result relation to direct a given tuple's update/delete to, whereas
before, it could tell that from the plan it is executing.  To that
end, the planner now adds a new junk attribute to the query's
targetlist that for each tuple gives the index of the result relation
in the query's list of result relations.  That is in addition to the
junk attribute that the planner already adds to identify the tuple's
position in a given relation (such as "ctid").

Given the way query planning with inherited tables works, where child
relations are not part of the query's jointree and only the root
parent is, there are some challenges that arise in the update/delete
case:

* The junk attributes needed by child result relations need to be
represented as root parent Vars, which is a non-issue for a given
child if what the child needs and what is added for the root parent
are one and the same column.  But considering that that may not
always be the case, more parent Vars might get added to the top-level
targetlist as children are added to the query as result relations.
In some cases, a child relation may use a column that is not present
in the parent (allowed by traditional inheritance) or a non-column
expression, which must be represented using what this patch calls
"fake" parent vars.  These fake parent vars are really only
placeholders for the underlying child relation's column or expression
and don't reach the executor's expression evaluation machinery.

* FDWs that are able to push update/delete fully to the remote side
using the DirectModify set of APIs now have to jump through hoops to
identify the subplan and the UPDATE targetlist to push for child
result relations, because the subplans for individual result
relations are no longer top-level plans.  In fact, if the result
relation is joined to another relation, update/delete cannot be
pushed down at all anymore, whereas before, since the child relations
would be present in the main jointree, it could be pushed down when
the relation being joined to was present on the same server as the
child result relation.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA%40mail.gmail.com

Re: making update/delete of inheritance trees scale better

From
Michael Paquier
Date:
On Fri, Sep 11, 2020 at 07:20:56PM +0900, Amit Langote wrote:
> I have been working away at this and have updated the patches for many
> cosmetic and some functional improvements.

Please note that this patch set fails to apply.  Could you provide a
rebase please?
--
Michael

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
Hi,

On Thu, Oct 1, 2020 at 1:32 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 11, 2020 at 07:20:56PM +0900, Amit Langote wrote:
> > I have been working away at this and have updated the patches for many
> > cosmetic and some functional improvements.
>
> Please note that this patch set fails to apply.  Could you provide a
> rebase please?

Yeah, I'm working on posting an updated patch.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Fri, Sep 11, 2020 at 7:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Here are the commit messages of the attached patches:
>
> [PATCH v3 1/3] Overhaul how updates compute a new tuple
>
> Currently, the planner rewrites the top-level targetlist of an update
> statement's parsetree so that it contains entries for all attributes
> of the target relation, including for those columns that have not
> been changed.  This arrangement means that the executor can take a
> tuple that the plan produces, remove any junk attributes in it and
> pass it down to the table AM or FDW update API as the new tuple.
> It also means that in an inherited update, where there are multiple
> target relations, the planner must produce that many plans, because
> the targetlists for different target relations may not all look the
> same considering that child relations may have different sets of
> columns with varying attribute numbers.
>
> This commit revises things so that the planner no longer expands
> the parsetree targetlist to include unchanged columns so that the
> plan only produces values of the changed columns.  To make the new
> tuple to pass to table AM and FDW update API, executor now evaluates
> another targetlist matching the target table's TupleDesc which refers
> to the plan's output tuple to gets values of the changed columns and
> to the old tuple that is refetched for values of unchanged columns.
>
> To get values for unchanged columns to use when forming the new tuple
> to pass to ExecForeignUpdate(), we now require foreign scans to
> always include the wholerow Var corresponding to the old tuple being
> updated, because the unchanged columns are not present in the
> plan's targetlist.
>
> As a note to FDW authors, any FDW update planning APIs that look at
> the plan's targetlist for checking if it is pushable to remote side
> (e.g. PlanDirectModify) should now instead look at "update targetlist"
> that is set by the planner in PlannerInfo.update_tlist, because resnos
> in the plan's targetlist is no longer indexable by target column's
> attribute numbers.
>
> Note that even though the main goal of doing this is to avoid having
> to make multiple plans in the inherited update case, this commit does
> not touch that subject.  A subsequent commit will change things that
> are necessary to make inherited updates work with a single plan.

I tried to assess the performance impact of this rejiggering of how
updates are performed.  As to why one may think there may be a
negative impact, consider that ExecModifyTable() now has to perform an
extra fetch of the tuple being updated for filling in the unchanged
values of the update's NEW tuple, because the plan itself will only
produce the values of changed columns.

* Setup: a 10-column target table with a million rows

create table test_update_10 (
        a       int,
        b       int             default NULL,
        c       int             default 0,
        d       text    default 'ddd',
        e       text    default 'eee',
        f       text    default 'fff',
        g       text    default 'ggg',
        h       text    default 'hhh',
        i       text    default 'iii',
        j       text    default 'jjj'
);
insert into test_update_10 (a) select generate_series(1, 1000000);

* pgbench test script (test_update_10.sql):

\set a random(1, 1000000)
update test_update_10 set b = :a where a = :a;

* TPS of `pgbench -n -T 120 -f test_update_10.sql`

HEAD:

tps = 10964.391120 (excluding connections establishing)
tps = 12142.456638 (excluding connections establishing)
tps = 11746.345270 (excluding connections establishing)
tps = 11959.602001 (excluding connections establishing)
tps = 12267.249378 (excluding connections establishing)

median: 11959.60

Patched:

tps = 11565.916170 (excluding connections establishing)
tps = 11952.491663 (excluding connections establishing)
tps = 11959.789308 (excluding connections establishing)
tps = 11699.611281 (excluding connections establishing)
tps = 11799.220930 (excluding connections establishing)

median: 11799.22

There is a slight impact, but the difference seems to be within the margin of error.

On the more optimistic side, I imagined that the trimming down of the
plan's targetlist to include only changed columns would boost
performance, especially with tables containing more columns, which is
not uncommon.  With 20 columns (additional columns are all filler ones
as shown in the 10-column example), the same benchmark gives the
following numbers:

HEAD:

tps = 11401.691219 (excluding connections establishing)
tps = 11620.855088 (excluding connections establishing)
tps = 11285.469430 (excluding connections establishing)
tps = 10991.890904 (excluding connections establishing)
tps = 10847.433093 (excluding connections establishing)

median: 11285.46

Patched:

tps = 10958.443325 (excluding connections establishing)
tps = 11613.783817 (excluding connections establishing)
tps = 10940.129336 (excluding connections establishing)
tps = 10717.405272 (excluding connections establishing)
tps = 11691.330537 (excluding connections establishing)

median: 10958.44

Hmm, not so much.

With 40 columns:

HEAD:

tps = 9778.362149 (excluding connections establishing)
tps = 10004.792176 (excluding connections establishing)
tps = 9473.849373 (excluding connections establishing)
tps = 9776.931393 (excluding connections establishing)
tps = 9737.891870 (excluding connections establishing)

median: 9776.93

Patched:

tps = 10709.949043 (excluding connections establishing)
tps = 10754.160718 (excluding connections establishing)
tps = 10175.841480 (excluding connections establishing)
tps = 9973.729774 (excluding connections establishing)
tps = 10467.109679 (excluding connections establishing)

median: 10467.10

There you go.

Perhaps the plan's bigger targetlist with HEAD does not cause
significant overhead in a *simple* update like the one above, because
most of the work during execution is fetching the tuple to update and
actually updating it.  So, I also checked with a slightly more
complicated query containing a join:

\set a random(1, 1000000)
update test_update_10 t set b = foo.b from foo where t.a = foo.a and foo.b = :a;

where `foo` is defined as:

create table foo (a int, b int);
insert into foo select generate_series(1, 1000000);
create index on foo (b);

Looking at the EXPLAIN output of the query, one can see that the
target list is smaller after patching which can save some work:

HEAD:

explain (costs off, verbose) update test_update_10 t set b = foo.b
from foo where t.a = foo.a and foo.b = 1;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Update on public.test_update_10 t
   ->  Nested Loop
         Output: t.a, foo.b, t.c, t.d, t.e, t.f, t.g, t.h, t.i, t.j, t.ctid, foo.ctid
         ->  Index Scan using foo_b_idx on public.foo
               Output: foo.b, foo.ctid, foo.a
               Index Cond: (foo.b = 1)
         ->  Index Scan using test_update_10_a_idx on public.test_update_10 t
               Output: t.a, t.c, t.d, t.e, t.f, t.g, t.h, t.i, t.j, t.ctid
               Index Cond: (t.a = foo.a)
(9 rows)

Patched:

explain (costs off, verbose) update test_update_10 t set b = foo.b
from foo where t.a = foo.a and foo.b = 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on public.test_update_10 t
   ->  Nested Loop
         Output: foo.b, t.ctid, foo.ctid
         ->  Index Scan using foo_b_idx on public.foo
               Output: foo.b, foo.ctid, foo.a
               Index Cond: (foo.b = 1)
         ->  Index Scan using test_update_10_a_idx on public.test_update_10 t
               Output: t.ctid, t.a
               Index Cond: (t.a = foo.a)
(9 rows)

And here are the TPS numbers for that query with 10-, 20-, and 40-column
tables.  Note that the more columns the target table has, the bigger
the target list that HEAD must compute.

10 columns:

HEAD:

tps = 7594.881268 (excluding connections establishing)
tps = 7660.451217 (excluding connections establishing)
tps = 7598.899951 (excluding connections establishing)
tps = 7413.397046 (excluding connections establishing)
tps = 7484.978635 (excluding connections establishing)

median: 7594.88

Patched:

tps = 7402.409104 (excluding connections establishing)
tps = 7532.776214 (excluding connections establishing)
tps = 7549.397016 (excluding connections establishing)
tps = 7512.321466 (excluding connections establishing)
tps = 7448.255418 (excluding connections establishing)

median: 7512.32

20 columns:

HEAD:

tps = 6842.674366 (excluding connections establishing)
tps = 7151.724481 (excluding connections establishing)
tps = 7093.727976 (excluding connections establishing)
tps = 7072.273547 (excluding connections establishing)
tps = 7040.350004 (excluding connections establishing)

median: 7072.27

Patched:

tps = 7362.941398 (excluding connections establishing)
tps = 7106.826433 (excluding connections establishing)
tps = 7353.507317 (excluding connections establishing)
tps = 7361.944770 (excluding connections establishing)
tps = 7072.027684 (excluding connections establishing)

median: 7353.50

40 columns:

HEAD:

tps = 6396.845818 (excluding connections establishing)
tps = 6383.105593 (excluding connections establishing)
tps = 6370.143763 (excluding connections establishing)
tps = 6370.455213 (excluding connections establishing)
tps = 6380.993666 (excluding connections establishing)

median: 6380.99

Patched:

tps = 7091.581813 (excluding connections establishing)
tps = 7036.805326 (excluding connections establishing)
tps = 7019.120007 (excluding connections establishing)
tps = 7025.704379 (excluding connections establishing)
tps = 6848.846667 (excluding connections establishing)

median: 7025.70

It seems clear that the saving on the target list computation overhead
that we get from the patch is hard to ignore in this case.

I've attached updated patches, because as Michael pointed out, the
previous version no longer applies.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Sun, Oct 4, 2020 at 11:44 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Sep 11, 2020 at 7:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Here are the commit messages of the attached patches:
> >
> > [PATCH v3 1/3] Overhaul how updates compute a new tuple
>
> I tried to assess the performance impact of this rejiggering of how
> updates are performed.  As to why one may think there may be a
> negative impact, consider that ExecModifyTable() now has to perform an
> extra fetch of the tuple being updated for filling in the unchanged
> values of the update's NEW tuple, because the plan itself will only
> produce the values of changed columns.
>
...
> It seems clear that the saving on the target list computation overhead
> that we get from the patch is hard to ignore in this case.
>
> I've attached updated patches, because as Michael pointed out, the
> previous version no longer applies.

Rebased over the recent executor result relation related commits.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Re: making update/delete of inheritance trees scale better

From
Heikki Linnakangas
Date:
On 29/10/2020 15:03, Amit Langote wrote:
> On Sun, Oct 4, 2020 at 11:44 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Fri, Sep 11, 2020 at 7:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>> Here are the commit messages of the attached patches:
>>>
>>> [PATCH v3 1/3] Overhaul how updates compute a new tuple
>>
>> I tried to assess the performance impact of this rejiggering of how
>> updates are performed.  As to why one may think there may be a
>> negative impact, consider that ExecModifyTable() now has to perform an
>> extra fetch of the tuple being updated for filling in the unchanged
>> values of the update's NEW tuple, because the plan itself will only
>> produce the values of changed columns.
>>
> ...
>> It seems clear that the saving on the target list computation overhead
>> that we get from the patch is hard to ignore in this case.
>>
>> I've attached updated patches, because as Michael pointed out, the
>> previous version no longer applies.
> 
> Rebased over the recent executor result relation related commits.

I also did some quick performance testing with a simple update designed 
as a worst-case scenario:

create unlogged table tab (a int4, b int4);
insert into tab select g, g from generate_series(1, 10000000) g;

\timing on
vacuum tab; update tab set b = b, a = a;

Without the patch, the update takes about 7.3 s on my laptop, and about 
8.3 s with the patch.

In this case, the patch fetches the old tuple, but it wouldn't really 
need to, because all the columns are updated. Could we optimize that 
special case?

In principle, it would sometimes also make sense to add the old columns 
to the targetlist like we used to, to avoid the fetch. But estimating 
when that's cheaper would be complicated.

Despite that, I like this new approach a lot. It's certainly much nicer 
than inheritance_planner().

- Heikki



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> I also did some quick performance testing with a simple update designed 
> as a worst-case scenario:

> vacuum tab; update tab set b = b, a = a;

> In this case, the patch fetches the old tuple, but it wouldn't really 
> need to, because all the columns are updated. Could we optimize that 
> special case?

I'm not following.  We need to read the old values of a and b for
the update source expressions, no?

(One could imagine realizing that this is a no-op update, but that
seems quite distinct from the problem at hand, and probably not
worth the cycles.)

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Heikki Linnakangas
Date:
On 30/10/2020 23:10, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> I also did some quick performance testing with a simple update designed
>> as a worst-case scenario:
> 
>> vacuum tab; update tab set b = b, a = a;
> 
>> In this case, the patch fetches the old tuple, but it wouldn't really
>> need to, because all the columns are updated. Could we optimize that
>> special case?
> 
> I'm not following.  We need to read the old values of a and b for
> the update source expressions, no?
> 
> (One could imagine realizing that this is a no-op update, but that
> seems quite distinct from the problem at hand, and probably not
> worth the cycles.)

Ah, no, that's not what I meant. You do need to read the old values to 
calculate the new ones, but if you update all the columns or if you 
happened to read all the old values as part of the scan, then you don't 
need to fetch the old tuple in the ModifyTable node.

Let's try a better example. Currently with the patch:

postgres=# explain verbose update tab set a = 1;
                                    QUERY PLAN
---------------------------------------------------------------------------------
  Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
    ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=10)
          Output: 1, ctid

The Modify Table node will fetch the old tuple to get the value for 'b', 
which is unchanged. But if you do:

postgres=# explain verbose update tab set a = 1, b = 2;
                                    QUERY PLAN
---------------------------------------------------------------------------------
  Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
    ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
          Output: 1, 2, ctid

The Modify Table will still fetch the old tuple, but in this case, it's 
not really necessary, because both columns are overwritten.

- Heikki



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> .... But if you do:

> postgres=# explain verbose update tab set a = 1, b = 2;
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------
>   Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
>     ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
>           Output: 1, 2, ctid

> The Modify Table will still fetch the old tuple, but in this case, it's
> not really necessary, because both columns are overwritten.

Ah, that I believe.  Not sure it's a common enough case to spend cycles
looking for, though.

In any case, we still have to access the old tuple, don't we?
To lock it and update its t_ctid, whether or not we have use for
its user columns.  Maybe there's some gain from not having to
deconstruct the tuple, but it doesn't seem like it'd be much.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Heikki Linnakangas
Date:
On 31/10/2020 00:12, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> .... But if you do:
> 
>> postgres=# explain verbose update tab set a = 1, b = 2;
>>                                      QUERY PLAN
>> ---------------------------------------------------------------------------------
>>    Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
>>      ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
>>            Output: 1, 2, ctid
> 
>> The Modify Table will still fetch the old tuple, but in this case, it's
>> not really necessary, because both columns are overwritten.
> 
> Ah, that I believe.  Not sure it's a common enough case to spend cycles
> looking for, though.
> 
> In any case, we still have to access the old tuple, don't we?
> To lock it and update its t_ctid, whether or not we have use for
> its user columns.  Maybe there's some gain from not having to
> deconstruct the tuple, but it doesn't seem like it'd be much.

Yeah, you need to access the old tuple to update its t_ctid, but 
accessing it twice is still more expensive than accessing it once. Maybe 
you could optimize it somewhat by keeping the buffer pinned or 
something. Or push the responsibility down to the table AM, passing the 
AM only the modified columns, and let the AM figure out how to deal with 
the columns that were not modified, hoping that it can do something smart.

It's indeed not a big deal in usual cases. The test case I constructed 
was deliberately bad, and the slowdown was only about 10%. I'm OK with 
that, but if there's an easy way to avoid it, we should. (Seems like 
there isn't.)

- Heikki



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Sat, Oct 31, 2020 at 7:26 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 31/10/2020 00:12, Tom Lane wrote:
> > Heikki Linnakangas <hlinnaka@iki.fi> writes:
> >> .... But if you do:
> >
> >> postgres=# explain verbose update tab set a = 1, b = 2;
> >>                                      QUERY PLAN
> >> ---------------------------------------------------------------------------------
> >>    Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
> >>      ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
> >>            Output: 1, 2, ctid
> >
> >> The Modify Table will still fetch the old tuple, but in this case, it's
> >> not really necessary, because both columns are overwritten.
> >
> > Ah, that I believe.  Not sure it's a common enough case to spend cycles
> > looking for, though.
> >
> > In any case, we still have to access the old tuple, don't we?
> > To lock it and update its t_ctid, whether or not we have use for
> > its user columns.  Maybe there's some gain from not having to
> > deconstruct the tuple, but it doesn't seem like it'd be much.

With the patch, the old tuple fetched by the ModifyTable node will not
be deconstructed in this case, because all the values needed to form
the new tuple will be obtained from the plan's output tuple, so there
is no need to read the user columns from the old tuple.  Given that,
it indeed sounds a bit wasteful to have read the tuple as Heikki
points out, but again, that's in a rare case.

> Yeah, you need to access the old tuple to update its t_ctid, but
> accessing it twice is still more expensive than accessing it once. Maybe
> you could optimize it somewhat by keeping the buffer pinned or
> something.

The buffer containing the old tuple already gets pinned twice: first
when ExecModifyTable() fetches the tuple to form the new tuple, and
again when, in this example, heap_update() fetches it to update the
old tuple's contents.

> Or push the responsibility down to the table AM, passing the
> AM only the modified columns, and let the AM figure out how to deal with
> the columns that were not modified, hoping that it can do something smart.

That sounds interesting, but maybe a sizable project on its own?

Thanks a lot for taking a look at this, BTW.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Heikki Linnakangas
Date:
On 29/10/2020 15:03, Amit Langote wrote:
> Rebased over the recent executor result relation related commits.

ModifyTablePath didn't get the memo that a ModifyTable can only have one 
subpath after these patches. Attached patch, on top of your v5 patches, 
cleans that up.

- Heikki

Attachment

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, Nov 11, 2020 at 9:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 29/10/2020 15:03, Amit Langote wrote:
> > Rebased over the recent executor result relation related commits.
>
> ModifyTablePath didn't get the memo that a ModifyTable can only have one
> subpath after these patches. Attached patch, on top of your v5 patches,
> cleans that up.

Ah, thought I'd taken care of that, thanks.  Attached v6.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Fri, Nov 13, 2020 at 6:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Nov 11, 2020 at 9:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > On 29/10/2020 15:03, Amit Langote wrote:
> > > Rebased over the recent executor result relation related commits.
> >
> > ModifyTablePath didn't get the memo that a ModifyTable can only have one
> > subpath after these patches. Attached patch, on top of your v5 patches,
> > cleans that up.
>
> Ah, thought I'd taken care of that, thanks.  Attached v6.

This got slightly broken due to the recent batch insert related
changes, so here is the rebased version.  I also made a few cosmetic
changes.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Fri, Oct 30, 2020 at 6:26 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Yeah, you need to access the old tuple to update its t_ctid, but
> accessing it twice is still more expensive than accessing it once. Maybe
> you could optimize it somewhat by keeping the buffer pinned or
> something. Or push the responsibility down to the table AM, passing the
> AM only the modified columns, and let the AM figure out how to deal with
> the columns that were not modified, hoping that it can do something smart.

Just as a point of possible interest, back when I was working on
zheap, I sort of wanted to take this in the opposite direction. In
effect, a zheap tuple has system columns that don't exist for a heap
tuple, and you can't do an update or delete without knowing what the
values for those columns are, so zheap had to just refetch the tuple,
but that sucked in comparisons with the existing heap, which didn't
have to do the refetch. At the time, I thought maybe the right idea
would be to extend things so that a table AM could specify an
arbitrary set of system columns that needed to be bubbled up to the
point where the update or delete happens, but that seemed really
complicated to implement and I never tried. Here it seems like we're
thinking of going the other way, and just always doing the refetch.
That is of course fine for zheap comparative benchmarks: instead of
making zheap faster, we just make the heap slower!

Well, sort of. I didn't think about the benefits of the refetch
approach when the tuples are wide. That does cast a somewhat different
light on things. I suppose we could have both methods and choose the
one that seems likely to be faster in particular cases, but that seems
like way too much machinery. Maybe there's some way to further
optimize accessing the same tuple multiple times in rapid succession
to claw back some of the lost performance in the slow cases, but I
don't have a specific idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Tue, Jan 26, 2021 at 8:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Nov 13, 2020 at 6:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Nov 11, 2020 at 9:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > On 29/10/2020 15:03, Amit Langote wrote:
> > > > Rebased over the recent executor result relation related commits.
> > >
> > > ModifyTablePath didn't get the memo that a ModifyTable can only have one
> > > subpath after these patches. Attached patch, on top of your v5 patches,
> > > cleans that up.
> >
> > Ah, thought I'd taken care of that, thanks.  Attached v6.
>
> This got slightly broken due to the recent batch insert related
> changes, so here is the rebased version.  I also made a few cosmetic
> changes.

Broken again, so rebased.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, Jan 27, 2021 at 4:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 30, 2020 at 6:26 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Yeah, you need to access the old tuple to update its t_ctid, but
> > accessing it twice is still more expensive than accessing it once. Maybe
> > you could optimize it somewhat by keeping the buffer pinned or
> > something. Or push the responsibility down to the table AM, passing the
> > AM only the modified columns, and let the AM figure out how to deal with
> > the columns that were not modified, hoping that it can do something smart.
>
> Just as a point of possible interest, back when I was working on
> zheap, I sort of wanted to take this in the opposite direction. In
> effect, a zheap tuple has system columns that don't exist for a heap
> tuple, and you can't do an update or delete without knowing what the
> values for those columns are, so zheap had to just refetch the tuple,
> but that sucked in comparisons with the existing heap, which didn't
> have to do the refetch.

So would zheap refetch a tuple using the "ctid" column in the plan's
output tuple and then use some other columns from the fetched tuple to
actually do the update?

> At the time, I thought maybe the right idea
> would be to extend things so that a table AM could specify an
> arbitrary set of system columns that needed to be bubbled up to the
> point where the update or delete happens, but that seemed really
> complicated to implement and I never tried.

Currently, FDWs can specify tuple-identifying system columns, which
are added to the query's targetlist when rewriteTargetListUD() calls
the AddForeignUpdateTargets() API.

In rewriteTargetListUD(), one can see that the planner assumes that
all local tables, irrespective of their AM, use a "ctid" column to
identify their tuples:

    if (target_relation->rd_rel->relkind == RELKIND_RELATION ||
        target_relation->rd_rel->relkind == RELKIND_MATVIEW ||
        target_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    {
        /*
         * Emit CTID so that executor can find the row to update or delete.
         */
        var = makeVar(parsetree->resultRelation,
                      SelfItemPointerAttributeNumber,
                      TIDOID,
                      -1,
                      InvalidOid,
                      0);

        attrname = "ctid";
    }
    else if (target_relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
    {
        /*
         * Let the foreign table's FDW add whatever junk TLEs it wants.
         */
        FdwRoutine *fdwroutine;

        fdwroutine = GetFdwRoutineForRelation(target_relation, false);

        if (fdwroutine->AddForeignUpdateTargets != NULL)
            fdwroutine->AddForeignUpdateTargets(parsetree, target_rte,
                                                target_relation);

Maybe the first block could likewise ask the table AM if it prefers to
add a custom set of system columns or just add "ctid" otherwise?
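A minimal sketch of that suggestion, assuming a hypothetical optional callback in the table AM's routine struct. `add_row_identity_columns` is invented here (no such member exists in `TableAmRoutine`), as are the extra column names for the zheap-like AM. When the AM leaves the callback NULL, the caller falls back to emitting a single `ctid` junk column, mirroring what the first block above does for every local table today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_JUNK 8

/* Toy target list: only the names of the junk columns emitted. */
typedef struct ToyTargetList
{
    int         njunk;
    const char *junk[MAX_JUNK];
} ToyTargetList;

static void
add_junk(ToyTargetList *tlist, const char *name)
{
    assert(tlist->njunk < MAX_JUNK);
    tlist->junk[tlist->njunk++] = name;
}

/* Hypothetical AM routine with an optional row-identity callback. */
typedef struct ToyTableAm
{
    const char *name;
    void        (*add_row_identity_columns) (ToyTargetList *tlist);
} ToyTableAm;

/* A zheap-like AM asking for extra hidden columns (names hypothetical). */
static void
zheap_like_identity(ToyTargetList *tlist)
{
    add_junk(tlist, "ctid");
    add_junk(tlist, "undo_ptr");
    add_junk(tlist, "trans_slot");
}

/* Analogue of the RELKIND_RELATION branch: ask the AM, else use ctid. */
static void
add_update_targets(const ToyTableAm *am, ToyTargetList *tlist)
{
    if (am->add_row_identity_columns != NULL)
        am->add_row_identity_columns(tlist);
    else
        add_junk(tlist, "ctid");
}

/* Does the target list carry a junk column with this name? */
static int
has_junk(const ToyTargetList *tlist, const char *name)
{
    for (int i = 0; i < tlist->njunk; i++)
        if (strcmp(tlist->junk[i], name) == 0)
            return 1;
    return 0;
}
```

The design mirrors the existing FDW hook: the default stays cheap, and only AMs that need more identity information pay for carrying it up the plan.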

> Here it seems like we're
> thinking of going the other way, and just always doing the refetch.

To be clear, the new refetch in ExecModifyTable() is to fill in the
unchanged columns in the new tuple.  If we rejigger the
table_tuple_update() API to receive a partial tuple (essentially
what's in 'planSlot' passed to ExecUpdate) as opposed to the full
tuple, we wouldn't need the refetch.

We'd need to teach a few other executor routines, such as
ExecWithCheckOptions(), ExecConstraints(), etc. to live with a partial
tuple but maybe that's doable with some effort.  We could even
optimize away evaluating any constraint if none of the columns it
references have changed.
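The constraint-skipping idea can be sketched with column bitmasks, assuming each constraint records the set of columns it references as a 64-bit mask (real PostgreSQL code would more likely use a `Bitmapset`). A constraint is re-checked only when its column set intersects the set of changed columns. Skipping is only safe if the old row version already satisfied the constraint and the check depends on nothing but those columns. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ColumnSet;     /* bit i set => column i (0-based) */

static inline ColumnSet
col(int attno)
{
    assert(attno >= 0 && attno < 64);
    return (ColumnSet) 1 << attno;
}

/* Hypothetical CHECK constraint: referenced columns plus a test. */
typedef struct ToyCheck
{
    ColumnSet   refcols;
    bool        (*ok) (const long *row);
} ToyCheck;

static int checks_evaluated = 0;

/*
 * Evaluate only the constraints that reference at least one changed
 * column.  'newrow' is still the full row: a check that does fire may
 * need unchanged columns too.
 */
static bool
toy_exec_constraints(const ToyCheck *checks, int nchecks,
                     ColumnSet changed, const long *newrow)
{
    for (int i = 0; i < nchecks; i++)
    {
        if ((checks[i].refcols & changed) == 0)
            continue;           /* none of its columns changed */
        checks_evaluated++;
        if (!checks[i].ok(newrow))
            return false;
    }
    return true;
}

static bool a_positive(const long *row) { return row[0] > 0; }
static bool b_lt_c(const long *row) { return row[1] < row[2]; }
```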

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Thu, Feb 4, 2021 at 4:33 AM Amit Langote <amitlangote09@gmail.com> wrote:
> So would zheap refetch a tuple using the "ctid" column in the plan's
> output tuple and then use some other columns from the fetched tuple to
> actually do the update?

Yes.

> To be clear, the new refetch in ExecModifyTable() is to fill in the
> unchanged columns in the new tuple.  If we rejigger the
> table_tuple_update() API to receive a partial tuple (essentially
> what's in 'planSlot' passed to ExecUpdate) as opposed to the full
> tuple, we wouldn't need the refetch.

I don't think we should assume that every AM needs the unmodified
columns. Imagine a table AM that's a columnar store. Imagine that each
column is stored completely separately, so you have to look up the TID
once per column and then stick in the new values. Well, clearly you
want to skip this completely for columns that don't need to be
modified. If someone gives you all the columns it actually sucks,
because now you have to look them all up again just to figure out
which ones you need to change, whereas if they gave you only the
modified columns you could just do nothing for the others and save a
bunch of work.
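Robert's columnar-store scenario can be made concrete with a toy model, assuming each column lives in its own array indexed by a toy TID and every per-column lookup is counted as one unit of work. `ToyColumnStore`, `update_full_row` and `update_modified` are hypothetical names. A full-row interface forces the AM to visit every column just to discover which ones changed, while a modified-columns interface lets untouched columns cost nothing.

```c
#include <assert.h>
#include <stdint.h>

#define NCOLS 4
#define NROWS 8

/* Each column stored separately, indexed by a toy TID (row number). */
typedef struct ToyColumnStore
{
    long    cols[NCOLS][NROWS];
    int     lookups;            /* per-column TID lookups performed */
} ToyColumnStore;

/* Full-row interface: every column must be visited to see what changed. */
static void
update_full_row(ToyColumnStore *cs, int tid, const long *newrow)
{
    assert(tid >= 0 && tid < NROWS);
    for (int c = 0; c < NCOLS; c++)
    {
        cs->lookups++;
        if (cs->cols[c][tid] != newrow[c])
            cs->cols[c][tid] = newrow[c];
    }
}

/* Modified-columns interface: only columns in 'modified' are touched. */
static void
update_modified(ToyColumnStore *cs, int tid, const long *newvals,
                uint32_t modified)
{
    assert(tid >= 0 && tid < NROWS);
    for (int c = 0; c < NCOLS; c++)
    {
        if ((modified & (UINT32_C(1) << c)) == 0)
            continue;           /* untouched column: no lookup at all */
        cs->lookups++;
        cs->cols[c][tid] = newvals[c];
    }
}
```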

zheap, though, is always going to need to take another look at the
tuple to do the update, unless you can pass up some values through
hidden columns. I'm not exactly sure how expensive that really is,
though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Thu, Feb 4, 2021 at 10:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 4, 2021 at 4:33 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > To be clear, the new refetch in ExecModifyTable() is to fill in the
> > unchanged columns in the new tuple.  If we rejigger the
> > table_tuple_update() API to receive a partial tuple (essentially
> > what's in 'planSlot' passed to ExecUpdate) as opposed to the full
> > tuple, we wouldn't need the refetch.
>
> I don't think we should assume that every AM needs the unmodified
> columns.  Imagine a table AM that's a columnar store. Imagine that each
> column is stored completely separately, so you have to look up the TID
> once per column and then stick in the new values. Well, clearly you
> want to skip this completely for columns that don't need to be
> modified. If someone gives you all the columns it actually sucks,
> because now you have to look them all up again just to figure out
> which ones you need to change, whereas if they gave you only the
> modified columns you could just do nothing for the others and save a
> bunch of work.

Right, that's the idea in case I wasn't clear.  Currently, a slot
containing the full tuple is passed to the table AM, with or without
the patch.  The API:

    static inline TM_Result
    table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot, ...

describes its 'slot' parameter as:

     *  slot - newly constructed tuple data to store

We could, possibly in a follow-on patch, adjust the
table_tuple_update() API to only accept the changed values through the
slot.

> zheap, though, is always going to need to take another look at the
> tuple to do the update, unless you can pass up some values through
> hidden columns. I'm not exactly sure how expensive that really is,
> though.

I guess it would depend on how many of those hidden columns there need
to be (in addition to the existing "ctid" hidden column) and how many
levels of the plan tree they would need to climb through when bubbling
up.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Thu, Feb 4, 2021 at 10:14 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I guess it would depend on how many of those hidden columns there need
> to be (in addition to the existing "ctid" hidden column) and how many
> levels of the plan tree they would need to climb through when bubbling
> up.

My memory is a bit fuzzy because I haven't looked at this in a while,
but I feel like there were basically two: a 64-bit UndoRecPtr and an
integer slot number. I could be misremembering, though.

It's a bit annoying that we percolate things up the tree the way we do
at all. I realize this is far afield from the topic of this thread.
But suppose that I join 5 tables and select a subset of the table
columns in the output. Suppose WLOG the join order is A-B-C-D-E. Well,
we're going to pull the columns that are needed from A and B and put
them into the slot representing the result of the A-B join. Then we're
going to take some columns from that slot and some columns from C and
put them into the slot representing the result of the A-B-C join. And
so on until we get to the top. But the slots for the A-B, A-B-C, and
A-B-C-D joins don't seem to really be needed. At any point in time,
the value for some column A.x should be the same in all of those
intermediate slots as it is in the current tuple for the baserel scan
of A. I *think* the only time it's different is when we've advanced
the scan for A but haven't gotten around to advancing the joins yet.
But that just underscores the point: if we didn't have all of these
intermediate slots around we wouldn't have to keep advancing them all
separately. Instead we could have the slot at the top, representing
the final join, pull directly from the baserel slots and skip all the
junk in the middle.
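A back-of-the-envelope model of the copying described above, assuming a left-deep join A-B-C-D-E where every base relation contributes `ncols` columns needed in the final output and each join level copies all of them into its own slot. The results are counts of Datum copies per output row, not measurements, and the model ignores quals and per-level row counts; the function names are hypothetical.

```c
#include <assert.h>

/*
 * Datum copies needed to produce one top-level row from a left-deep
 * join of 'nrels' relations.  Join level j (1-based) combines j
 * relations' worth of columns from its left child with 'ncols' from
 * the next base relation, so its slot holds (j + 1) * ncols columns.
 */
static long
copies_with_intermediate_slots(int nrels, int ncols)
{
    long    total = 0;

    assert(nrels >= 2 && ncols >= 0);
    for (int j = 1; j < nrels; j++)
        total += (long) (j + 1) * ncols;
    return total;
}

/* Hypothetical alternative: only the top slot pulls from baserel slots. */
static long
copies_direct_from_baserels(int nrels, int ncols)
{
    assert(nrels >= 2 && ncols >= 0);
    return (long) nrels * ncols;
}
```

For five relations with two needed columns each, this gives 28 copies per row versus 10, and the gap grows quadratically with the number of join levels; multiplying by the row count shows why the overhead can add up even though each copy is cheap.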

Maybe there's a real reason that won't work, but the only reason I
know about why it wouldn't work is that we don't have the bookkeeping
to make it work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> It's a bit annoying that we percolate things up the tree the way we do
> at all. I realize this is far afield from the topic of this thread.
> But suppose that I join 5 tables and select a subset of the table
> columns in the output. Suppose WLOG the join order is A-B-C-D-E. Well,
> we're going to pull the columns that are needed from A and B and put
> them into the slot representing the result of the A-B join. Then we're
> going to take some columns from that slot and some columns from C and
> put them into the slot representing the result of the A-B-C join. And
> so on until we get to the top. But the slots for the A-B, A-B-C, and
> A-B-C-D joins don't seem to really be needed.

You do realize that we're just copying Datums from one level to the
next?  For pass-by-ref data, the Datums generally all point at the
same physical data in some disk buffer ... or if they don't, it's
because the join method had a good reason to want to copy data.
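Tom's point can be illustrated with a toy `Datum`, assuming (as in PostgreSQL) a pointer-sized integer that either holds a pass-by-value value directly or the address of pass-by-reference data. Copying such a Datum from one slot to the next moves only 8 bytes on a 64-bit build; the referenced bytes (here a static string standing in for data in a disk buffer) are shared, not duplicated. `ToySlot`, `ptr_to_datum`, `datum_to_cstring` and `copy_slot_values` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Datum;        /* pointer-sized, as in PostgreSQL */

#define MAX_ATTS 8

typedef struct ToySlot
{
    int     natts;
    Datum   values[MAX_ATTS];
} ToySlot;

static inline Datum
ptr_to_datum(const void *p)
{
    return (Datum) p;
}

static inline const char *
datum_to_cstring(Datum d)
{
    return (const char *) d;
}

/* Copy selected columns from 'src' into 'dst', one Datum per column. */
static void
copy_slot_values(const ToySlot *src, const int *attmap, int n, ToySlot *dst)
{
    assert(n <= MAX_ATTS);
    dst->natts = n;
    for (int i = 0; i < n; i++)
    {
        assert(attmap[i] >= 0 && attmap[i] < src->natts);
        dst->values[i] = src->values[attmap[i]];  /* pointer-size copy */
    }
}
```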

If we didn't have the intermediate tuple slots, we'd have to have
some other scheme for identifying which data to examine in intermediate
join levels' quals.  Maybe you can devise a scheme that has less overhead,
but it's not immediately obvious that any huge win would be available.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Fri, Feb 5, 2021 at 12:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You do realize that we're just copying Datums from one level to the
> next?  For pass-by-ref data, the Datums generally all point at the
> same physical data in some disk buffer ... or if they don't, it's
> because the join method had a good reason to want to copy data.

I am older and dumber than I used to be, but I'm amused at the idea
that I might be old enough and dumb enough not to understand this. To
be honest, given that we are just copying the datums, I find it kind
of surprising that it causes us pain, but it clearly does. If you
think it's not an issue, then what of the email from Amit Langote to
which I was responding, or his earlier message at
http://postgr.es/m/CA+HiwqHUkwcy84uFfUA3qVsyU2pgTwxVkJx1uwPQFSHfPz4rsA@mail.gmail.com
which contains benchmark results?

As to why it causes us pain, I don't have a full picture of that.
Target list construction is one problem: we build all these target
lists for intermediate nodes during planning and they're long enough
-- if the user has a bunch of columns -- and planning is cheap enough
for some queries that the sheer time to construct the list shows up
noticeably in profiles. I've seen that be a problem even for query
planning problems that involve just one table: a test that takes the
"physical tlist" path can be slower just because the time to construct
the longer tlist is significant and the savings from postponing tuple
deforming aren't. It seems impossible to believe that it can't also
hurt us on join queries that actually make use of a lot of columns, so
that they've all got to be included in tlists at every level of the
join tree. I believe that the execution-time overhead isn't entirely
trivial either. Sure, copying an 8-byte quantity is pretty cheap, but
if you have a lot of columns and you copy them a lot of times for each
of a lot of tuples, it adds up. Queries that do enough "real work"
e.g. calling expensive functions, forcing disk I/O, etc. will make the
effect of a bunch of x[i] = y[j] stuff unnoticeable, but there are
plenty of queries that don't really do anything expensive -- they're
doing simple joins of data that's already in memory. Even there,
accessing buffers figures to be more expensive because it's shared
memory with locking and cache line contention; but I don't think that
means we can completely ignore the performance impact of backend-local
computation. b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755 is a good
example of getting a significant gain by refactoring to reduce
seemingly trivial overheads -- in that case, AIUI, the benefits are
around fewer function calls and better CPU branch prediction.

> If we didn't have the intermediate tuple slots, we'd have to have
> some other scheme for identifying which data to examine in intermediate
> join levels' quals.  Maybe you can devise a scheme that has less overhead,
> but it's not immediately obvious that any huge win would be available.

I agree. I'm inclined to suspect that some benefit is possible, but
that might be wrong and it sure doesn't look easy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> As to why it causes us pain, I don't have a full picture of that.
> Target list construction is one problem: we build all these target
> lists for intermediate nodes during planning and they're long enough
> -- if the user has a bunch of columns -- and planning is cheap enough
> for some queries that the sheer time to construct the list shows up
> noticeably in profiles.

Well, the tlist data structure is just about completely separate from
the TupleTableSlot mechanisms.  I'm more prepared to believe that
we could improve on the former, though I don't have any immediate
ideas about how.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Thu, Feb 4, 2021 at 3:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Jan 26, 2021 at 8:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Fri, Nov 13, 2020 at 6:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Wed, Nov 11, 2020 at 9:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > > On 29/10/2020 15:03, Amit Langote wrote:
> > > > > Rebased over the recent executor result relation related commits.
> > > >
> > > > ModifyTablePath didn't get the memo that a ModifyTable can only have one
> > > > subpath after these patches. Attached patch, on top of your v5 patches,
> > > > cleans that up.
> > >
> > > Ah, thought I'd taken care of that, thanks.  Attached v6.
> >
> > This got slightly broken due to the recent batch insert related
> > changes, so here is the rebased version.  I also made a few cosmetic
> > changes.
>
> Broken again, so rebased.

Rebased.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Wed, Mar 3, 2021 at 9:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
> Just noticed that a test added by the recent 927f453a941 fails due to
> 0002.  We no longer allow moving a row into a postgres_fdw partition
> if it is among the UPDATE's result relations, whereas previously we
> would if the UPDATE on that partition is already finished.
>
> To fix, I've adjusted the test case.  Attached updated version.

I spent some time studying this patch this morning. As far as I can
see, 0001 is a relatively faithful implementation of the design Tom
proposed back in early 2019. I think it would be nice to either get
this committed or else decide that we don't want it and what we're
going to try to do instead, because we can't make UPDATEs and DELETEs
stop sucking with partitioned tables until we settle on some solution
to the problem that is inheritance_planner(), and that strikes me as
an *extremely* important problem. Lots of people are trying to use
partitioning in PostgreSQL, and they don't like finding out that, in
many cases, it makes things slower rather than faster. Neither this
nor any other patch is going to solve that problem in general, because
there are *lots* of things that haven't been well-optimized for
partitioning yet. But, this is a pretty important case that we should
really try to do something about.

So, that said, here are some random comments:

- I think it would be interesting to repeat your earlier benchmarks
using -Mprepared. One question I have is whether we're saving overhead
here during planning at the price of execution-time overhead, or
whether we're saving during both planning and execution.

- Until I went back and found your earlier remarks on this thread, I
was confused as to why you were replacing a JunkFilter with a
ProjectionInfo. I think it would be good to try to add some more
explicit comments about that design choice, perhaps as header comments
for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
not sure why we need to do the same thing for the insert case, which
seems to just be about removing junk columns. At least in the non-JIT
case, it seems to me that ExecJunkFilter() should be cheaper than
ExecProject(). Is it different enough to matter? Does the fact that we
can JIT the ExecProject() work make it actually faster? These are
things I don't know.
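The difference being asked about can be sketched in miniature, assuming a junk filter is essentially an attribute map (output column `i` comes from input column `map[i]`), while a projection evaluates an arbitrary expression per output column. These are hypothetical toy functions, not `JunkFilter` or `ProjectionInfo`. In an interpreted setting the map does strictly less work per column, but a projection can also compute values (such as `b + 15` from the postgres_fdw example below), which a pure map cannot. The sketch says nothing about what JIT compilation changes.

```c
#include <assert.h>

#define MAX_OUT 8

/* Junk filter analogue: output column i comes from input column map[i]. */
static void
junk_filter(const long *in, const int *map, int nout, long *out)
{
    assert(nout <= MAX_OUT);
    for (int i = 0; i < nout; i++)
        out[i] = in[map[i]];
}

/* Projection analogue: each output column is computed by an expression. */
typedef long (*ToyExpr) (const long *in);

static void
project(const long *in, const ToyExpr *exprs, int nout, long *out)
{
    assert(nout <= MAX_OUT);
    for (int i = 0; i < nout; i++)
        out[i] = exprs[i](in);  /* one indirect call per column */
}

/* Expressions for a plain column reference and for "b + 15". */
static long var_a(const long *in) { return in[0]; }
static long b_plus_15(const long *in) { return in[1] + 15; }
```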

- There's a comment which you didn't write but just moved which I find
to be quite confusing. It says "For UPDATE/DELETE, find the
appropriate junk attr now. Typically, this will be a 'ctid' or
'wholerow' attribute, but in the case of a foreign data wrapper it
might be a set of junk attributes sufficient to identify the remote
row." But, the block of code which follows caters only to the 'ctid'
and 'wholerow' cases, not anything else. Perhaps that's explained by
the comment a bit further down which says "When there is a row-level
trigger, there should be a wholerow attribute," but if the point is
that this code is only reached when there's a row-level trigger,
that's far from obvious. It *looks* like something we'd reach for ANY
insert or UPDATE case. Maybe it's not your job to do anything about
this confusion, but I thought it was worth highlighting.

- The comment for filter_junk_tlist_entries(), needless to say, is of
the highest quality, but would it make any sense to think about having
an option for expand_tlist() to omit the junk entries itself, to avoid
extra work? I'm unclear whether we want that behavior in all UPDATE
cases or only some of them, because preprocess_targetlist() has a call
to expand_tlist() to set parse->onConflict->onConflictSet that does
not call filter_junk_tlist_entries() on the result. Does this patch
need to make any changes to the handling of ON CONFLICT .. UPDATE? It
looks to me like right now it doesn't, but I don't know whether that's
an oversight or intentional.

- The output changes in the main regression test suite are pretty easy
to understand: we're just seeing that columns which no longer need to be
fed through execution get dropped. The changes in the postgres_fdw
results are harder to understand. In general, it appears that what's
happening is that we're no longer outputting the non-updated columns
individually -- which makes sense -- but now we're outputting a
whole-row var that wasn't there before, e.g.:

-         Output: foreign_tbl.a, (foreign_tbl.b + 15), foreign_tbl.ctid
+         Output: (foreign_tbl.b + 15), foreign_tbl.ctid, foreign_tbl.*

Since this is postgres_fdw, we can re-fetch the row using CTID, so
it's not clear to me why we need foreign_tbl.* when we didn't before.
Maybe the comments need improvement.

- Specifically, I think the comments in preptlist.c need some work.
You've edited the top-of-file comment, but I don't think it's really
accurate. The first sentence says "For INSERT and UPDATE, the
targetlist must contain an entry for each attribute of the target
relation in the correct order," but I don't think that's really true
any more. It's certainly not what you see in the EXPLAIN output. The
paragraph goes on to explain that UPDATE has a second target list, but
(a) that seems to contradict the first sentence and (b) that second
target list isn't what you see when you run EXPLAIN. Also, there's no
mention of what happens for FDWs here, but it's evidently different,
as per the previous review comment.

- The comments atop fix_join_expr() should be updated. Maybe you can
just adjust the wording for case #2.

OK, that's all I've got based on a first read-through. Thanks for your
work on this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I spent some time studying this patch this morning. As far as I can
> see, 0001 is a relatively faithful implementation of the design Tom
> proposed back in early 2019. I think it would be nice to either get
> this committed or else decide that we don't want it and what we're
> going to try to do instead,

Yeah, it's on my to-do list for this CF, but I expect it's going to
take some concentrated study and other things keep intruding :-(.

All of your comments/questions seem reasonable; thanks for taking
a look.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
I wrote:
> Yeah, it's on my to-do list for this CF, but I expect it's going to
> take some concentrated study and other things keep intruding :-(.

I've started to look at this seriously, and I wanted to make a note
about something that I think we should try to do, but there seems
little hope of shoehorning it in for v14.

That "something" is that ideally, the ModifyTable node should pass
only the updated column values to the table AM or FDW, and let that
lower-level code worry about reconstructing a full tuple by
re-fetching the unmodified columns.  When I first proposed this
concept, I'd imagined that maybe we could avoid the additional tuple
read that this implementation requires by combining it with the
tuple access that a heap UPDATE must do anyway to lock and outdate
the target tuple.  Another example of a possible win is Andres'
comment upthread that a columnar-storage AM would really rather
deal only with the modified columns.  Also, the patch as it stands
asks FDWs to supply all columns in a whole-row junk var, which is
something that might become unnecessary.

However, there are big stumbling blocks in the way.  ModifyTable
is responsible for applying CHECK constraints, which may require
looking at the values of not-modified columns.  An even bigger
problem is that per-row triggers currently expect to be given
the whole proposed row (and to be able to change columns not
already marked for update).  We could imagine redefining the
trigger APIs to reduce the overhead here, but that's certainly
not going to happen in time for v14.
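The trigger concern can be sketched as follows, assuming a hypothetical BEFORE ROW UPDATE trigger that stamps a column not mentioned in the SET clause. The trigger receives the whole proposed row and may change any column, so the set of columns that actually differ from the old row is only known after it runs, and the trigger itself needs the unmodified columns to be present. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NATTS 3                 /* columns: id, val, stamp */

typedef struct ToyRow
{
    long    values[NATTS];
} ToyRow;

/* Hypothetical BEFORE ROW trigger: sees the whole row, may change any column. */
typedef void (*ToyBeforeRowTrigger) (ToyRow *newrow, long now);

static void
stamp_trigger(ToyRow *newrow, long now)
{
    newrow->values[2] = now;    /* column 2 is not in the SET clause */
}

/*
 * Run the trigger on the full proposed row, then return the set of
 * columns whose values differ from the old row.
 */
static uint32_t
apply_before_trigger(const ToyRow *oldrow, ToyRow *newrow,
                     ToyBeforeRowTrigger trig, long now)
{
    uint32_t    changed = 0;

    assert(oldrow != NULL && newrow != NULL);
    if (trig != NULL)
        trig(newrow, now);
    for (int i = 0; i < NATTS; i++)
        if (newrow->values[i] != oldrow->values[i])
            changed |= UINT32_C(1) << i;
    return changed;
}
```

For `UPDATE ... SET val = 11`, the SET clause names only column 1, yet the returned mask also includes column 2.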

So for now I think we have to stick with Amit's design of
reconstructing the full updated tuple at the outset in
ModifyTable, and then proceeding pretty much as updates
have done in the past.  But there's more we can do later.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Thu, Mar 25, 2021 at 4:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Yeah, it's on my to-do list for this CF, but I expect it's going to
> > take some concentrated study and other things keep intruding :-(.
>
> I've started to look at this seriously,

Thanks a lot.

> and I wanted to make a note
> about something that I think we should try to do, but there seems
> little hope of shoehorning it in for v14.
>
> That "something" is that ideally, the ModifyTable node should pass
> only the updated column values to the table AM or FDW, and let that
> lower-level code worry about reconstructing a full tuple by
> re-fetching the unmodified columns.  When I first proposed this
> concept, I'd imagined that maybe we could avoid the additional tuple
> read that this implementation requires by combining it with the
> tuple access that a heap UPDATE must do anyway to lock and outdate
> the target tuple.  Another example of a possible win is Andres'
> comment upthread

(Ah, I think you mean Heikki's.)

> that a columnar-storage AM would really rather
> deal only with the modified columns.  Also, the patch as it stands
> asks FDWs to supply all columns in a whole-row junk var, which is
> something that might become unnecessary.

That would indeed be nice.  I had considered taking on the project to
revise FDW local (non-direct) update APIs to deal with being passed
only the values of changed columns, but hit some problems when
implementing that in postgres_fdw that I don't quite remember the
details of.  As you say below, we can pick that up later.

> However, there are big stumbling blocks in the way.  ModifyTable
> is responsible for applying CHECK constraints, which may require
> looking at the values of not-modified columns.  An even bigger
> problem is that per-row triggers currently expect to be given
> the whole proposed row (and to be able to change columns not
> already marked for update).  We could imagine redefining the
> trigger APIs to reduce the overhead here, but that's certainly
> not going to happen in time for v14.

Yeah, at least the trigger concerns look like they will take work we
had better not do in the 2 weeks left in the v14 cycle.

> So for now I think we have to stick with Amit's design of
> reconstructing the full updated tuple at the outset in
> ModifyTable, and then proceeding pretty much as updates
> have done in the past.  But there's more we can do later.

Agreed.

I'm addressing Robert's comments and will post an updated patch by tomorrow.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> - Until I went back and found your earlier remarks on this thread, I
> was confused as to why you were replacing a JunkFilter with a
> ProjectionInfo. I think it would be good to try to add some more
> explicit comments about that design choice, perhaps as header comments
> for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
> not sure why we need to do the same thing for the insert case, which
> seems to just be about removing junk columns.

I wondered about that too; this patch allegedly isn't touching anything
interesting about INSERT cases, so why should we modify that?  However,
when I tried to poke at that, I discovered that it seems to be dead code
anyway.  A look at coverage.postgresql.org will show you that no
regression test reaches "junk_filter_needed = true" in
ExecInitModifyTable's inspection of INSERT tlists, and I've been unable to
create such a case manually either.  I think the reason is that the parser
initially builds all INSERT ... SELECT cases with the SELECT as an
explicit subquery, giving rise to a SubqueryScan node just below the
ModifyTable, which will project exactly the desired columns and no more.
We'll optimize away the SubqueryScan if it's a no-op projection, but not
if it is getting rid of junk columns.  There is room for more optimization
here: dropping the SubqueryScan in favor of making ModifyTable do the same
projection would win by removing one plan node's worth of overhead.  But
I don't think we need to panic about whether the projection is done with
ExecProject or a junk filter --- we'd be strictly ahead of the current
situation either way.

Given that background, I agree with Amit's choice to change this,
just to reduce the difference between how INSERT and UPDATE cases work.
For now, there's no performance difference anyway, since neither the
ProjectionInfo nor the JunkFilter code can be reached.  (Maybe a comment
about that would be useful.)

BTW, in the version of the patch that I'm working on (not ready to
post yet), I've thrown away everything that Amit did in setrefs.c
and tlist.c, so I don't recommend he spend time improving the comments
there ;-).  I did not much like the idea of building a full TargetList
for each partition; that's per-partition cycles and storage space that
we won't be able to reclaim with the 0002 patch.  And we don't really
need it, because there are only going to be three cases at runtime:
pull a column from the subplan result tuple, pull a column from the
old tuple, or insert a NULL for a dropped column.  So I've replaced
the per-target-table tlists with integer lists of the attnums of the
UPDATE's target columns in that table.  These are compact and they
don't require any further processing in setrefs.c, and the executor
can easily build a projection expression from that data.
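The three runtime cases Tom lists can be illustrated with a small standalone sketch. It is not PostgreSQL code. `ToyDatum` and `build_update_tuple()` are hypothetical, values are plain `long`s, and the sketch assumes the subplan emits the new values in the same order as the per-table attnum list. In Tom's description the executor would instead build a projection expression from that list. The loop below only shows why an integer list carries enough information to do so.

```c
#include <assert.h>
#include <stdbool.h>

/* A toy column value: a value plus a null flag (hypothetical, not a real Datum). */
typedef struct ToyDatum
{
    long value;
    bool isnull;
} ToyDatum;

/*
 * Build the full new tuple for one target table.
 *
 * natts           number of attributes in the target table
 * dropped         dropped[i] is true if attribute i+1 has been dropped
 * target_attnums  attnums of the UPDATE's target columns in this table,
 *                 assumed to be in the order the subplan emits new values
 * subplan_vals    new-value columns produced by the shared subplan
 * old_vals        the old tuple re-fetched from the target table
 * result          output array with natts entries
 */
static void
build_update_tuple(int natts, const bool *dropped,
                   const int *target_attnums, int ntargets,
                   const ToyDatum *subplan_vals,
                   const ToyDatum *old_vals,
                   ToyDatum *result)
{
    /* Start from the old tuple; dropped columns become NULL. */
    for (int i = 0; i < natts; i++)
    {
        if (dropped[i])
        {
            result[i].value = 0;
            result[i].isnull = true;    /* case 3: NULL for a dropped column */
        }
        else
            result[i] = old_vals[i];    /* case 2: pull from the old tuple */
    }

    /* Overwrite the assigned columns with the subplan's new values. */
    for (int j = 0; j < ntargets; j++)
    {
        int attnum = target_attnums[j];

        assert(attnum >= 1 && attnum <= natts);
        assert(!dropped[attnum - 1]);
        result[attnum - 1] = subplan_vals[j];  /* case 1: pull from subplan */
    }
}
```

For a four-column table whose second column is dropped, `UPDATE ... SET c3 = 30` would pass `target_attnums = {3}` and one new value: columns 1 and 4 are copied from the old tuple and column 2 becomes NULL.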

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Fri, Mar 26, 2021 at 3:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > - Until I went back and found your earlier remarks on this thread, I
> > was confused as to why you were replacing a JunkFilter with a
> > ProjectionInfo. I think it would be good to try to add some more
> > explicit comments about that design choice, perhaps as header comments
> > for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
> > not sure why we need to do the same thing for the insert case, which
> > seems to just be about removing junk columns.
>
> I wondered about that too; this patch allegedly isn't touching anything
> interesting about INSERT cases, so why should we modify that?  However,
> when I tried to poke at that, I discovered that it seems to be dead code
> anyway.  A look at coverage.postgresql.org will show you that no
> regression test reaches "junk_filter_needed = true" in
> ExecInitModifyTable's inspection of INSERT tlists, and I've been unable to
> create such a case manually either.

I noticed this too.

>  I think the reason is that the parser
> initially builds all INSERT ... SELECT cases with the SELECT as an
> explicit subquery, giving rise to a SubqueryScan node just below the
> ModifyTable, which will project exactly the desired columns and no more.
> We'll optimize away the SubqueryScan if it's a no-op projection, but not
> if it is getting rid of junk columns.  There is room for more optimization
> here: dropping the SubqueryScan in favor of making ModifyTable do the same
> projection would win by removing one plan node's worth of overhead.

Oh, so there could possibly be a case where ModifyTable would have to
do junk filtering for INSERTs, but IIUC only if the planner optimized
away junk-filtering-SubqueryScan nodes too?  I was thinking that
perhaps INSERTs used to need junk filtering in the past but no longer
and now it's just dead code.

>  But
> I don't think we need to panic about whether the projection is done with
> ExecProject or a junk filter --- we'd be strictly ahead of the current
> situation either way.

I would've liked to confirm that with a performance comparison, but no
test case exists to do so. :(

> Given that background, I agree with Amit's choice to change this,
> just to reduce the difference between how INSERT and UPDATE cases work.
>
> For now, there's no performance difference anyway, since neither the
> ProjectionInfo nor the JunkFilter code can be reached.
>  (Maybe a comment about that would be useful.)

I've added a comment in ExecInitModifyTable() around the block that
initializes new-tuple ProjectionInfo to say that INSERTs don't
actually need to use it today.

> BTW, in the version of the patch that I'm working on (not ready to
> post yet), I've thrown away everything that Amit did in setrefs.c
> and tlist.c, so I don't recommend he spend time improving the comments
> there ;-).

Oh, I removed filter_junk_tlist_entries() in my updated version too,
prompted by Robert's comment, but haven't touched
set_update_tlist_references().

>  I did not much like the idea of building a full TargetList
> for each partition; that's per-partition cycles and storage space that
> we won't be able to reclaim with the 0002 patch.  And we don't really
> need it, because there are only going to be three cases at runtime:
> pull a column from the subplan result tuple, pull a column from the
> old tuple, or insert a NULL for a dropped column.  So I've replaced
> the per-target-table tlists with integer lists of the attnums of the
> UPDATE's target columns in that table.  These are compact and they
> don't require any further processing in setrefs.c, and the executor
> can easily build a projection expression from that data.

I remember that in the earliest unposted versions, I had made
ExecInitModifyTable() take on the burden of creating the targetlist
that the projection will compute, which is what your approach sounds
like.  However, I abandoned that out of concern that it might hurt
prepared statements, because we would build the targetlist on every
execution, whereas it is built only once if the planner does it.

I'm just about done addressing Robert's comments, so will post an
update shortly.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Fri, Mar 26, 2021 at 3:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the reason is that the parser
>> initially builds all INSERT ... SELECT cases with the SELECT as an
>> explicit subquery, giving rise to a SubqueryScan node just below the
>> ModifyTable, which will project exactly the desired columns and no more.
>> We'll optimize away the SubqueryScan if it's a no-op projection, but not
>> if it is getting rid of junk columns.  There is room for more optimization
>> here: dropping the SubqueryScan in favor of making ModifyTable do the same
>> projection would win by removing one plan node's worth of overhead.

> Oh, so there could possibly be a case where ModifyTable would have to
> do junk filtering for INSERTs, but IIUC only if the planner optimized
> away junk-filtering-SubqueryScan nodes too?  I was thinking that
> perhaps INSERTs used to need junk filtering in the past but no longer
> and now it's just dead code.

I'm honestly not very sure about that.  It's possible that there was
some state of the code in which we supported INSERT/SELECT but didn't
end up putting a SubqueryScan node in there, but if so it was a long,
long time ago.  It looks like the SELECT-is-a-subquery parser logic
dates to 05e3d0ee8 of 2000-10-05, which was a long time before
ModifyTable existed as such.  I'm not interested enough to dig
further than that.

However, it's definitely true that we can now generate INSERT plans
where there's a SubqueryScan node that's not doing anything but
stripping junk columns, for instance

    INSERT INTO t SELECT x FROM t2 ORDER BY y;

where the ORDER BY has to be done with an explicit sort.  The
sort will be on a resjunk "y" column.
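To make the junk-stripping step concrete: in a plan like the one above, the Sort emits (x, y), y is marked resjunk, and only x should reach the INSERT. The following hypothetical sketch (`ToyColumn`, `build_junk_map()` and `apply_junk_map()` are invented for illustration and are not PostgreSQL APIs) precomputes the positions of the non-junk columns once, then copies just those columns for each row. That is roughly the per-row work a SubqueryScan projection or a junk filter has to do here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one output column of the subplan. */
typedef struct ToyColumn
{
    const char *name;
    bool resjunk;   /* true if the column exists only for the plan's own use */
} ToyColumn;

/*
 * Fill map[] with the positions of the non-junk columns, in order, and
 * return how many there are.  Meant to be done once, at plan startup.
 */
static int
build_junk_map(const ToyColumn *cols, int ncols, int *map)
{
    int nkeep = 0;

    assert(ncols >= 0);
    for (int i = 0; i < ncols; i++)
    {
        if (!cols[i].resjunk)
            map[nkeep++] = i;
    }
    return nkeep;
}

/* Per-row work: copy only the non-junk values into the output row. */
static void
apply_junk_map(const int *map, int nkeep, const long *in_row, long *out_row)
{
    for (int k = 0; k < nkeep; k++)
        out_row[k] = in_row[map[k]];
}
```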

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, Mar 24, 2021 at 1:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 3, 2021 at 9:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Just noticed that a test added by the recent 927f453a941 fails due to
> > 0002.  We no longer allow moving a row into a postgres_fdw partition
> > if it is among the UPDATE's result relations, whereas previously we
> > would if the UPDATE on that partition is already finished.
> >
> > To fix, I've adjusted the test case.  Attached updated version.
>
> I spent some time studying this patch this morning.

Thanks a lot for your time on this.

> As far as I can
> see, 0001 is a relatively faithful implementation of the design Tom
> proposed back in early 2019. I think it would be nice to either get
> this committed or else decide that we don't want it and what we're
> going to try to do instead, because we can't make UPDATEs and DELETEs
> stop sucking with partitioned tables until we settle on some solution
> to the problem that is inheritance_planner(), and that strikes me as
> an *extremely* important problem. Lots of people are trying to use
> partitioning in PostgreSQL, and they don't like finding out that, in
> many cases, it makes things slower rather than faster. Neither this
> nor any other patch is going to solve that problem in general, because
> there are *lots* of things that haven't been well-optimized for
> partitioning yet. But, this is a pretty important case that we should
> really try to do something about.
>
> So, that said, here are some random comments:
>
> - I think it would be interesting to repeat your earlier benchmarks
> using -Mprepared. One question I have is whether we're saving overhead
> here during planning at the price of execution-time overhead, or
> whether we're saving during both planning and execution.

Please see at the bottom of this reply.

> - Until I went back and found your earlier remarks on this thread, I
> was confused as to why you were replacing a JunkFilter with a
> ProjectionInfo. I think it would be good to try to add some more
> explicit comments about that design choice, perhaps as header comments
> for ExecGetUpdateNewTuple, or maybe there's a better place.

I think the comments around ri_projectNew, which holds the
ProjectionInfo node, explain this to some degree, especially the
comment in ExecInitModifyTable() that sets it.  I don't particularly
see a need to go into detail about why JunkFilter is not suitable for
the task if we're no longer using it at all in nodeModifyTable.c.

> I'm still
> not sure why we need to do the same thing for the insert case, which
> seems to just be about removing junk columns.

I think I was hesitant to have both an ri_junkFilter and an ri_projectNew
catering to INSERT and UPDATE/DELETE, respectively.

> At least in the non-JIT
> case, it seems to me that ExecJunkFilter() should be cheaper than
> ExecProject(). Is it different enough to matter? Does the fact that we
> can JIT the ExecProject() work make it actually faster? These are
> things I don't know.

ExecJunkFilter() does look cheaper at first glance for simple junk
filtering, but as Tom also found out, there's actually no test case
involving INSERT with which to do the actual performance comparison.

> - There's a comment which you didn't write but just moved which I find
> to be quite confusing. It says "For UPDATE/DELETE, find the
> appropriate junk attr now. Typically, this will be a 'ctid' or
> 'wholerow' attribute, but in the case of a foreign data wrapper it
> might be a set of junk attributes sufficient to identify the remote
> row." But, the block of code which follows caters only to the 'ctid'
> and 'wholerow' cases, not anything else. Perhaps that's explained by
> the comment a bit further down which says "When there is a row-level
> trigger, there should be a wholerow attribute," but if the point is
> that this code is only reached when there's a row-level trigger,
> that's far from obvious. It *looks* like something we'd reach for ANY
> insert or UPDATE case. Maybe it's not your job to do anything about
> this confusion, but I thought it was worth highlighting.

I do remember being confused by that note regarding the junk
attributes required by FDWs for their result relations when I first
saw it, but eventually found out that it's talking about the
information about junk attributes that FDWs track in their *private*
data structures.  For example, postgres_fdw uses
PgFdwModifyState.ctidAttno to record the index of the "ctid" TLE in
the source plan's targetlist.  It is used, for example, by
postgresExecForeignUpdate() to extract the ctid from the plan tuple
passed to it and pass the value as a parameter for the remote query:
update remote_tab set ... where ctid = $1.

I've clarified the comment to make that a bit clearer.
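The lookup-once, index-per-row pattern described above can be modelled roughly as follows. This is a simplified sketch, not postgres_fdw code: `ToyTle`, `ToyModifyState`, `find_junk_attno()` and `fetch_ctid_param()` are hypothetical, and the ctid is represented as a `long` purely for brevity. The real code reads the value from a tuple slot rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified target-list entry: a name and a resjunk flag. */
typedef struct ToyTle
{
    const char *resname;
    bool resjunk;
} ToyTle;

/* Hypothetical private state, loosely modelled on an FDW's modify state. */
typedef struct ToyModifyState
{
    int ctid_attno;   /* 1-based position of "ctid" in the plan tlist, 0 if absent */
} ToyModifyState;

/* Done once at startup: locate a junk column in the plan's tlist by name. */
static int
find_junk_attno(const ToyTle *tlist, int ntles, const char *name)
{
    for (int i = 0; i < ntles; i++)
    {
        if (tlist[i].resjunk && tlist[i].resname != NULL &&
            strcmp(tlist[i].resname, name) == 0)
            return i + 1;
    }
    return 0;
}

/* Done per row: pull the value to send as $1 in "... where ctid = $1". */
static long
fetch_ctid_param(const ToyModifyState *state, const long *plan_row)
{
    assert(state->ctid_attno > 0);
    return plan_row[state->ctid_attno - 1];
}
```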

> - The comment for filter_junk_tlist_entries(), needless to say, is of
> the highest quality,

Sorry, it was a copy-paste job.

> but would it make any sense to think about having
> an option for expand_tlist() to omit the junk entries itself, to avoid
> extra work? I'm unclear whether we want that behavior in all UPDATE
> cases or only some of them, because preprocess_targetlist() has a call
> to expand_tlist() to set parse->onConflict->onConflictSet that does
> not call filter_junk_tlist_entries() on the result.

I added an exclude_junk parameter to expand_targetlist() and passed
false for it in all sites except make_update_tlist(), including where
it's called on parse->onConflict->onConflictSet.

That said, make_update_tlist() and related code may have been
superseded by Tom's WIP patch.

> Does this patch
> need to make any changes to the handling of ON CONFLICT .. UPDATE? It
> looks to me like right now it doesn't, but I don't know whether that's
> an oversight or intentional.

I intentionally didn't bother with changing any part of the ON
CONFLICT UPDATE case, mainly because INSERTs don't have an
inheritance_planner() problem.  We may want to revisit that in the
future if we decide to revise the ExecUpdate() API to not pass the
fully reconstructed new tuple, which this patch doesn't do.

> - The output changes in the main regression test suite are pretty easy
> to understand: we're just seeing columns that no longer need to get
> fed through the execution get dropped. The changes in the postgres_fdw
> results are harder to understand. In general, it appears that what's
> happening is that we're no longer outputting the non-updated columns
> individually -- which makes sense -- but now we're outputting a
> whole-row var that wasn't there before, e.g.:
>
> -         Output: foreign_tbl.a, (foreign_tbl.b + 15), foreign_tbl.ctid
> +         Output: (foreign_tbl.b + 15), foreign_tbl.ctid, foreign_tbl.*
>
> Since this is postgres_fdw, we can re-fetch the row using CTID, so
> it's not clear to me why we need foreign_tbl.* when we didn't before.
> Maybe the comments need improvement.

The ExecForeignUpdate FDW API expects to be passed a fully formed new
tuple, even though it will typically only access the changed columns
from that tuple to pass in the remote update query.  There is a
comment in rewriteTargetListUD() that explains this, which I have
updated to read as follows:

        /*
         * ExecUpdate() needs to pass a full new tuple to be assigned to the
         * result relation to ExecForeignUpdate(), although the plan will have
         * produced values for only the changed columns.  Here we ask the FDW
         * to fetch wholerow to serve as the side channel for getting the
         * values of the unchanged columns when constructing the full tuple to
         * be passed to ExecForeignUpdate().  Actually, we only really need
         * this for UPDATEs that are not pushed to the remote side, but whether
         * or not the pushdown will occur is not clear when this function is
         * called, so we ask for wholerow anyway.
         *
         * We will also need the "old" tuple if there are any row triggers.
         */

> - Specifically, I think the comments in preptlist.c need some work.
> You've edited the top-of-file comment, but I don't think it's really
> accurate. The first sentence says "For INSERT and UPDATE, the
> targetlist must contain an entry for each attribute of the target
> relation in the correct order," but I don't think that's really true
> any more. It's certainly not what you see in the EXPLAIN output. The
> paragraph goes on to explain that UPDATE has a second target list, but
> (a) that seems to contradict the first sentence and (b) that second
> target list isn't what you see when you run EXPLAIN. Also, there's no
> mention of what happens for FDWs here, but it's evidently different,
> as per the previous review comment.

It seems Tom has other things in mind for what I've implemented as
update_tlist, so I will leave this alone.

> - The comments atop fix_join_expr() should be updated. Maybe you can
> just adjust the wording for case #2.

Apparently the changes in setrefs.c are being thrown out as well in
Tom's patch, so likewise I will leave this alone.


Attached is an updated version of the patch.  In my recent posts on
this thread, I forgot to mention one thing about 0001 that I had
mentioned upthread back in June: it currently fails a test in
postgres_fdw's suite due to a bug in cross-partition updates that I
decided at the time to pursue in another thread:
https://www.postgresql.org/message-id/CA%2BHiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA%40mail.gmail.com

That bug is revealed due to some changes that 0001 makes.  However, it
does not matter after applying 0002, because the current way of having
one plan per result relation is a precondition for that bug to
manifest.  So, if we are to apply only 0001 first, then I'm afraid we
would have to take care of that bug before applying 0001.

Finally, here are the detailed results of the benchmarks I redid to
check the performance implications of doing UPDATEs the new way,
comparing master and 0001.

I repeated 2 custom pgbench tests against UPDATE target tables
containing 10, 20, 40, and 80 columns.  The 2 custom tests are as
follows:

nojoin:

\set a random(1, 1000000)
update test_table t set b = :a where a = :a;

join:

\set a random(1, 1000000)
update test_table t set b = foo.b from foo where t.a = foo.a and foo.b = :a;

foo has just 2 integer columns a, b, with an index on b.

I checked using both -Msimple and -Mprepared this time, whereas last
time I had only checked the former.

I'd summarize the results I see as follows:

In -Msimple mode, patched wins by a tiny margin for both the nojoin and
join cases at 10 and 20 columns, and by a slightly larger margin at 40
and 80 columns, with the join case showing a bigger margin than nojoin.

In -Mprepared mode, where the numbers are a bit noisy, I can only tell
clearly that patched wins by a very wide margin for the join case at
40 and 80 columns; there is no clear winner in the other cases.

To answer Robert's questions in this regard:

> One question I have is whether we're saving overhead
> here during planning at the price of execution-time overhead, or
> whether we're saving during both planning and execution.

The smaller targetlists produced by the patch at least help patched end
up on the better side of the tps comparison.  Maybe this helps reduce
both planning and execution time.  As for whether the results show a
penalty from now fetching the tuple one more time to construct the new
tuple, I don't see evidence of that.

Raw tps figures (each case repeated 3 times) follow.  I'm also
attaching (a hopefully self-contained) shell script file
(test_update.sh) that you can run to reproduce the numbers for the
various cases.

10 columns

nojoin simple master
tps = 12278.749205 (without initial connection time)
tps = 11537.051718 (without initial connection time)
tps = 12312.717990 (without initial connection time)
nojoin simple patched
tps = 12160.125784 (without initial connection time)
tps = 12170.271905 (without initial connection time)
tps = 12212.037774 (without initial connection time)

nojoin prepared master
tps = 12228.149183 (without initial connection time)
tps = 12509.135100 (without initial connection time)
tps = 11698.161145 (without initial connection time)
nojoin prepared patched
tps = 13033.005860 (without initial connection time)
tps = 14690.203013 (without initial connection time)
tps = 15083.096511 (without initial connection time)

join simple master
tps = 9112.059568 (without initial connection time)
tps = 10730.739559 (without initial connection time)
tps = 10663.677821 (without initial connection time)
join simple patched
tps = 10980.139631 (without initial connection time)
tps = 10887.743691 (without initial connection time)
tps = 10929.663379 (without initial connection time)

join prepared master
tps = 21333.421825 (without initial connection time)
tps = 23895.538826 (without initial connection time)
tps = 24761.384786 (without initial connection time)
join prepared patched
tps = 25665.062858 (without initial connection time)
tps = 25037.391119 (without initial connection time)
tps = 25421.839842 (without initial connection time)

20 columns

nojoin simple master
tps = 11215.161620 (without initial connection time)
tps = 11306.536537 (without initial connection time)
tps = 11310.776393 (without initial connection time)
nojoin simple patched
tps = 11791.107767 (without initial connection time)
tps = 11757.933141 (without initial connection time)
tps = 11743.983647 (without initial connection time)

nojoin prepared master
tps = 17144.510719 (without initial connection time)
tps = 14032.133587 (without initial connection time)
tps = 15678.801224 (without initial connection time)
nojoin prepared patched
tps = 16603.131255 (without initial connection time)
tps = 14703.564675 (without initial connection time)
tps = 13652.827905 (without initial connection time)

join simple master
tps = 9637.904229 (without initial connection time)
tps = 9869.163480 (without initial connection time)
tps = 9865.673335 (without initial connection time)
join simple patched
tps = 10779.705826 (without initial connection time)
tps = 10790.961520 (without initial connection time)
tps = 10917.759963 (without initial connection time)

join prepared master
tps = 23030.120609 (without initial connection time)
tps = 22347.620338 (without initial connection time)
tps = 24227.376933 (without initial connection time)
join prepared patched
tps = 22303.689184 (without initial connection time)
tps = 24507.395745 (without initial connection time)
tps = 25219.535413 (without initial connection time)

40 columns

nojoin simple master
tps = 10348.352638 (without initial connection time)
tps = 9978.449528 (without initial connection time)
tps = 10024.132430 (without initial connection time)
nojoin simple patched
tps = 10169.485989 (without initial connection time)
tps = 10239.297780 (without initial connection time)
tps = 10643.076675 (without initial connection time)

nojoin prepared master
tps = 13606.361325 (without initial connection time)
tps = 15815.149553 (without initial connection time)
tps = 15940.675165 (without initial connection time)
nojoin prepared patched
tps = 13889.450942 (without initial connection time)
tps = 13406.879350 (without initial connection time)
tps = 15640.326344 (without initial connection time)

join simple master
tps = 9235.503480 (without initial connection time)
tps = 9244.756832 (without initial connection time)
tps = 8785.542317 (without initial connection time)
join simple patched
tps = 10106.285796 (without initial connection time)
tps = 10375.248536 (without initial connection time)
tps = 10357.087162 (without initial connection time)

join prepared master
tps = 18795.665779 (without initial connection time)
tps = 17650.815736 (without initial connection time)
tps = 20903.206602 (without initial connection time)
join prepared patched
tps = 24706.505207 (without initial connection time)
tps = 22867.751793 (without initial connection time)
tps = 23589.244380 (without initial connection time)

80 columns

nojoin simple master
tps = 8281.679334 (without initial connection time)
tps = 7517.657106 (without initial connection time)
tps = 8509.366647 (without initial connection time)
nojoin simple patched
tps = 9200.437258 (without initial connection time)
tps = 9349.939671 (without initial connection time)
tps = 9128.197101 (without initial connection time)

nojoin prepared master
tps = 12975.410783 (without initial connection time)
tps = 13486.858443 (without initial connection time)
tps = 10994.355244 (without initial connection time)
nojoin prepared patched
tps = 14266.725696 (without initial connection time)
tps = 15250.258418 (without initial connection time)
tps = 13356.236075 (without initial connection time)

join simple master
tps = 7678.440018 (without initial connection time)
tps = 7699.796166 (without initial connection time)
tps = 7880.407359 (without initial connection time)
join simple patched
tps = 9552.413096 (without initial connection time)
tps = 9469.579290 (without initial connection time)
tps = 9584.026033 (without initial connection time)

join prepared master
tps = 18390.262404 (without initial connection time)
tps = 18754.121500 (without initial connection time)
tps = 20355.875827 (without initial connection time)
join prepared patched
tps = 24041.648927 (without initial connection time)
tps = 22510.192030 (without initial connection time)
tps = 21825.870402 (without initial connection time)
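To make the summary above easier to check against the raw numbers, here is a small C helper that averages the three runs for a case and reports the relative change. It is only a convenience sketch over figures copied from this message; the function names are made up, and a three-run mean says nothing about variance, which is large in some of the -Mprepared cases. For the 40-column join -Mprepared case, the mean goes from about 19117 tps (master) to about 23721 tps (patched), a change of roughly +24%.

```c
#include <assert.h>
#include <stdio.h>

/* Mean of n tps samples. */
static double
mean_tps(const double *samples, int n)
{
    double sum = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

/* Relative change of patched over master, in percent. */
static double
pct_change(double master, double patched)
{
    assert(master > 0.0);
    return (patched - master) / master * 100.0;
}

/* The "join prepared" runs at 40 columns, copied from the figures above. */
static const double join_prepared_40_master[3] =
    {18795.665779, 17650.815736, 20903.206602};
static const double join_prepared_40_patched[3] =
    {24706.505207, 22867.751793, 23589.244380};

static void
print_join_prepared_40(void)
{
    double m = mean_tps(join_prepared_40_master, 3);
    double p = mean_tps(join_prepared_40_patched, 3);

    printf("master %.1f, patched %.1f, change %+.1f%%\n",
           m, p, pct_change(m, p));
}
```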

--
Amit Langote
EDB: http://www.enterprisedb.com


Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> Attached updated version of the patch.  I have forgotten to mention in
> my recent posts on this thread one thing about 0001 that I had
> mentioned upthread back in June.  That it currently fails a test in
> postgres_fdw's suite due to a bug of cross-partition updates that I
> decided at the time to pursue in another thread:
> https://www.postgresql.org/message-id/CA%2BHiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA%40mail.gmail.com

Yeah, I ran into that too.  I think we need not try to fix it in HEAD;
we aren't likely to commit 0001 and 0002 separately.  We need some fix
for the back branches, but that would better be discussed in the other
thread.  (Note that the version of 0001 I attach below shows the actual
output of the postgres_fdw test, including a failure from said bug.)

I wanted to give a data dump of where I am.  I've reviewed and
nontrivially modified 0001 and the executor parts of 0002, and
I'm fairly happy with the state of that much of the code now.
(Note that 0002 below contains some cosmetic fixes, such as comments,
that logically belong in 0001, but I didn't bother to tidy that up
since I'm not seeing these as separate commits anyway.)

The planner, however, still needs a lot of work.  There's a serious
functional problem, in that UPDATEs across partition trees having
more than one foreign table fail with

ERROR:  junk column "wholerow" of child relation 5 conflicts with parent junk column with same name

(cf. multiupdate.sql test case attached).  I think we could get around
that by requiring "wholerow" junk attrs to have vartype RECORDOID instead
of the particular table's rowtype, which might also remove the need for
some of the vartype translation hacking in 0002.  But I haven't tried yet.

More abstractly, I really dislike the "fake variable" design, primarily
the aspect that you made the fake variables look like real columns of
the parent table with attnums just beyond the last real one.  I think
this is just a recipe for obscuring bugs, since it means you have to
lobotomize a lot of bad-attnum error checks.  The alternative I'm
considering is to invent a separate RTE that holds all the junk columns.
Haven't tried that yet either.

The situation in postgres_fdw is not great either, specifically this
change:

@@ -2054,8 +2055,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
      */
     if (plan && plan->operation == CMD_UPDATE &&
         (resultRelInfo->ri_usesFdwDirectModify ||
-         resultRelInfo->ri_FdwState) &&
-        resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
+         resultRelInfo->ri_FdwState))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot route tuples into foreign table to be updated \"%s\"",

which is what forced you to remove or lobotomize several regression
test cases.  Now admittedly, that just moves the state of play for
cross-partition updates into postgres_fdw partitions from "works
sometimes" to "works never".  But I don't like the idea that we'll
be taking away actual functionality.

I have a blue-sky idea for fixing that properly, which is to disable FDW
direct updates when there is a possibility of a cross-partition update,
instead doing it the old way with a remote cursor reading the source rows
for later UPDATEs.  (If anyone complains that this is too slow, my answer
is "it can be arbitrarily fast when it doesn't have to give the right
answer".  Failing on cross-partition updates isn't acceptable.)  The point
is that once we have issued DECLARE CURSOR, the cursor's view of the
source data is static so it doesn't matter if we insert new rows into the
remote table.  The hard part of that is to make sure that the DECLARE
CURSOR gets issued before any updates from other partitions can arrive,
which I think means we'd need to issue it during plan tree startup not at
first fetch from the ForeignScan node.  Maybe that happens already, or
maybe we'd need a new/repurposed FDW API call.  I've not researched it.

Anyway, I'd really like to get this done for v14, so I'm going to buckle
down and try to fix the core-planner issues I mentioned.  It'd be nice
if somebody could look at fixing the postgres_fdw problem in parallel
with that.

            regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499f9a..cff23b0211 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1867,6 +1867,7 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
  * 'foreignrel' is the RelOptInfo for the target relation or the join relation
  *        containing all base relations in the query
  * 'targetlist' is the tlist of the underlying foreign-scan plan node
+ *        (note that this only contains new-value expressions and junk attrs)
  * 'targetAttrs' is the target columns of the UPDATE
  * 'remote_conds' is the qual clauses that must be evaluated remotely
  * '*params_list' is an output list of exprs that will become remote Params
@@ -1888,8 +1889,8 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
     deparse_expr_cxt context;
     int            nestlevel;
     bool        first;
-    ListCell   *lc;
     RangeTblEntry *rte = planner_rt_fetch(rtindex, root);
+    ListCell   *lc, *lc2;

     /* Set up context struct for recursion */
     context.root = root;
@@ -1908,14 +1909,13 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
     nestlevel = set_transmission_modes();

     first = true;
-    foreach(lc, targetAttrs)
+    forboth(lc, targetlist, lc2, targetAttrs)
     {
-        int            attnum = lfirst_int(lc);
-        TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+        TargetEntry *tle = lfirst_node(TargetEntry, lc);
+        int attnum = lfirst_int(lc2);

-        if (!tle)
-            elog(ERROR, "attribute number %d not found in UPDATE targetlist",
-                 attnum);
+        /* update's new-value expressions shouldn't be resjunk */
+        Assert(!tle->resjunk);

         if (!first)
             appendStringInfoString(buf, ", ");
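The effect of the forboth() change can be sketched with a self-contained toy in C. Everything here is hypothetical: ToyTle, toy_deparse_set, and string-valued expressions stand in for TargetEntry, deparseDirectUpdateSql, and real expression deparsing. The sketch only shows the new contract, where the i-th non-junk tlist entry supplies the value for column targetAttrs[i]. Because of that contract, no get_tle_by_resno() lookup is needed, and the SET list follows the order of targetAttrs rather than attribute-number order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a TargetEntry: deparsed expr text + junk flag. */
typedef struct ToyTle
{
    const char *expr;
    int         resjunk;
} ToyTle;

/*
 * Build "SET col = expr, ..." by walking the tlist and the target column
 * numbers in lockstep, the way forboth() pairs them.  Entry i of the tlist
 * supplies the value for column colnos[i] (1-based).  Like forboth(), this
 * stops at the shorter list, so trailing junk entries are never visited.
 * Returns 0 on success, -1 if buf (buflen > 0) is too small.
 */
static int
toy_deparse_set(char *buf, size_t buflen,
                const ToyTle *tlist, size_t ntle,
                const int *colnos, size_t ncols,
                const char *const *colnames)
{
    size_t n = ntle < ncols ? ntle : ncols;
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++)
    {
        int w;

        assert(!tlist[i].resjunk);  /* new-value entries precede junk */
        w = snprintf(buf + used, buflen - used, "%s%s = %s",
                     i == 0 ? "SET " : ", ",
                     colnames[colnos[i] - 1], tlist[i].expr);
        if (w < 0 || (size_t) w >= buflen - used)
            return -1;
        used += (size_t) w;
    }
    return 0;
}
```

For example, with columns c1, c2, c3, update columns {3, 2}, and a trailing junk "ctid" entry, the sketch produces `SET c3 = (c3 || 'x'), c2 = (c2 * 10)`.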
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..b46e7e623f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -5503,13 +5503,13 @@ UPDATE ft2 AS target SET (c2, c7) = (
         FROM ft2 AS src
         WHERE target.c1 = src.c1
 ) WHERE c1 > 1100;
-                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
+                                                      QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------
  Update on public.ft2 target
    Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c7 = $3 WHERE ctid = $1
    ->  Foreign Scan on public.ft2 target
-         Output: target.c1, $1, NULL::integer, target.c3, target.c4, target.c5, target.c6, $2, target.c8, (SubPlan 1 (returns $1,$2)), target.ctid
-         Remote SQL: SELECT "C 1", c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE
+         Output: $1, $2, (SubPlan 1 (returns $1,$2)), target.ctid, target.*
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE
          SubPlan 1 (returns $1,$2)
            ->  Foreign Scan on public.ft2 src
                  Output: (src.c2 * 10), src.c7
@@ -5539,9 +5539,9 @@ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
    ->  Foreign Scan on public.ft2
-         Output: c1, c2, NULL::integer, 'bar'::text, c4, c5, c6, c7, c8, ctid
+         Output: 'bar'::text, ctid, ft2.*
          Filter: (postgres_fdw_abs(ft2.c1) > 2000)
-         Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" FOR UPDATE
 (7 rows)

 UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
@@ -5570,11 +5570,11 @@ UPDATE ft2 SET c3 = 'baz'
    Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
    ->  Nested Loop
-         Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+         Output: 'baz'::text, ft2.ctid, ft2.*, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
          Join Filter: (ft2.c2 === ft4.c1)
          ->  Foreign Scan on public.ft2
-               Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
-               Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
+               Output: ft2.ctid, ft2.*, ft2.c2
+               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
          ->  Foreign Scan
                Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
                Relations: (public.ft4) INNER JOIN (public.ft5)
@@ -6266,7 +6266,7 @@ UPDATE rw_view SET b = b + 5;
  Update on public.foreign_tbl
    Remote SQL: UPDATE public.base_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl
-         Output: foreign_tbl.a, (foreign_tbl.b + 5), foreign_tbl.ctid
+         Output: (foreign_tbl.b + 5), foreign_tbl.ctid, foreign_tbl.*
          Remote SQL: SELECT a, b, ctid FROM public.base_tbl WHERE ((a < b)) FOR UPDATE
 (5 rows)

@@ -6280,7 +6280,7 @@ UPDATE rw_view SET b = b + 15;
  Update on public.foreign_tbl
    Remote SQL: UPDATE public.base_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl
-         Output: foreign_tbl.a, (foreign_tbl.b + 15), foreign_tbl.ctid
+         Output: (foreign_tbl.b + 15), foreign_tbl.ctid, foreign_tbl.*
          Remote SQL: SELECT a, b, ctid FROM public.base_tbl WHERE ((a < b)) FOR UPDATE
 (5 rows)

@@ -6354,7 +6354,7 @@ UPDATE rw_view SET b = b + 5;
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: parent_tbl_1.a, (parent_tbl_1.b + 5), parent_tbl_1.ctid
+         Output: (parent_tbl_1.b + 5), parent_tbl_1.ctid, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -6369,7 +6369,7 @@ UPDATE rw_view SET b = b + 15;
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: parent_tbl_1.a, (parent_tbl_1.b + 15), parent_tbl_1.ctid
+         Output: (parent_tbl_1.b + 15), parent_tbl_1.ctid, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -6686,7 +6686,7 @@ UPDATE rem1 set f1 = 10;          -- all columns should be transmitted
  Update on public.rem1
    Remote SQL: UPDATE public.loc1 SET f1 = $2, f2 = $3 WHERE ctid = $1
    ->  Foreign Scan on public.rem1
-         Output: 10, f2, ctid, rem1.*
+         Output: 10, ctid, rem1.*
          Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
 (5 rows)

@@ -6919,7 +6919,7 @@ UPDATE rem1 set f2 = '';          -- can't be pushed down
  Update on public.rem1
    Remote SQL: UPDATE public.loc1 SET f1 = $2, f2 = $3 WHERE ctid = $1
    ->  Foreign Scan on public.rem1
-         Output: f1, ''::text, ctid, rem1.*
+         Output: ''::text, ctid, rem1.*
          Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
 (5 rows)

@@ -6943,7 +6943,7 @@ UPDATE rem1 set f2 = '';          -- can't be pushed down
  Update on public.rem1
    Remote SQL: UPDATE public.loc1 SET f2 = $2 WHERE ctid = $1 RETURNING f1, f2
    ->  Foreign Scan on public.rem1
-         Output: f1, ''::text, ctid, rem1.*
+         Output: ''::text, ctid, rem1.*
          Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
 (5 rows)

@@ -7253,18 +7253,18 @@ select * from bar where f1 in (select f1 from foo) for share;
 -- Check UPDATE with inherited target and an inherited source table
 explain (verbose, costs off)
 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
-                                           QUERY PLAN
--------------------------------------------------------------------------------------------------
+                                      QUERY PLAN
+---------------------------------------------------------------------------------------
  Update on public.bar
    Update on public.bar
    Foreign Update on public.bar2 bar_1
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
-         Output: bar.f1, (bar.f2 + 100), bar.ctid, foo.ctid, foo.*, foo.tableoid
+         Output: (bar.f2 + 100), bar.ctid, foo.ctid, foo.*, foo.tableoid
          Inner Unique: true
          Hash Cond: (bar.f1 = foo.f1)
          ->  Seq Scan on public.bar
-               Output: bar.f1, bar.f2, bar.ctid
+               Output: bar.f2, bar.ctid, bar.f1
          ->  Hash
                Output: foo.ctid, foo.f1, foo.*, foo.tableoid
                ->  HashAggregate
@@ -7277,11 +7277,11 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                                  Output: foo_2.ctid, foo_2.f1, foo_2.*, foo_2.tableoid
                                  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
    ->  Hash Join
-         Output: bar_1.f1, (bar_1.f2 + 100), bar_1.f3, bar_1.ctid, foo.ctid, foo.*, foo.tableoid
+         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, foo.ctid, foo.*, foo.tableoid
          Inner Unique: true
          Hash Cond: (bar_1.f1 = foo.f1)
          ->  Foreign Scan on public.bar2 bar_1
-               Output: bar_1.f1, bar_1.f2, bar_1.f3, bar_1.ctid
+               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
                Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Hash
                Output: foo.ctid, foo.f1, foo.*, foo.tableoid
@@ -7321,7 +7321,7 @@ where bar.f1 = ss.f1;
    Foreign Update on public.bar2 bar_1
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
-         Output: bar.f1, (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
+         Output: (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
          Hash Cond: (foo.f1 = bar.f1)
          ->  Append
                ->  Seq Scan on public.foo
@@ -7335,17 +7335,17 @@ where bar.f1 = ss.f1;
                      Output: ROW((foo_3.f1 + 3)), (foo_3.f1 + 3)
                      Remote SQL: SELECT f1 FROM public.loct1
          ->  Hash
-               Output: bar.f1, bar.f2, bar.ctid
+               Output: bar.f2, bar.ctid, bar.f1
                ->  Seq Scan on public.bar
-                     Output: bar.f1, bar.f2, bar.ctid
+                     Output: bar.f2, bar.ctid, bar.f1
    ->  Merge Join
-         Output: bar_1.f1, (bar_1.f2 + 100), bar_1.f3, bar_1.ctid, (ROW(foo.f1))
+         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, (ROW(foo.f1))
          Merge Cond: (bar_1.f1 = foo.f1)
          ->  Sort
-               Output: bar_1.f1, bar_1.f2, bar_1.f3, bar_1.ctid
+               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
                Sort Key: bar_1.f1
                ->  Foreign Scan on public.bar2 bar_1
-                     Output: bar_1.f1, bar_1.f2, bar_1.f3, bar_1.ctid
+                     Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
                      Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Sort
                Output: (ROW(foo.f1)), foo.f1
@@ -7519,7 +7519,7 @@ update bar set f2 = f2 + 100 returning *;
    Update on public.bar
    Foreign Update on public.bar2 bar_1
    ->  Seq Scan on public.bar
-         Output: bar.f1, (bar.f2 + 100), bar.ctid
+         Output: (bar.f2 + 100), bar.ctid
    ->  Foreign Update on public.bar2 bar_1
          Remote SQL: UPDATE public.loct2 SET f2 = (f2 + 100) RETURNING f1, f2
 (8 rows)
@@ -7551,9 +7551,9 @@ update bar set f2 = f2 + 100;
    Foreign Update on public.bar2 bar_1
      Remote SQL: UPDATE public.loct2 SET f1 = $2, f2 = $3, f3 = $4 WHERE ctid = $1 RETURNING f1, f2, f3
    ->  Seq Scan on public.bar
-         Output: bar.f1, (bar.f2 + 100), bar.ctid
+         Output: (bar.f2 + 100), bar.ctid
    ->  Foreign Scan on public.bar2 bar_1
-         Output: bar_1.f1, (bar_1.f2 + 100), bar_1.f3, bar_1.ctid, bar_1.*
+         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*
          Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
 (9 rows)

@@ -7622,10 +7622,10 @@ update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a re
    Update on public.parent
    Foreign Update on public.remt1 parent_1
    ->  Nested Loop
-         Output: parent.a, (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b
+         Output: (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b
          Join Filter: (parent.a = remt2.a)
          ->  Seq Scan on public.parent
-               Output: parent.a, parent.b, parent.ctid
+               Output: parent.b, parent.ctid, parent.a
          ->  Foreign Scan on public.remt2
                Output: remt2.b, remt2.*, remt2.a
                Remote SQL: SELECT a, b FROM public.loct2
@@ -7880,7 +7880,7 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *;
    ->  Foreign Update on public.remp utrtest_1
          Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
    ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.b, utrtest_2.ctid
+         Output: 1, utrtest_2.ctid
          Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
 (9 rows)

@@ -7896,13 +7896,13 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 2 returning *;
-                   QUERY PLAN
-------------------------------------------------
+               QUERY PLAN
+-----------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b
    Update on public.locp utrtest_1
    ->  Seq Scan on public.locp utrtest_1
-         Output: 1, utrtest_1.b, utrtest_1.ctid
+         Output: 1, utrtest_1.ctid
          Filter: (utrtest_1.a = 2)
 (6 rows)

@@ -7932,7 +7932,7 @@ update utrtest set a = 1 returning *;
    ->  Foreign Update on public.remp utrtest_1
          Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
    ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.b, utrtest_2.ctid
+         Output: 1, utrtest_2.ctid
 (8 rows)

 update utrtest set a = 1 returning *;
@@ -7956,20 +7956,20 @@ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
    Update on public.locp utrtest_2
    ->  Hash Join
-         Output: 1, utrtest_1.b, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 1, utrtest_1.ctid, utrtest_1.*, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_1.a = "*VALUES*".column1)
          ->  Foreign Scan on public.remp utrtest_1
-               Output: utrtest_1.b, utrtest_1.ctid, utrtest_1.a
+               Output: utrtest_1.ctid, utrtest_1.*, utrtest_1.a
                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
                      Output: "*VALUES*".*, "*VALUES*".column1
    ->  Hash Join
-         Output: 1, utrtest_2.b, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 1, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_2.a = "*VALUES*".column1)
          ->  Seq Scan on public.locp utrtest_2
-               Output: utrtest_2.b, utrtest_2.ctid, utrtest_2.a
+               Output: utrtest_2.ctid, utrtest_2.a
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
@@ -7977,12 +7977,7 @@ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
 (24 rows)

 update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
- a |  b  | x
----+-----+---
- 1 | foo | 1
- 1 | qux | 2
-(2 rows)
-
+ERROR:  invalid attribute number 5
 -- Change the definition of utrtest so that the foreign partition get updated
 -- after the local partition
 delete from utrtest;
@@ -8005,7 +8000,7 @@ update utrtest set a = 3 returning *;
    Update on public.locp utrtest_1
    Foreign Update on public.remp utrtest_2
    ->  Seq Scan on public.locp utrtest_1
-         Output: 3, utrtest_1.b, utrtest_1.ctid
+         Output: 3, utrtest_1.ctid
    ->  Foreign Update on public.remp utrtest_2
          Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
 (8 rows)
@@ -8023,19 +8018,19 @@ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
    Foreign Update on public.remp utrtest_2
      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
    ->  Hash Join
-         Output: 3, utrtest_1.b, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 3, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_1.a = "*VALUES*".column1)
          ->  Seq Scan on public.locp utrtest_1
-               Output: utrtest_1.b, utrtest_1.ctid, utrtest_1.a
+               Output: utrtest_1.ctid, utrtest_1.a
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
                      Output: "*VALUES*".*, "*VALUES*".column1
    ->  Hash Join
-         Output: 3, utrtest_2.b, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 3, utrtest_2.ctid, utrtest_2.*, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_2.a = "*VALUES*".column1)
          ->  Foreign Scan on public.remp utrtest_2
-               Output: utrtest_2.b, utrtest_2.ctid, utrtest_2.a
+               Output: utrtest_2.ctid, utrtest_2.*, utrtest_2.a
                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..6ba6786c8b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2322,32 +2322,26 @@ postgresPlanDirectModify(PlannerInfo *root,
      */
     if (operation == CMD_UPDATE)
     {
-        int            col;
+        ListCell *lc, *lc2;

         /*
-         * We transmit only columns that were explicitly targets of the
-         * UPDATE, so as to avoid unnecessary data transmission.
+         * The expressions of concern are the first N columns of the subplan
+         * targetlist, where N is the length of root->update_colnos.
          */
-        col = -1;
-        while ((col = bms_next_member(rte->updatedCols, col)) >= 0)
+        targetAttrs = root->update_colnos;
+        forboth(lc, subplan->targetlist, lc2, targetAttrs)
         {
-            /* bit numbers are offset by FirstLowInvalidHeapAttributeNumber */
-            AttrNumber    attno = col + FirstLowInvalidHeapAttributeNumber;
-            TargetEntry *tle;
+            TargetEntry *tle = lfirst_node(TargetEntry, lc);
+            AttrNumber attno = lfirst_int(lc2);
+
+            /* update's new-value expressions shouldn't be resjunk */
+            Assert(!tle->resjunk);

             if (attno <= InvalidAttrNumber) /* shouldn't happen */
                 elog(ERROR, "system-column update is not supported");

-            tle = get_tle_by_resno(subplan->targetlist, attno);
-
-            if (!tle)
-                elog(ERROR, "attribute number %d not found in subplan targetlist",
-                     attno);
-
             if (!is_foreign_expr(root, foreignrel, (Expr *) tle->expr))
                 return false;
-
-            targetAttrs = lappend_int(targetAttrs, attno);
         }
     }

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052ee8..6989957d50 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -703,10 +703,14 @@ ExecForeignUpdate(EState *estate,
      <literal>slot</literal> contains the new data for the tuple; it will match the
      row-type definition of the foreign table.
      <literal>planSlot</literal> contains the tuple that was generated by the
-     <structname>ModifyTable</structname> plan node's subplan; it differs from
-     <literal>slot</literal> in possibly containing additional <quote>junk</quote>
-     columns.  In particular, any junk columns that were requested by
-     <function>AddForeignUpdateTargets</function> will be available from this slot.
+     <structname>ModifyTable</structname> plan node's subplan.  Unlike
+     <literal>slot</literal>, this tuple contains only the new values for
+     columns changed by the query, so do not rely on attribute numbers of the
+     foreign table to index into <literal>planSlot</literal>.
+     Also, <literal>planSlot</literal> typically contains
+     additional <quote>junk</quote> columns.  In particular, any junk columns
+     that were requested by <function>AddForeignUpdateTargets</function> will
+     be available from this slot.
     </para>

     <para>
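To illustrate the contract described in the revised paragraph, here is a toy model in C. All names are hypothetical, Datums are plain ints, and NULLs and dropped columns are ignored. planSlot holds the new values in update_colnos order, followed by junk columns. A value for foreign-table column attnum therefore has to be located through update_colnos, not at position attnum - 1. The full new row, which is what "slot" contains, takes assigned columns from planSlot and every other column from the old row.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the 0-based planSlot position carrying the new value for table
 * column attnum (1-based), or -1 if the query does not assign that column.
 * colnos[i] is the target column of the i-th non-junk planSlot entry.
 */
static int
toy_plan_slot_pos(const int *colnos, size_t ncols, int attnum)
{
    for (size_t i = 0; i < ncols; i++)
        if (colnos[i] == attnum)
            return (int) i;
    return -1;
}

/*
 * Assemble the full new row (natts values): assigned columns come from
 * planslot, all other columns are copied from the old row.
 */
static void
toy_build_new_row(int *newrow, const int *oldrow, int natts,
                  const int *planslot, const int *colnos, size_t ncols)
{
    for (int attnum = 1; attnum <= natts; attnum++)
    {
        int pos = toy_plan_slot_pos(colnos, ncols, attnum);

        newrow[attnum - 1] = pos >= 0 ? planslot[pos] : oldrow[attnum - 1];
    }
}
```

For `UPDATE ... SET c3 = 999, c1 = 7` on a three-column table, update_colnos is {3, 1} and planSlot begins 999, 7, so position 2 (attnum 3 minus one) holds a junk column rather than c3's new value. The linear search keeps the sketch short, but it costs O(natts * ncols).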
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7383d5994e..a53070f602 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2724,20 +2724,22 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
         /*
          * In READ COMMITTED isolation level it's possible that target tuple
          * was changed due to concurrent update.  In that case we have a raw
-         * subplan output tuple in epqslot_candidate, and need to run it
-         * through the junk filter to produce an insertable tuple.
+         * subplan output tuple in epqslot_candidate, and need to form a new
+         * insertable tuple using ExecGetUpdateNewTuple to replace the one
+         * we received in newslot.  Neither we nor our callers have any
+         * further interest in the passed-in tuple, so it's okay to overwrite
+         * newslot with the newer data.
          *
-         * Caution: more than likely, the passed-in slot is the same as the
-         * junkfilter's output slot, so we are clobbering the original value
-         * of slottuple by doing the filtering.  This is OK since neither we
-         * nor our caller have any more interest in the prior contents of that
-         * slot.
+         * (Typically, newslot was also generated by ExecGetUpdateNewTuple, so
+         * that epqslot_clean will be that same slot and the copy step below
+         * is not needed.)
          */
         if (epqslot_candidate != NULL)
         {
             TupleTableSlot *epqslot_clean;

-            epqslot_clean = ExecFilterJunk(relinfo->ri_junkFilter, epqslot_candidate);
+            epqslot_clean = ExecGetUpdateNewTuple(relinfo, epqslot_candidate,
+                                                  oldslot);

             if (newslot != epqslot_clean)
                 ExecCopySlot(newslot, epqslot_clean);
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..a3937f3e66 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -477,6 +477,204 @@ ExecBuildProjectionInfo(List *targetList,
     return projInfo;
 }

+/*
+ *        ExecBuildUpdateProjection
+ *
+ * Build a ProjectionInfo node for constructing a new tuple during UPDATE.
+ * The projection will be executed in the given econtext and the result will
+ * be stored into the given tuple slot.  (Caller must have ensured that tuple
+ * slot has a descriptor matching the target rel!)
+ *
+ * subTargetList is the tlist of the subplan node feeding ModifyTable.
+ * We use this mainly to cross-check that the expressions being assigned
+ * are of the correct types.  The values from this tlist are assumed to be
+ * available from the "outer" tuple slot.  They are assigned to target columns
+ * listed in the corresponding targetColnos elements.  (Only non-resjunk tlist
+ * entries are assigned.)  Columns not listed in targetColnos are filled from
+ * the UPDATE's old tuple, which is assumed to be available in the "scan"
+ * tuple slot.
+ *
+ * relDesc must describe the relation we intend to update.
+ *
+ * This is basically a specialized variant of ExecBuildProjectionInfo.
+ * However, it also performs sanity checks equivalent to ExecCheckPlanOutput.
+ * Since we never make a normal tlist equivalent to the whole
+ * tuple-to-be-assigned, there is no convenient way to apply
+ * ExecCheckPlanOutput, so we must do our safety checks here.
+ */
+ProjectionInfo *
+ExecBuildUpdateProjection(List *subTargetList,
+                          List *targetColnos,
+                          TupleDesc relDesc,
+                          ExprContext *econtext,
+                          TupleTableSlot *slot,
+                          PlanState *parent)
+{
+    ProjectionInfo *projInfo = makeNode(ProjectionInfo);
+    ExprState  *state;
+    int nAssignableCols;
+    bool sawJunk;
+    Bitmapset  *assignedCols;
+    LastAttnumInfo deform = {0, 0, 0};
+    ExprEvalStep scratch = {0};
+    int outerattnum;
+    ListCell   *lc, *lc2;
+
+    projInfo->pi_exprContext = econtext;
+    /* We embed ExprState into ProjectionInfo instead of doing extra palloc */
+    projInfo->pi_state.tag = T_ExprState;
+    state = &projInfo->pi_state;
+    state->expr = NULL;            /* not used */
+    state->parent = parent;
+    state->ext_params = NULL;
+
+    state->resultslot = slot;
+
+    /*
+     * Examine the subplan tlist to see how many non-junk columns there are,
+     * and to verify that the non-junk columns come before the junk ones.
+     */
+    nAssignableCols = 0;
+    sawJunk = false;
+    foreach(lc, subTargetList)
+    {
+        TargetEntry *tle = lfirst_node(TargetEntry, lc);
+
+        if (tle->resjunk)
+            sawJunk = true;
+        else
+        {
+            if (sawJunk)
+                elog(ERROR, "subplan target list is out of order");
+            nAssignableCols++;
+        }
+    }
+
+    /* We should have one targetColnos entry per non-junk column */
+    if (nAssignableCols != list_length(targetColnos))
+        elog(ERROR, "targetColnos does not match subplan target list");
+
+    /*
+     * Build a bitmapset of the columns in targetColnos.  (We could just
+     * use list_member_int() tests, but that risks O(N^2) behavior with
+     * many columns.)
+     */
+    assignedCols = NULL;
+    foreach(lc, targetColnos)
+    {
+        AttrNumber    targetattnum = lfirst_int(lc);
+
+        assignedCols = bms_add_member(assignedCols, targetattnum);
+    }
+
+    /*
+     * We want to insert EEOP_*_FETCHSOME steps to ensure the outer and scan
+     * tuples are sufficiently deconstructed.  Outer tuple is easy, but for
+     * scan tuple we must find out the last old column we need.
+     */
+    deform.last_outer = nAssignableCols;
+
+    for (int attnum = relDesc->natts; attnum > 0; attnum--)
+    {
+        Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
+        if (attr->attisdropped)
+            continue;
+        if (bms_is_member(attnum, assignedCols))
+            continue;
+        deform.last_scan = attnum;
+        break;
+    }
+
+    ExecPushExprSlots(state, &deform);
+
+    /*
+     * Now generate code to fetch data from the outer tuple, incidentally
+     * validating that it'll be of the right type.  The checks above ensure
+     * that the forboth() will iterate over exactly the non-junk columns.
+     */
+    outerattnum = 0;
+    forboth(lc, subTargetList, lc2, targetColnos)
+    {
+        TargetEntry *tle = lfirst_node(TargetEntry, lc);
+        AttrNumber    targetattnum = lfirst_int(lc2);
+        Form_pg_attribute attr;
+
+        Assert(!tle->resjunk);
+
+        /*
+         * Apply sanity checks comparable to ExecCheckPlanOutput().
+         */
+        if (targetattnum <= 0 || targetattnum > relDesc->natts)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("table row type and query-specified row type do not match"),
+                     errdetail("Query has too many columns.")));
+        attr = TupleDescAttr(relDesc, targetattnum - 1);
+
+        if (attr->attisdropped)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("table row type and query-specified row type do not match"),
+                     errdetail("Query provides a value for a dropped column at ordinal position %d.",
+                               targetattnum)));
+        if (exprType((Node *) tle->expr) != attr->atttypid)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("table row type and query-specified row type do not match"),
+                     errdetail("Table has type %s at ordinal position %d, but query expects %s.",
+                               format_type_be(attr->atttypid),
+                               targetattnum,
+                               format_type_be(exprType((Node *) tle->expr)))));
+
+        /*
+         * OK, build an outer-tuple reference.
+         */
+        scratch.opcode = EEOP_ASSIGN_OUTER_VAR;
+        scratch.d.assign_var.attnum = outerattnum++;
+        scratch.d.assign_var.resultnum = targetattnum - 1;
+        ExprEvalPushStep(state, &scratch);
+    }
+
+    /*
+     * Now generate code to copy over any old columns that were not assigned
+     * to, and to ensure that dropped columns are set to NULL.
+     */
+    for (int attnum = 1; attnum <= relDesc->natts; attnum++)
+    {
+        Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
+
+        if (attr->attisdropped)
+        {
+            /* Put a null into the ExprState's resvalue/resnull ... */
+            scratch.opcode = EEOP_CONST;
+            scratch.resvalue = &state->resvalue;
+            scratch.resnull = &state->resnull;
+            scratch.d.constval.value = (Datum) 0;
+            scratch.d.constval.isnull = true;
+            ExprEvalPushStep(state, &scratch);
+            /* ... then assign it to the result slot */
+            scratch.opcode = EEOP_ASSIGN_TMP;
+            scratch.d.assign_tmp.resultnum = attnum - 1;
+            ExprEvalPushStep(state, &scratch);
+        }
+        else if (!bms_is_member(attnum, assignedCols))
+        {
+            /* Certainly the right type, so needn't check */
+            scratch.opcode = EEOP_ASSIGN_SCAN_VAR;
+            scratch.d.assign_var.attnum = attnum - 1;
+            scratch.d.assign_var.resultnum = attnum - 1;
+            ExprEvalPushStep(state, &scratch);
+        }
+    }
+
+    scratch.opcode = EEOP_DONE;
+    ExprEvalPushStep(state, &scratch);
+
+    ExecReadyExpr(state);
+
+    return projInfo;
+}
+
 /*
  * ExecPrepareExpr --- initialize for expression execution outside a normal
  * Plan tree context.
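The planning logic of ExecBuildUpdateProjection can be modeled outside the executor as a sketch. This toy version (all names hypothetical) reduces the ExprEvalStep program to three step kinds. They correspond to EEOP_ASSIGN_OUTER_VAR, to EEOP_CONST followed by EEOP_ASSIGN_TMP for dropped columns, and to EEOP_ASSIGN_SCAN_VAR. The toy uses a uint64_t mask in place of the Bitmapset and skips the atttypid type check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy step kinds mirroring what the patch emits per result column. */
typedef enum ToyOp { TOY_ASSIGN_OUTER, TOY_ASSIGN_NULL, TOY_ASSIGN_SCAN } ToyOp;

typedef struct ToyStep
{
    ToyOp op;
    int   src;      /* 0-based outer or scan attribute; -1 for NULL */
    int   result;   /* 0-based result column */
} ToyStep;

/*
 * resjunk[] flags subplan tlist entries, colnos[] lists target columns for
 * the non-junk entries, dropped[] flags dropped table columns (natts < 64).
 * Returns the number of steps written, or -1 where the patch would raise an
 * error (junk before non-junk, count mismatch, bad or dropped target).
 * *last_scan gets the highest old-tuple column to deform (0 = none).
 */
static int
toy_plan_update(const int *resjunk, int ntle, const int *colnos, int ncols,
                const int *dropped, int natts, ToyStep *steps, int *last_scan)
{
    int      nassign = 0, sawjunk = 0, nsteps = 0;
    uint64_t assigned = 0;

    assert(natts < 64);
    for (int i = 0; i < ntle; i++)
    {
        if (resjunk[i])
            sawjunk = 1;
        else if (sawjunk)
            return -1;          /* subplan target list is out of order */
        else
            nassign++;
    }
    if (nassign != ncols)
        return -1;              /* targetColnos does not match the tlist */

    for (int i = 0; i < ncols; i++)
    {
        if (colnos[i] <= 0 || colnos[i] > natts || dropped[colnos[i] - 1])
            return -1;          /* row type mismatch */
        assigned |= UINT64_C(1) << colnos[i];
    }

    /* The last non-dropped, unassigned column decides how far to deform. */
    *last_scan = 0;
    for (int attnum = natts; attnum > 0; attnum--)
    {
        if (dropped[attnum - 1] || (assigned & (UINT64_C(1) << attnum)))
            continue;
        *last_scan = attnum;
        break;
    }

    /* New values from the outer (subplan) tuple, in tlist order. */
    for (int i = 0; i < ncols; i++)
        steps[nsteps++] = (ToyStep) {TOY_ASSIGN_OUTER, i, colnos[i] - 1};

    /* NULL for dropped columns, old values for everything else. */
    for (int attnum = 1; attnum <= natts; attnum++)
    {
        if (dropped[attnum - 1])
            steps[nsteps++] = (ToyStep) {TOY_ASSIGN_NULL, -1, attnum - 1};
        else if (!(assigned & (UINT64_C(1) << attnum)))
            steps[nsteps++] = (ToyStep) {TOY_ASSIGN_SCAN, attnum - 1, attnum - 1};
    }
    return nsteps;
}
```

Consider a four-column table whose third column is dropped. Assigning column 2 yields last_scan = 4, while assigning column 4 yields last_scan = 2. In the second case, the dropped column 3 is filled with a NULL constant and never has to be deformed from the old tuple.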
diff --git a/src/backend/executor/execJunk.c b/src/backend/executor/execJunk.c
index 970e1c325e..2e0bcbbede 100644
--- a/src/backend/executor/execJunk.c
+++ b/src/backend/executor/execJunk.c
@@ -59,43 +59,16 @@
 JunkFilter *
 ExecInitJunkFilter(List *targetList, TupleTableSlot *slot)
 {
+    JunkFilter *junkfilter;
     TupleDesc    cleanTupType;
+    int            cleanLength;
+    AttrNumber *cleanMap;

     /*
      * Compute the tuple descriptor for the cleaned tuple.
      */
     cleanTupType = ExecCleanTypeFromTL(targetList);

-    /*
-     * The rest is the same as ExecInitJunkFilterInsertion, ie, we want to map
-     * every non-junk targetlist column into the output tuple.
-     */
-    return ExecInitJunkFilterInsertion(targetList, cleanTupType, slot);
-}
-
-/*
- * ExecInitJunkFilterInsertion
- *
- * Initialize a JunkFilter for insertions into a table.
- *
- * Here, we are given the target "clean" tuple descriptor rather than
- * inferring it from the targetlist.  Although the target descriptor can
- * contain deleted columns, that is not of concern here, since the targetlist
- * should contain corresponding NULL constants (cf. ExecCheckPlanOutput).
- * It is assumed that the caller has checked that the table's columns match up
- * with the non-junk columns of the targetlist.
- */
-JunkFilter *
-ExecInitJunkFilterInsertion(List *targetList,
-                            TupleDesc cleanTupType,
-                            TupleTableSlot *slot)
-{
-    JunkFilter *junkfilter;
-    int            cleanLength;
-    AttrNumber *cleanMap;
-    ListCell   *t;
-    AttrNumber    cleanResno;
-
     /*
      * Use the given slot, or make a new slot if we weren't given one.
      */
@@ -117,6 +90,9 @@ ExecInitJunkFilterInsertion(List *targetList,
     cleanLength = cleanTupType->natts;
     if (cleanLength > 0)
     {
+        AttrNumber    cleanResno;
+        ListCell   *t;
+
         cleanMap = (AttrNumber *) palloc(cleanLength * sizeof(AttrNumber));
         cleanResno = 0;
         foreach(t, targetList)
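With ExecInitJunkFilterInsertion folded back in, ExecInitJunkFilter builds cleanMap by recording the resno of each non-junk targetlist entry, in order. The standalone sketch below shows that mapping and how it is applied to one row. SimpleTargetEntry, build_clean_map and filter_junk are hypothetical names, and column values are modeled as ints.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified targetlist entry. */
typedef struct
{
    int         resno;      /* 1-based position in the subplan's output */
    bool        resjunk;    /* true for junk columns such as "ctid" */
} SimpleTargetEntry;

/*
 * Fill clean_map[] with the resno of each non-junk entry, in targetlist
 * order, and return how many entries were stored.
 */
static int
build_clean_map(const SimpleTargetEntry *tlist, int ntles,
                int *clean_map, int clean_length)
{
    int         clean_resno = 0;

    for (int i = 0; i < ntles; i++)
    {
        if (!tlist[i].resjunk)
        {
            assert(clean_resno < clean_length);
            clean_map[clean_resno++] = tlist[i].resno;
        }
    }
    return clean_resno;
}

/* Apply the map to one subplan output row, dropping the junk values. */
static void
filter_junk(const int *plan_row, const int *clean_map, int clean_length,
            int *clean_row)
{
    for (int i = 0; i < clean_length; i++)
        clean_row[i] = plan_row[clean_map[i] - 1];
}
```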
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8de78ada63..ea1530e032 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1217,11 +1217,14 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
         resultRelInfo->ri_FdwRoutine = NULL;

     /* The following fields are set later if needed */
+    resultRelInfo->ri_RowIdAttNo = 0;
+    resultRelInfo->ri_projectNew = NULL;
+    resultRelInfo->ri_newTupleSlot = NULL;
+    resultRelInfo->ri_oldTupleSlot = NULL;
     resultRelInfo->ri_FdwState = NULL;
     resultRelInfo->ri_usesFdwDirectModify = false;
     resultRelInfo->ri_ConstraintExprs = NULL;
     resultRelInfo->ri_GeneratedExprs = NULL;
-    resultRelInfo->ri_junkFilter = NULL;
     resultRelInfo->ri_projectReturning = NULL;
     resultRelInfo->ri_onConflictArbiterIndexes = NIL;
     resultRelInfo->ri_onConflict = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2993ba43e3..b9064bfe66 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -81,7 +81,7 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
                                                ResultRelInfo **partRelInfo);

 /*
- * Verify that the tuples to be produced by INSERT or UPDATE match the
+ * Verify that the tuples to be produced by INSERT match the
  * target relation's rowtype
  *
  * We do this to guard against stale plans.  If plan invalidation is
@@ -91,6 +91,9 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
  *
  * The plan output is represented by its targetlist, because that makes
  * handling the dropped-column case easier.
+ *
+ * We used to use this for UPDATE as well, but now the equivalent checks
+ * are done in ExecBuildUpdateProjection.
  */
 static void
 ExecCheckPlanOutput(Relation resultRel, List *targetList)
@@ -104,8 +107,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
         TargetEntry *tle = (TargetEntry *) lfirst(lc);
         Form_pg_attribute attr;

-        if (tle->resjunk)
-            continue;            /* ignore junk tlist items */
+        Assert(!tle->resjunk);    /* caller removed junk items already */

         if (attno >= resultDesc->natts)
             ereport(ERROR,
@@ -367,6 +369,55 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
     MemoryContextSwitchTo(oldContext);
 }

+/*
+ * ExecGetInsertNewTuple
+ *        This prepares a "new" tuple ready to be inserted into given result
+ *        relation by removing any junk columns of the plan's output tuple.
+ *
+ * Note: currently, this is really dead code, because INSERT cases don't
+ * receive any junk columns so there's never a projection to be done.
+ */
+static TupleTableSlot *
+ExecGetInsertNewTuple(ResultRelInfo *relinfo,
+                      TupleTableSlot *planSlot)
+{
+    ProjectionInfo *newProj = relinfo->ri_projectNew;
+    ExprContext   *econtext;
+
+    if (newProj == NULL)
+        return planSlot;
+
+    econtext = newProj->pi_exprContext;
+    econtext->ecxt_outertuple = planSlot;
+    return ExecProject(newProj);
+}
+
+/*
+ * ExecGetUpdateNewTuple
+ *        This prepares a "new" tuple by combining an UPDATE subplan's output
+ *        tuple (which contains values of changed columns) with unchanged
+ *        columns taken from the old tuple.  The subplan tuple might also
+ *        contain junk columns, which are ignored.
+ */
+TupleTableSlot *
+ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
+                      TupleTableSlot *planSlot,
+                      TupleTableSlot *oldSlot)
+{
+    ProjectionInfo *newProj = relinfo->ri_projectNew;
+    ExprContext   *econtext;
+
+    Assert(newProj != NULL);
+    Assert(planSlot != NULL && !TTS_EMPTY(planSlot));
+    Assert(oldSlot != NULL && !TTS_EMPTY(oldSlot));
+
+    econtext = newProj->pi_exprContext;
+    econtext->ecxt_outertuple = planSlot;
+    econtext->ecxt_scantuple = oldSlot;
+    return ExecProject(newProj);
+}
+
+
 /* ----------------------------------------------------------------
  *        ExecInsert
  *
@@ -374,6 +425,10 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *        (or partition thereof) and insert appropriate tuples into the index
  *        relations.
  *
+ *        slot contains the new tuple value to be stored.
+ *        planSlot is the output of the ModifyTable's subplan; we use it
+ *        to access "junk" columns that are not going to be stored.
+ *
  *        Returns RETURNING result if any, otherwise NULL.
  *
  *        This may change the currently active tuple conversion map in
@@ -1194,7 +1249,9 @@ static bool
 ExecCrossPartitionUpdate(ModifyTableState *mtstate,
                          ResultRelInfo *resultRelInfo,
                          ItemPointer tupleid, HeapTuple oldtuple,
-                         TupleTableSlot *slot, TupleTableSlot *planSlot,
+                         TupleTableSlot *slot,
+                         TupleTableSlot *oldSlot,
+                         TupleTableSlot *planSlot,
                          EPQState *epqstate, bool canSetTag,
                          TupleTableSlot **retry_slot,
                          TupleTableSlot **inserted_tuple)
@@ -1269,7 +1326,15 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
             return true;
         else
         {
-            *retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+            /* Fetch the most recent version of old tuple. */
+            ExecClearTuple(oldSlot);
+            if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+                                               tupleid,
+                                               SnapshotAny,
+                                               oldSlot))
+                elog(ERROR, "failed to fetch tuple being updated");
+            *retry_slot = ExecGetUpdateNewTuple(resultRelInfo, epqslot,
+                                                oldSlot);
             return false;
         }
     }
@@ -1319,6 +1384,11 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
  *        foreign table triggers; it is NULL when the foreign table has
  *        no relevant triggers.
  *
+ *        slot contains the new tuple value to be stored, while oldSlot
+ *        contains the old tuple being replaced.  planSlot is the output
+ *        of the ModifyTable's subplan; we use it to access values from
+ *        other input tables (for RETURNING), row-ID junk columns, etc.
+ *
  *        Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
  */
@@ -1328,6 +1398,7 @@ ExecUpdate(ModifyTableState *mtstate,
            ItemPointer tupleid,
            HeapTuple oldtuple,
            TupleTableSlot *slot,
+           TupleTableSlot *oldSlot,
            TupleTableSlot *planSlot,
            EPQState *epqstate,
            EState *estate,
@@ -1465,8 +1536,8 @@ lreplace:;
              * the tuple we're trying to move has been concurrently updated.
              */
             retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
-                                              oldtuple, slot, planSlot,
-                                              epqstate, canSetTag,
+                                              oldtuple, slot, oldSlot,
+                                              planSlot, epqstate, canSetTag,
                                               &retry_slot, &inserted_tuple);
             if (retry)
             {
@@ -1578,7 +1649,15 @@ lreplace:;
                                 /* Tuple not passing quals anymore, exiting... */
                                 return NULL;

-                            slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+                            /* Fetch the most recent version of old tuple. */
+                            ExecClearTuple(oldSlot);
+                            if (!table_tuple_fetch_row_version(resultRelationDesc,
+                                                               tupleid,
+                                                               SnapshotAny,
+                                                               oldSlot))
+                                elog(ERROR, "failed to fetch tuple being updated");
+                            slot = ExecGetUpdateNewTuple(resultRelInfo,
+                                                         epqslot, oldSlot);
                             goto lreplace;

                         case TM_Deleted:
@@ -1874,7 +1953,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     /* Execute UPDATE with projection */
     *returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
                             resultRelInfo->ri_onConflict->oc_ProjSlot,
-                            planSlot,
+                            existing, planSlot,
                             &mtstate->mt_epqstate, mtstate->ps.state,
                             canSetTag);

@@ -2051,7 +2130,6 @@ ExecModifyTable(PlanState *pstate)
     CmdType        operation = node->operation;
     ResultRelInfo *resultRelInfo;
     PlanState  *subplanstate;
-    JunkFilter *junkfilter;
     TupleTableSlot *slot;
     TupleTableSlot *planSlot;
     ItemPointer tupleid;
@@ -2097,7 +2175,6 @@ ExecModifyTable(PlanState *pstate)
     /* Preload local variables */
     resultRelInfo = node->resultRelInfo + node->mt_whichplan;
     subplanstate = node->mt_plans[node->mt_whichplan];
-    junkfilter = resultRelInfo->ri_junkFilter;

     /*
      * Fetch rows from subplan(s), and execute the required table modification
@@ -2131,7 +2208,6 @@ ExecModifyTable(PlanState *pstate)
             {
                 resultRelInfo++;
                 subplanstate = node->mt_plans[node->mt_whichplan];
-                junkfilter = resultRelInfo->ri_junkFilter;
                 EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
                                     node->mt_arowmarks[node->mt_whichplan]);
                 continue;
@@ -2173,87 +2249,123 @@ ExecModifyTable(PlanState *pstate)

         tupleid = NULL;
         oldtuple = NULL;
-        if (junkfilter != NULL)
+
+        /*
+         * For UPDATE/DELETE, fetch the row identity info for the tuple to be
+         * updated/deleted.  For a heap relation, that's a TID; otherwise we
+         * may have a wholerow junk attr that carries the old tuple in toto.
+         * Keep this in step with the part of ExecInitModifyTable that sets
+         * up ri_RowIdAttNo.
+         */
+        if (operation == CMD_UPDATE || operation == CMD_DELETE)
         {
-            /*
-             * extract the 'ctid' or 'wholerow' junk attribute.
-             */
-            if (operation == CMD_UPDATE || operation == CMD_DELETE)
+            char        relkind;
+            Datum        datum;
+            bool        isNull;
+
+            relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+            if (relkind == RELKIND_RELATION ||
+                relkind == RELKIND_MATVIEW ||
+                relkind == RELKIND_PARTITIONED_TABLE)
             {
-                char        relkind;
-                Datum        datum;
-                bool        isNull;
-
-                relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-                if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
-                {
-                    datum = ExecGetJunkAttribute(slot,
-                                                 junkfilter->jf_junkAttNo,
-                                                 &isNull);
-                    /* shouldn't ever get a null result... */
-                    if (isNull)
-                        elog(ERROR, "ctid is NULL");
-
-                    tupleid = (ItemPointer) DatumGetPointer(datum);
-                    tuple_ctid = *tupleid;    /* be sure we don't free ctid!! */
-                    tupleid = &tuple_ctid;
-                }
-
-                /*
-                 * Use the wholerow attribute, when available, to reconstruct
-                 * the old relation tuple.
-                 *
-                 * Foreign table updates have a wholerow attribute when the
-                 * relation has a row-level trigger.  Note that the wholerow
-                 * attribute does not carry system columns.  Foreign table
-                 * triggers miss seeing those, except that we know enough here
-                 * to set t_tableOid.  Quite separately from this, the FDW may
-                 * fetch its own junk attrs to identify the row.
-                 *
-                 * Other relevant relkinds, currently limited to views, always
-                 * have a wholerow attribute.
-                 */
-                else if (AttributeNumberIsValid(junkfilter->jf_junkAttNo))
-                {
-                    datum = ExecGetJunkAttribute(slot,
-                                                 junkfilter->jf_junkAttNo,
-                                                 &isNull);
-                    /* shouldn't ever get a null result... */
-                    if (isNull)
-                        elog(ERROR, "wholerow is NULL");
-
-                    oldtupdata.t_data = DatumGetHeapTupleHeader(datum);
-                    oldtupdata.t_len =
-                        HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
-                    ItemPointerSetInvalid(&(oldtupdata.t_self));
-                    /* Historically, view triggers see invalid t_tableOid. */
-                    oldtupdata.t_tableOid =
-                        (relkind == RELKIND_VIEW) ? InvalidOid :
-                        RelationGetRelid(resultRelInfo->ri_RelationDesc);
-
-                    oldtuple = &oldtupdata;
-                }
-                else
-                    Assert(relkind == RELKIND_FOREIGN_TABLE);
+                /* ri_RowIdAttNo refers to a ctid attribute */
+                Assert(AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo));
+                datum = ExecGetJunkAttribute(slot,
+                                             resultRelInfo->ri_RowIdAttNo,
+                                             &isNull);
+                /* shouldn't ever get a null result... */
+                if (isNull)
+                    elog(ERROR, "ctid is NULL");
+
+                tupleid = (ItemPointer) DatumGetPointer(datum);
+                tuple_ctid = *tupleid;    /* be sure we don't free ctid!! */
+                tupleid = &tuple_ctid;
             }

             /*
-             * apply the junkfilter if needed.
+             * Use the wholerow attribute, when available, to reconstruct the
+             * old relation tuple.  The old tuple serves one or both of two
+             * purposes: 1) it serves as the OLD tuple for row triggers, 2) it
+             * provides values for any unchanged columns for the NEW tuple of
+             * an UPDATE, because the subplan does not produce all the columns
+             * of the target table.
+             *
+             * Note that the wholerow attribute does not carry system columns,
+             * so foreign table triggers miss seeing those, except that we
+             * know enough here to set t_tableOid.  Quite separately from
+             * this, the FDW may fetch its own junk attrs to identify the row.
+             *
+             * Other relevant relkinds, currently limited to views, always
+             * have a wholerow attribute.
              */
-            if (operation != CMD_DELETE)
-                slot = ExecFilterJunk(junkfilter, slot);
+            else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+            {
+                datum = ExecGetJunkAttribute(slot,
+                                             resultRelInfo->ri_RowIdAttNo,
+                                             &isNull);
+                /* shouldn't ever get a null result... */
+                if (isNull)
+                    elog(ERROR, "wholerow is NULL");
+
+                oldtupdata.t_data = DatumGetHeapTupleHeader(datum);
+                oldtupdata.t_len =
+                    HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
+                ItemPointerSetInvalid(&(oldtupdata.t_self));
+                /* Historically, view triggers see invalid t_tableOid. */
+                oldtupdata.t_tableOid =
+                    (relkind == RELKIND_VIEW) ? InvalidOid :
+                    RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+                oldtuple = &oldtupdata;
+            }
+            else
+            {
+                /* Only foreign tables are allowed to omit a row-ID attr */
+                Assert(relkind == RELKIND_FOREIGN_TABLE);
+            }
         }

         switch (operation)
         {
             case CMD_INSERT:
+                slot = ExecGetInsertNewTuple(resultRelInfo, planSlot);
                 slot = ExecInsert(node, resultRelInfo, slot, planSlot,
                                   estate, node->canSetTag);
                 break;
             case CMD_UPDATE:
-                slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
-                                  planSlot, &node->mt_epqstate, estate,
-                                  node->canSetTag);
+                {
+                    TupleTableSlot *oldSlot = resultRelInfo->ri_oldTupleSlot;
+
+                    /*
+                     * Make the new tuple by combining plan's output tuple
+                     * with the old tuple being updated.
+                     */
+                    ExecClearTuple(oldSlot);
+                    if (oldtuple != NULL)
+                    {
+                        /* Foreign table update, store the wholerow attr. */
+                        ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
+                    }
+                    else
+                    {
+                        /* Fetch the most recent version of old tuple. */
+                        Relation    relation = resultRelInfo->ri_RelationDesc;
+
+                        Assert(tupleid != NULL);
+                        if (!table_tuple_fetch_row_version(relation, tupleid,
+                                                           SnapshotAny,
+                                                           oldSlot))
+                            elog(ERROR, "failed to fetch tuple being updated");
+                    }
+                    slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
+                                                 oldSlot);
+
+                    /* Now apply the update. */
+                    slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple,
+                                      slot, oldSlot, planSlot,
+                                      &node->mt_epqstate, estate,
+                                      node->canSetTag);
+                }
                 break;
             case CMD_DELETE:
                 slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
@@ -2679,117 +2791,143 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                         mtstate->mt_arowmarks[0]);

     /*
-     * Initialize the junk filter(s) if needed.  INSERT queries need a filter
-     * if there are any junk attrs in the tlist.  UPDATE and DELETE always
-     * need a filter, since there's always at least one junk attribute present
-     * --- no need to look first.  Typically, this will be a 'ctid' or
-     * 'wholerow' attribute, but in the case of a foreign data wrapper it
-     * might be a set of junk attributes sufficient to identify the remote
-     * row.
+     * Initialize projection(s) to create tuples suitable for result rel(s).
+     * INSERT queries may need a projection to filter out junk attrs in the
+     * tlist.  UPDATE always needs a projection, because (1) there's always
+     * some junk attrs, and (2) we may need to merge values of not-updated
+     * columns from the old tuple into the final tuple.  In UPDATE, the tuple
+     * arriving from the subplan contains only new values for the changed
+     * columns, plus row identity info in the junk attrs.
      *
-     * If there are multiple result relations, each one needs its own junk
-     * filter.  Note multiple rels are only possible for UPDATE/DELETE, so we
-     * can't be fooled by some needing a filter and some not.
+     * If there are multiple result relations, each one needs its own
+     * projection.  Note multiple rels are only possible for UPDATE/DELETE, so
+     * we can't be fooled by some needing a projection and some not.
      *
      * This section of code is also a convenient place to verify that the
      * output of an INSERT or UPDATE matches the target table(s).
      */
+    for (i = 0; i < nplans; i++)
     {
-        bool        junk_filter_needed = false;
+        resultRelInfo = &mtstate->resultRelInfo[i];
+        subplan = mtstate->mt_plans[i]->plan;

-        switch (operation)
+        /*
+         * Prepare to generate tuples suitable for the target relation.
+         */
+        if (operation == CMD_INSERT)
         {
-            case CMD_INSERT:
-                foreach(l, subplan->targetlist)
-                {
-                    TargetEntry *tle = (TargetEntry *) lfirst(l);
+            List       *insertTargetList = NIL;
+            bool        need_projection = false;
+            foreach(l, subplan->targetlist)
+            {
+                TargetEntry *tle = (TargetEntry *) lfirst(l);

-                    if (tle->resjunk)
-                    {
-                        junk_filter_needed = true;
-                        break;
-                    }
-                }
-                break;
-            case CMD_UPDATE:
-            case CMD_DELETE:
-                junk_filter_needed = true;
-                break;
-            default:
-                elog(ERROR, "unknown operation");
-                break;
-        }
+                if (!tle->resjunk)
+                    insertTargetList = lappend(insertTargetList, tle);
+                else
+                    need_projection = true;
+            }
+            if (need_projection)
+            {
+                TupleDesc    relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
+
+                resultRelInfo->ri_newTupleSlot =
+                    table_slot_create(resultRelInfo->ri_RelationDesc,
+                                      &mtstate->ps.state->es_tupleTable);
+
+                /* need an expression context to do the projection */
+                if (mtstate->ps.ps_ExprContext == NULL)
+                    ExecAssignExprContext(estate, &mtstate->ps);
+
+                resultRelInfo->ri_projectNew =
+                    ExecBuildProjectionInfo(insertTargetList,
+                                            mtstate->ps.ps_ExprContext,
+                                            resultRelInfo->ri_newTupleSlot,
+                                            &mtstate->ps,
+                                            relDesc);
+            }

-        if (junk_filter_needed)
+            /*
+             * The junk-free list must produce a tuple suitable for the result
+             * relation.
+             */
+            ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+                                insertTargetList);
+        }
+        else if (operation == CMD_UPDATE)
         {
-            resultRelInfo = mtstate->resultRelInfo;
-            for (i = 0; i < nplans; i++)
-            {
-                JunkFilter *j;
-                TupleTableSlot *junkresslot;
+            List       *updateColnos;
+            TupleDesc    relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
+
+            updateColnos = (List *) list_nth(node->updateColnosLists, i);

-                subplan = mtstate->mt_plans[i]->plan;
+            /*
+             * For UPDATE, we use the old tuple to fill up missing values in
+             * the tuple produced by the plan to get the new tuple.
+             */
+            resultRelInfo->ri_oldTupleSlot =
+                table_slot_create(resultRelInfo->ri_RelationDesc,
+                                  &mtstate->ps.state->es_tupleTable);
+            resultRelInfo->ri_newTupleSlot =
+                table_slot_create(resultRelInfo->ri_RelationDesc,
+                                  &mtstate->ps.state->es_tupleTable);
+
+            /* need an expression context to do the projection */
+            if (mtstate->ps.ps_ExprContext == NULL)
+                ExecAssignExprContext(estate, &mtstate->ps);
+
+            resultRelInfo->ri_projectNew =
+                ExecBuildUpdateProjection(subplan->targetlist,
+                                          updateColnos,
+                                          relDesc,
+                                          mtstate->ps.ps_ExprContext,
+                                          resultRelInfo->ri_newTupleSlot,
+                                          &mtstate->ps);
+        }

-                junkresslot =
-                    ExecInitExtraTupleSlot(estate, NULL,
-                                           table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+        /*
+         * For UPDATE/DELETE, find the appropriate junk attr now, either a
+         * 'ctid' or 'wholerow' attribute depending on relkind.  For foreign
+         * tables, the FDW might have created additional junk attr(s), but
+         * those are no concern of ours.
+         */
+        if (operation == CMD_UPDATE || operation == CMD_DELETE)
+        {
+            char    relkind;

+            relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+            if (relkind == RELKIND_RELATION ||
+                relkind == RELKIND_MATVIEW ||
+                relkind == RELKIND_PARTITIONED_TABLE)
+            {
+                resultRelInfo->ri_RowIdAttNo =
+                    ExecFindJunkAttributeInTlist(subplan->targetlist, "ctid");
+                if (!AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+                    elog(ERROR, "could not find junk ctid column");
+            }
+            else if (relkind == RELKIND_FOREIGN_TABLE)
+            {
                 /*
-                 * For an INSERT or UPDATE, the result tuple must always match
-                 * the target table's descriptor.  For a DELETE, it won't
-                 * (indeed, there's probably no non-junk output columns).
+                 * When there is a row-level trigger, there should be a
+                 * wholerow attribute.  We also require it to be present in
+                 * UPDATE, so we can get the values of unchanged columns.
                  */
-                if (operation == CMD_INSERT || operation == CMD_UPDATE)
-                {
-                    ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-                                        subplan->targetlist);
-                    j = ExecInitJunkFilterInsertion(subplan->targetlist,
-                                                    RelationGetDescr(resultRelInfo->ri_RelationDesc),
-                                                    junkresslot);
-                }
-                else
-                    j = ExecInitJunkFilter(subplan->targetlist,
-                                           junkresslot);
-
-                if (operation == CMD_UPDATE || operation == CMD_DELETE)
-                {
-                    /* For UPDATE/DELETE, find the appropriate junk attr now */
-                    char        relkind;
-
-                    relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-                    if (relkind == RELKIND_RELATION ||
-                        relkind == RELKIND_MATVIEW ||
-                        relkind == RELKIND_PARTITIONED_TABLE)
-                    {
-                        j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
-                        if (!AttributeNumberIsValid(j->jf_junkAttNo))
-                            elog(ERROR, "could not find junk ctid column");
-                    }
-                    else if (relkind == RELKIND_FOREIGN_TABLE)
-                    {
-                        /*
-                         * When there is a row-level trigger, there should be
-                         * a wholerow attribute.
-                         */
-                        j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-                    }
-                    else
-                    {
-                        j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-                        if (!AttributeNumberIsValid(j->jf_junkAttNo))
-                            elog(ERROR, "could not find junk wholerow column");
-                    }
-                }
-
-                resultRelInfo->ri_junkFilter = j;
-                resultRelInfo++;
+                resultRelInfo->ri_RowIdAttNo =
+                    ExecFindJunkAttributeInTlist(subplan->targetlist,
+                                                 "wholerow");
+                if (mtstate->operation == CMD_UPDATE &&
+                    !AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+                    elog(ERROR, "could not find junk wholerow column");
+            }
+            else
+            {
+                /* Other valid target relkinds must provide wholerow */
+                resultRelInfo->ri_RowIdAttNo =
+                    ExecFindJunkAttributeInTlist(subplan->targetlist,
+                                                 "wholerow");
+                if (!AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+                    elog(ERROR, "could not find junk wholerow column");
             }
-        }
-        else
-        {
-            if (operation == CMD_INSERT)
-                ExecCheckPlanOutput(mtstate->resultRelInfo->ri_RelationDesc,
-                                    subplan->targetlist);
         }
     }

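ExecInitModifyTable chooses the row-identity junk column, and ExecModifyTable later reads it through ri_RowIdAttNo. The choice depends only on the target's relkind and on the operation:

- Heap-like relkinds must have "ctid".
- Foreign tables may have "wholerow", which is mandatory for UPDATE.
- Any other valid target must have "wholerow".

The standalone sketch below models that decision. It assumes, as ExecFindJunkAttributeInTlist does, that the lookup matches by name among junk entries only. The enum values and helper names are hypothetical. 0 plays the role of InvalidAttrNumber, and -1 stands for the elog(ERROR) cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for relkind values and the operation type. */
typedef enum
{
    SK_TABLE, SK_MATVIEW, SK_PARTITIONED, SK_FOREIGN, SK_VIEW
} SimpleRelKind;

typedef enum
{
    SOP_UPDATE, SOP_DELETE
} SimpleOp;

/* Return the 1-based resno of the junk column called name, or 0 if none. */
static int
find_junk_attr(const char *names[], const bool *resjunk, int n,
               const char *name)
{
    for (int i = 0; i < n; i++)
    {
        if (resjunk[i] && names[i] != NULL && strcmp(names[i], name) == 0)
            return i + 1;
    }
    return 0;
}

/*
 * Pick the row-identity junk column for UPDATE/DELETE.  Returns its resno,
 * 0 if the relation may legitimately lack one (foreign table in DELETE),
 * or -1 where the executor would elog(ERROR).
 */
static int
choose_row_id_attno(SimpleRelKind kind, SimpleOp op,
                    const char *names[], const bool *resjunk, int n)
{
    int         attno;

    switch (kind)
    {
        case SK_TABLE:
        case SK_MATVIEW:
        case SK_PARTITIONED:
            /* heap-like targets are identified by ctid */
            attno = find_junk_attr(names, resjunk, n, "ctid");
            return attno > 0 ? attno : -1;
        case SK_FOREIGN:
            /* wholerow is optional for DELETE but required for UPDATE */
            attno = find_junk_attr(names, resjunk, n, "wholerow");
            if (op == SOP_UPDATE && attno == 0)
                return -1;
            return attno;
        default:
            /* other target kinds (views) must provide wholerow */
            attno = find_junk_attr(names, resjunk, n, "wholerow");
            return attno > 0 ? attno : -1;
    }
}
```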
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 38b56231b7..1ec586729b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -207,6 +207,7 @@ _copyModifyTable(const ModifyTable *from)
     COPY_SCALAR_FIELD(partColsUpdated);
     COPY_NODE_FIELD(resultRelations);
     COPY_NODE_FIELD(plans);
+    COPY_NODE_FIELD(updateColnosLists);
     COPY_NODE_FIELD(withCheckOptionLists);
     COPY_NODE_FIELD(returningLists);
     COPY_NODE_FIELD(fdwPrivLists);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9f7918c7e9..99fb38c05a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -408,6 +408,7 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
     WRITE_BOOL_FIELD(partColsUpdated);
     WRITE_NODE_FIELD(resultRelations);
     WRITE_NODE_FIELD(plans);
+    WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
     WRITE_NODE_FIELD(fdwPrivLists);
@@ -2143,6 +2144,7 @@ _outModifyTablePath(StringInfo str, const ModifyTablePath *node)
     WRITE_NODE_FIELD(resultRelations);
     WRITE_NODE_FIELD(subpaths);
     WRITE_NODE_FIELD(subroots);
+    WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
     WRITE_NODE_FIELD(rowMarks);
@@ -2268,12 +2270,12 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_NODE_FIELD(distinct_pathkeys);
     WRITE_NODE_FIELD(sort_pathkeys);
     WRITE_NODE_FIELD(processed_tlist);
+    WRITE_NODE_FIELD(update_colnos);
     WRITE_NODE_FIELD(minmax_aggs);
     WRITE_FLOAT_FIELD(total_table_pages, "%.0f");
     WRITE_FLOAT_FIELD(tuple_fraction, "%.4f");
     WRITE_FLOAT_FIELD(limit_tuples, "%.0f");
     WRITE_UINT_FIELD(qual_security_level);
-    WRITE_ENUM_FIELD(inhTargetKind, InheritanceKind);
     WRITE_BOOL_FIELD(hasJoinRTEs);
     WRITE_BOOL_FIELD(hasLateralRTEs);
     WRITE_BOOL_FIELD(hasHavingQual);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 377185f7c6..0b6331d3da 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1683,6 +1683,7 @@ _readModifyTable(void)
     READ_BOOL_FIELD(partColsUpdated);
     READ_NODE_FIELD(resultRelations);
     READ_NODE_FIELD(plans);
+    READ_NODE_FIELD(updateColnosLists);
     READ_NODE_FIELD(withCheckOptionLists);
     READ_NODE_FIELD(returningLists);
     READ_NODE_FIELD(fdwPrivLists);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..4bb482879f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -302,6 +302,7 @@ static ModifyTable *make_modifytable(PlannerInfo *root,
                                      Index nominalRelation, Index rootRelation,
                                      bool partColsUpdated,
                                      List *resultRelations, List *subplans, List *subroots,
+                                     List *updateColnosLists,
                                      List *withCheckOptionLists, List *returningLists,
                                      List *rowMarks, OnConflictExpr *onconflict, int epqParam);
 static GatherMerge *create_gather_merge_plan(PlannerInfo *root,
@@ -2642,7 +2643,8 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
     ModifyTable *plan;
     List       *subplans = NIL;
     ListCell   *subpaths,
-               *subroots;
+               *subroots,
+               *lc;

     /* Build the plan for each input path */
     forboth(subpaths, best_path->subpaths,
@@ -2665,9 +2667,6 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
          */
         subplan = create_plan_recurse(subroot, subpath, CP_EXACT_TLIST);

-        /* Transfer resname/resjunk labeling, too, to keep executor happy */
-        apply_tlist_labeling(subplan->targetlist, subroot->processed_tlist);
-
         subplans = lappend(subplans, subplan);
     }

@@ -2680,6 +2679,7 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
                             best_path->resultRelations,
                             subplans,
                             best_path->subroots,
+                            best_path->updateColnosLists,
                             best_path->withCheckOptionLists,
                             best_path->returningLists,
                             best_path->rowMarks,
@@ -2688,6 +2688,41 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)

     copy_generic_path_info(&plan->plan, &best_path->path);

+    forboth(lc, subplans,
+            subroots, best_path->subroots)
+    {
+        Plan       *subplan = (Plan *) lfirst(lc);
+        PlannerInfo *subroot = (PlannerInfo *) lfirst(subroots);
+
+        /*
+         * Fix up the resnos of query's TLEs to make them match their ordinal
+         * position in the list, which they may not in the case of an UPDATE.
+         * It's safe to revise that targetlist now, because nothing after this
+         * point needs those resnos to match target relation's attribute
+         * numbers.
+         * XXX - we do this simply because apply_tlist_labeling() asserts that
+         * the resnos in processed_tlist and in the subplan targetlist are
+         * exactly the same, but maybe we can just remove the assert?
+         */
+        if (plan->operation == CMD_UPDATE)
+        {
+            ListCell   *l;
+            AttrNumber    resno = 1;
+
+            foreach(l, subroot->processed_tlist)
+            {
+                TargetEntry *tle = lfirst(l);
+
+                tle = flatCopyTargetEntry(tle);
+                tle->resno = resno++;
+                lfirst(l) = tle;
+            }
+        }
+
+        /* Transfer resname/resjunk labeling, too, to keep executor happy */
+        apply_tlist_labeling(subplan->targetlist, subroot->processed_tlist);
+    }
+
     return plan;
 }

@@ -6880,6 +6915,7 @@ make_modifytable(PlannerInfo *root,
                  Index nominalRelation, Index rootRelation,
                  bool partColsUpdated,
                  List *resultRelations, List *subplans, List *subroots,
+                 List *updateColnosLists,
                  List *withCheckOptionLists, List *returningLists,
                  List *rowMarks, OnConflictExpr *onconflict, int epqParam)
 {
@@ -6892,6 +6928,9 @@ make_modifytable(PlannerInfo *root,

     Assert(list_length(resultRelations) == list_length(subplans));
     Assert(list_length(resultRelations) == list_length(subroots));
+    Assert(operation == CMD_UPDATE ?
+           list_length(resultRelations) == list_length(updateColnosLists) :
+           updateColnosLists == NIL);
     Assert(withCheckOptionLists == NIL ||
            list_length(resultRelations) == list_length(withCheckOptionLists));
     Assert(returningLists == NIL ||
@@ -6936,6 +6975,7 @@ make_modifytable(PlannerInfo *root,
         node->exclRelRTI = onconflict->exclRelIndex;
         node->exclRelTlist = onconflict->exclRelTlist;
     }
+    node->updateColnosLists = updateColnosLists;
     node->withCheckOptionLists = withCheckOptionLists;
     node->returningLists = returningLists;
     node->rowMarks = rowMarks;
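To make the resno fix-up in create_modifytable_plan() easier to follow, here is a rough sketch of just that step, outside the planner. SketchTle and sketch_renumber_tlist() are hypothetical stand-ins, not PostgreSQL's TargetEntry or List API. The sketch assumes, as the patch does, that each entry of an UPDATE's processed targetlist gets a flat copy whose resno is its 1-based position, and that the original entries are left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a planner TargetEntry. */
typedef struct SketchTle
{
    int         resno;      /* column number (or position, after renumbering) */
    int         resjunk;    /* nonzero for junk entries such as ctid */
} SketchTle;

/*
 * Copy an UPDATE targetlist, giving each copy a resno equal to its 1-based
 * position.  The input entries are not modified.
 */
static void
sketch_renumber_tlist(const SketchTle *in, SketchTle *out, size_t n)
{
    assert(in != out);          /* we produce copies, not an in-place edit */

    for (size_t i = 0; i < n; i++)
    {
        out[i] = in[i];         /* flat copy keeps resjunk etc. */
        out[i].resno = (int) i + 1;
    }
}
```

No information is lost by this renumbering, because the target column numbers were already captured in update_colnos before create_plan() runs.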
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f529d107d2..ccb9166a8e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -620,6 +620,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
     root->processed_tlist = NIL;
+    root->update_colnos = NIL;
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
     root->qual_security_level = 0;
@@ -1222,6 +1223,7 @@ inheritance_planner(PlannerInfo *root)
     List       *subpaths = NIL;
     List       *subroots = NIL;
     List       *resultRelations = NIL;
+    List       *updateColnosLists = NIL;
     List       *withCheckOptionLists = NIL;
     List       *returningLists = NIL;
     List       *rowMarks;
@@ -1687,6 +1689,11 @@ inheritance_planner(PlannerInfo *root)
         /* Build list of target-relation RT indexes */
         resultRelations = lappend_int(resultRelations, appinfo->child_relid);

+        /* Accumulate lists of UPDATE target columns */
+        if (parse->commandType == CMD_UPDATE)
+            updateColnosLists = lappend(updateColnosLists,
+                                        subroot->update_colnos);
+
         /* Build lists of per-relation WCO and RETURNING targetlists */
         if (parse->withCheckOptions)
             withCheckOptionLists = lappend(withCheckOptionLists,
@@ -1732,6 +1739,9 @@ inheritance_planner(PlannerInfo *root)
         subpaths = list_make1(dummy_path);
         subroots = list_make1(root);
         resultRelations = list_make1_int(parse->resultRelation);
+        if (parse->commandType == CMD_UPDATE)
+            updateColnosLists = lappend(updateColnosLists,
+                                        root->update_colnos);
         if (parse->withCheckOptions)
             withCheckOptionLists = list_make1(parse->withCheckOptions);
         if (parse->returningList)
@@ -1788,6 +1798,7 @@ inheritance_planner(PlannerInfo *root)
                                      resultRelations,
                                      subpaths,
                                      subroots,
+                                     updateColnosLists,
                                      withCheckOptionLists,
                                      returningLists,
                                      rowMarks,
@@ -2313,6 +2324,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         if (parse->commandType != CMD_SELECT && !inheritance_update)
         {
             Index        rootRelation;
+            List       *updateColnosLists;
             List       *withCheckOptionLists;
             List       *returningLists;
             List       *rowMarks;
@@ -2327,6 +2339,12 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
             else
                 rootRelation = 0;

+            /* Set up the UPDATE target columns list-of-lists, if needed. */
+            if (parse->commandType == CMD_UPDATE)
+                updateColnosLists = list_make1(root->update_colnos);
+            else
+                updateColnosLists = NIL;
+
             /*
              * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if
              * needed.
@@ -2361,6 +2379,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
                                         list_make1_int(parse->resultRelation),
                                         list_make1(path),
                                         list_make1(root),
+                                        updateColnosLists,
                                         withCheckOptionLists,
                                         returningLists,
                                         rowMarks,
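Each result relation gets its own update_colnos list because, as noted at the start of the thread, a child's attribute numbers can differ from the parent's. The sketch below illustrates the resulting list-of-lists under a loudly hypothetical assumption: each child carries a simple parent-to-child attribute-number map (SketchChild). In the patch the child numbers presumably come from running preprocess_targetlist() on each child's own subroot, not from such a map.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ATTNO 8

/* Hypothetical child description: parent attno -> child attno (index 0 unused). */
typedef struct SketchChild
{
    int         parent_to_child[SKETCH_MAX_ATTNO + 1];
} SketchChild;

/*
 * Build one update_colnos list per child result relation, stored row-major
 * in out[nchildren * ncols].  Returns the number of lists built; for an
 * UPDATE this must equal the number of result relations, which is what the
 * new Asserts in make_modifytable() and create_modifytable_path() check.
 */
static size_t
sketch_build_colnos_lists(const int *parent_colnos, size_t ncols,
                          const SketchChild *children, size_t nchildren,
                          int *out)
{
    for (size_t c = 0; c < nchildren; c++)
    {
        for (size_t i = 0; i < ncols; i++)
        {
            int         attno = parent_colnos[i];

            assert(attno >= 1 && attno <= SKETCH_MAX_ATTNO);
            out[c * ncols + i] = children[c].parent_to_child[attno];
        }
    }
    return nchildren;
}
```

For example, with a parent-side SET on column 2 and a second child whose map sends 2 to 3 (say, because of a dropped column), the two per-child lists come out as {2} and {3}.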
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index d961592e01..e18553ac7c 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -925,6 +925,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
     subroot->processed_tlist = NIL;
+    subroot->update_colnos = NIL;
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
     subroot->qual_security_level = 0;
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 23f9f861f4..488e8cfd4d 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -3,13 +3,19 @@
  * preptlist.c
  *      Routines to preprocess the parse tree target list
  *
- * For INSERT and UPDATE queries, the targetlist must contain an entry for
- * each attribute of the target relation in the correct order.  For UPDATE and
- * DELETE queries, it must also contain junk tlist entries needed to allow the
- * executor to identify the rows to be updated or deleted.  For all query
- * types, we may need to add junk tlist entries for Vars used in the RETURNING
- * list and row ID information needed for SELECT FOR UPDATE locking and/or
- * EvalPlanQual checking.
+ * For an INSERT, the targetlist must contain an entry for each attribute of
+ * the target relation in the correct order.
+ *
+ * For an UPDATE, the targetlist just contains the expressions for the new
+ * column values.
+ *
+ * For UPDATE and DELETE queries, the targetlist must also contain "junk"
+ * tlist entries needed to allow the executor to identify the rows to be
+ * updated or deleted; for example, the ctid of a heap row.
+ *
+ * For all query types, there can be additional junk tlist entries, such as
+ * sort keys, Vars needed for a RETURNING list, and row ID information needed
+ * for SELECT FOR UPDATE locking and/or EvalPlanQual checking.
  *
  * The query rewrite phase also does preprocessing of the targetlist (see
  * rewriteTargetListIU).  The division of labor between here and there is
@@ -52,6 +58,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "utils/rel.h"

+static List *make_update_colnos(List *tlist);
 static List *expand_targetlist(List *tlist, int command_type,
                                Index result_relation, Relation rel);

@@ -63,7 +70,8 @@ static List *expand_targetlist(List *tlist, int command_type,
  *      Returns the new targetlist.
  *
  * As a side effect, if there's an ON CONFLICT UPDATE clause, its targetlist
- * is also preprocessed (and updated in-place).
+ * is also preprocessed (and updated in-place).  Also, if this is an UPDATE,
+ * we return a list of target column numbers in root->update_colnos.
  */
 List *
 preprocess_targetlist(PlannerInfo *root)
@@ -108,14 +116,19 @@ preprocess_targetlist(PlannerInfo *root)
         rewriteTargetListUD(parse, target_rte, target_relation);

     /*
-     * for heap_form_tuple to work, the targetlist must match the exact order
-     * of the attributes. We also need to fill in any missing attributes. -ay
-     * 10/94
+     * In an INSERT, the executor expects the targetlist to match the exact
+     * order of the target table's attributes, including entries for
+     * attributes not mentioned in the source query.
+     *
+     * In an UPDATE, we don't rearrange the tlist order, but we need to make a
+     * separate list of the target attribute numbers, in tlist order.
      */
     tlist = parse->targetList;
-    if (command_type == CMD_INSERT || command_type == CMD_UPDATE)
+    if (command_type == CMD_INSERT)
         tlist = expand_targetlist(tlist, command_type,
                                   result_relation, target_relation);
+    else if (command_type == CMD_UPDATE)
+        root->update_colnos = make_update_colnos(tlist);

     /*
      * Add necessary junk columns for rowmarked rels.  These values are needed
@@ -239,6 +252,29 @@ preprocess_targetlist(PlannerInfo *root)
     return tlist;
 }

+/*
+ * make_update_colnos
+ *         Extract a list of the target-table column numbers that
+ *         an UPDATE's targetlist wants to assign to.
+ *
+ * We just need to capture the resno's of the non-junk tlist entries.
+ */
+static List *
+make_update_colnos(List *tlist)
+{
+    List       *update_colnos = NIL;
+    ListCell   *lc;
+
+    foreach(lc, tlist)
+    {
+        TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+        if (!tle->resjunk)
+            update_colnos = lappend_int(update_colnos, tle->resno);
+    }
+    return update_colnos;
+}
+

 /*****************************************************************************
  *
@@ -251,6 +287,10 @@ preprocess_targetlist(PlannerInfo *root)
  *      Given a target list as generated by the parser and a result relation,
  *      add targetlist entries for any missing attributes, and ensure the
  *      non-junk attributes appear in proper field order.
+ *
+ * command_type is a bit of an archaism now: it's CMD_INSERT when we're
+ * processing an INSERT, all right, but the only other use of this function
+ * is for ON CONFLICT UPDATE tlists, for which command_type is CMD_UPDATE.
  */
 static List *
 expand_targetlist(List *tlist, int command_type,
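make_update_colnos() only records the resno of each non-junk entry, in the order the user wrote the SET assignments. The sketch below shows the same logic over a plain array; SketchTle and sketch_make_update_colnos() are hypothetical simplifications of TargetEntry and the List-based original. For a table with columns (a, b, c), `UPDATE t SET c = ..., a = ...` plus a junk ctid entry yields the column numbers 3 and then 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a TargetEntry. */
typedef struct SketchTle
{
    int         resno;      /* target column number for non-junk entries */
    int         resjunk;    /* nonzero for junk entries (ctid, wholerow, ...) */
} SketchTle;

/*
 * Collect the resno of each non-junk entry, in targetlist order, into
 * colnos[].  Returns how many column numbers were stored.
 */
static size_t
sketch_make_update_colnos(const SketchTle *tlist, size_t n, int *colnos)
{
    size_t      k = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (tlist[i].resjunk)
            continue;
        assert(tlist[i].resno > 0);     /* must name a real column */
        colnos[k++] = tlist[i].resno;
    }
    return k;
}
```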
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..a97929c13f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3548,6 +3548,8 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  * 'resultRelations' is an integer list of actual RT indexes of target rel(s)
  * 'subpaths' is a list of Path(s) producing source data (one per rel)
  * 'subroots' is a list of PlannerInfo structs (one per rel)
+ * 'updateColnosLists' is a list of UPDATE target column number lists
+ *        (one sublist per rel); or NIL if not an UPDATE
  * 'withCheckOptionLists' is a list of WCO lists (one per rel)
  * 'returningLists' is a list of RETURNING tlists (one per rel)
  * 'rowMarks' is a list of PlanRowMarks (non-locking only)
@@ -3561,6 +3563,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
                         bool partColsUpdated,
                         List *resultRelations, List *subpaths,
                         List *subroots,
+                        List *updateColnosLists,
                         List *withCheckOptionLists, List *returningLists,
                         List *rowMarks, OnConflictExpr *onconflict,
                         int epqParam)
@@ -3571,6 +3574,9 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,

     Assert(list_length(resultRelations) == list_length(subpaths));
     Assert(list_length(resultRelations) == list_length(subroots));
+    Assert(operation == CMD_UPDATE ?
+           list_length(resultRelations) == list_length(updateColnosLists) :
+           updateColnosLists == NIL);
     Assert(withCheckOptionLists == NIL ||
            list_length(resultRelations) == list_length(withCheckOptionLists));
     Assert(returningLists == NIL ||
@@ -3633,6 +3639,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
     pathnode->resultRelations = resultRelations;
     pathnode->subpaths = subpaths;
     pathnode->subroots = subroots;
+    pathnode->updateColnosLists = updateColnosLists;
     pathnode->withCheckOptionLists = withCheckOptionLists;
     pathnode->returningLists = returningLists;
     pathnode->rowMarks = rowMarks;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..f9175987f8 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1659,17 +1659,21 @@ rewriteTargetListUD(Query *parsetree, RangeTblEntry *target_rte,
                                                 target_relation);

         /*
-         * If we have a row-level trigger corresponding to the operation, emit
-         * a whole-row Var so that executor will have the "old" row to pass to
-         * the trigger.  Alas, this misses system columns.
+         * For UPDATE, we need to make the FDW fetch unchanged columns by
+         * asking it to fetch a whole-row Var.  That's because the top-level
+         * targetlist only contains entries for changed columns.  (Actually,
+         * we only really need this for UPDATEs that are not pushed to the
+         * remote side, but it's hard to tell if that will be the case at the
+         * point when this function is called.)
+         *
+         * We will also need the whole row if there are any row triggers, so
+         * that the executor will have the "old" row to pass to the trigger.
+         * Alas, this misses system columns.
          */
-        if (target_relation->trigdesc &&
-            ((parsetree->commandType == CMD_UPDATE &&
-              (target_relation->trigdesc->trig_update_after_row ||
-               target_relation->trigdesc->trig_update_before_row)) ||
-             (parsetree->commandType == CMD_DELETE &&
-              (target_relation->trigdesc->trig_delete_after_row ||
-               target_relation->trigdesc->trig_delete_before_row))))
+        if (parsetree->commandType == CMD_UPDATE ||
+            (target_relation->trigdesc &&
+             (target_relation->trigdesc->trig_delete_after_row ||
+              target_relation->trigdesc->trig_delete_before_row)))
         {
             var = makeWholeRowVar(target_rte,
                                   parsetree->resultRelation,
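The revised condition in rewriteTargetListUD() (in the foreign-table branch, judging by the FDW comment) can be restated as a small predicate. This is a sketch only: SketchCmd and SketchTrigDesc are hypothetical stand-ins that model just the TriggerDesc flags the new test still consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum SketchCmd
{
    SKETCH_CMD_UPDATE,
    SKETCH_CMD_DELETE
} SketchCmd;

/* Hypothetical subset of TriggerDesc's row-trigger flags. */
typedef struct SketchTrigDesc
{
    bool        trig_delete_before_row;
    bool        trig_delete_after_row;
} SketchTrigDesc;

/*
 * Does the target need a whole-row junk Var?  Every UPDATE does, since the
 * executor must fill unchanged columns from the old row; a DELETE does only
 * if row-level DELETE triggers need the old row.  trigdesc may be NULL.
 */
static bool
sketch_needs_wholerow(SketchCmd cmd, const SketchTrigDesc *trigdesc)
{
    assert(cmd == SKETCH_CMD_UPDATE || cmd == SKETCH_CMD_DELETE);

    if (cmd == SKETCH_CMD_UPDATE)
        return true;
    return trigdesc != NULL &&
        (trigdesc->trig_delete_after_row || trigdesc->trig_delete_before_row);
}
```

The UPDATE trigger flags drop out of the test entirely, because the unconditional UPDATE branch already covers them.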
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..c8c09f1cb5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -156,9 +156,6 @@ extern void ResetTupleHashTable(TupleHashTable hashtable);
  */
 extern JunkFilter *ExecInitJunkFilter(List *targetList,
                                       TupleTableSlot *slot);
-extern JunkFilter *ExecInitJunkFilterInsertion(List *targetList,
-                                               TupleDesc cleanTupType,
-                                               TupleTableSlot *slot);
 extern JunkFilter *ExecInitJunkFilterConversion(List *targetList,
                                                 TupleDesc cleanTupType,
                                                 TupleTableSlot *slot);
@@ -270,6 +267,12 @@ extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
                                                TupleTableSlot *slot,
                                                PlanState *parent,
                                                TupleDesc inputDesc);
+extern ProjectionInfo *ExecBuildUpdateProjection(List *subTargetList,
+                                                 List *targetColnos,
+                                                 TupleDesc relDesc,
+                                                 ExprContext *econtext,
+                                                 TupleTableSlot *slot,
+                                                 PlanState *parent);
 extern ExprState *ExecPrepareExpr(Expr *node, EState *estate);
 extern ExprState *ExecPrepareQual(List *qual, EState *estate);
 extern ExprState *ExecPrepareCheck(List *qual, EState *estate);
@@ -622,4 +625,9 @@ extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 extern void CheckSubscriptionRelkind(char relkind, const char *nspname,
                                      const char *relname);

+/* needed by trigger.c */
+extern TupleTableSlot *ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
+                                             TupleTableSlot *planSlot,
+                                             TupleTableSlot *oldSlot);
+
 #endif                            /* EXECUTOR_H  */
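Conceptually, the new-tuple step declared here is step 3 of the design described at the start of the thread: start from the old row and overwrite only the assigned columns. The sketch below models rows as plain int arrays, and sketch_get_update_new_tuple() is a hypothetical name. The real code builds a projection with ExecBuildUpdateProjection() and must also deal with NULLs, dropped columns, and generated columns, none of which this sketch attempts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Form the new row for an UPDATE: copy the old row, then overwrite the
 * columns listed in update_colnos (1-based attnos) with the matching plan
 * output values, the i-th plan value going to column update_colnos[i].
 */
static void
sketch_get_update_new_tuple(const int *old_values, size_t natts,
                            const int *plan_values,
                            const int *update_colnos, size_t ncolnos,
                            int *new_values)
{
    for (size_t i = 0; i < natts; i++)
        new_values[i] = old_values[i];  /* unassigned columns keep old values */

    for (size_t i = 0; i < ncolnos; i++)
    {
        int         attno = update_colnos[i];

        assert(attno >= 1 && (size_t) attno <= natts);
        new_values[attno - 1] = plan_values[i];
    }
}
```

With an old row (10, 20, 30), plan values (99, 77) and update_colnos (3, 1), the new row is (77, 20, 99).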
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..7af6d48525 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -356,10 +356,6 @@ typedef struct ProjectionInfo
  *                        attribute numbers of the "original" tuple and the
  *                        attribute numbers of the "clean" tuple.
  *      resultSlot:        tuple slot used to hold cleaned tuple.
- *      junkAttNo:        not used by junkfilter code.  Can be used by caller
- *                        to remember the attno of a specific junk attribute
- *                        (nodeModifyTable.c keeps the "ctid" or "wholerow"
- *                        attno here).
  * ----------------
  */
 typedef struct JunkFilter
@@ -369,7 +365,6 @@ typedef struct JunkFilter
     TupleDesc    jf_cleanTupType;
     AttrNumber *jf_cleanMap;
     TupleTableSlot *jf_resultSlot;
-    AttrNumber    jf_junkAttNo;
 } JunkFilter;

 /*
@@ -423,6 +418,19 @@ typedef struct ResultRelInfo
     /* array of key/attr info for indices */
     IndexInfo **ri_IndexRelationInfo;

+    /*
+     * For UPDATE/DELETE result relations, the attribute number of the row
+     * identity junk attribute in the source plan's output tuples
+     */
+    AttrNumber        ri_RowIdAttNo;
+
+    /* Projection to generate new tuple in an INSERT/UPDATE */
+    ProjectionInfo *ri_projectNew;
+    /* Slot to hold that tuple */
+    TupleTableSlot *ri_newTupleSlot;
+    /* Slot to hold the old tuple being updated */
+    TupleTableSlot *ri_oldTupleSlot;
+
     /* triggers to be fired, if any */
     TriggerDesc *ri_TrigDesc;

@@ -470,9 +478,6 @@ typedef struct ResultRelInfo
     /* number of stored generated columns we need to compute */
     int            ri_NumGeneratedNeeded;

-    /* for removing junk attributes from tuples */
-    JunkFilter *ri_junkFilter;
-
     /* list of RETURNING expressions */
     List       *ri_returningList;

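ri_RowIdAttNo caches the position of the "ctid" or "wholerow" junk column, found once per result relation with ExecFindJunkAttributeInTlist(). The sketch below approximates that name-based lookup over a plain array; SketchTle and sketch_find_junk_attno() are hypothetical, and 0 plays the role of InvalidAttrNumber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an executor targetlist entry. */
typedef struct SketchTle
{
    int         resno;      /* position of the entry in the plan's output */
    const char *resname;    /* column label; may be NULL */
    bool        resjunk;
} SketchTle;

/*
 * Return the resno of the junk entry labeled `name`, or 0 (playing the role
 * of InvalidAttrNumber) if there is none.
 */
static int
sketch_find_junk_attno(const SketchTle *tlist, size_t n, const char *name)
{
    assert(name != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (tlist[i].resjunk && tlist[i].resname != NULL &&
            strcmp(tlist[i].resname, name) == 0)
            return tlist[i].resno;
    }
    return 0;
}
```

A caller that requires the column would then raise an error with elog(ERROR, ...) when 0 comes back, as the patch does for a missing "wholerow".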
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c13642e35e..bed9f4da09 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -309,15 +309,23 @@ struct PlannerInfo

     /*
      * The fully-processed targetlist is kept here.  It differs from
-     * parse->targetList in that (for INSERT and UPDATE) it's been reordered
-     * to match the target table, and defaults have been filled in.  Also,
-     * additional resjunk targets may be present.  preprocess_targetlist()
-     * does most of this work, but note that more resjunk targets can get
-     * added during appendrel expansion.  (Hence, upper_targets mustn't get
-     * set up till after that.)
+     * parse->targetList in that (for INSERT) it's been reordered to match the
+     * target table, and defaults have been filled in.  Also, additional
+     * resjunk targets may be present.  preprocess_targetlist() does most of
+     * that work, but note that more resjunk targets can get added during
+     * appendrel expansion.  (Hence, upper_targets mustn't get set up till
+     * after that.)
      */
     List       *processed_tlist;

+    /*
+     * For UPDATE, processed_tlist remains in the order the user wrote the
+     * assignments.  This list contains the target table's attribute numbers
+     * to which the first N entries of processed_tlist are to be assigned.
+     * (Any additional entries in processed_tlist must be resjunk.)
+     */
+    List       *update_colnos;
+
     /* Fields filled during create_plan() for use in setrefs.c */
     AttrNumber *grouping_map;    /* for GroupingFunc fixup */
     List       *minmax_aggs;    /* List of MinMaxAggInfos */
@@ -1839,6 +1847,7 @@ typedef struct ModifyTablePath
     List       *resultRelations;    /* integer list of RT indexes */
     List       *subpaths;        /* Path(s) producing source data */
     List       *subroots;        /* per-target-table PlannerInfos */
+    List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
     List       *rowMarks;        /* PlanRowMarks (non-locking only) */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6e62104d0b..7d74bd92b8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -219,6 +219,7 @@ typedef struct ModifyTable
     bool        partColsUpdated;    /* some part key in hierarchy updated */
     List       *resultRelations;    /* integer list of RT indexes */
     List       *plans;            /* plan(s) producing source data */
+    List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
     List       *fdwPrivLists;    /* per-target-table FDW private data lists */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..9673a4a638 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -265,6 +265,7 @@ extern ModifyTablePath *create_modifytable_path(PlannerInfo *root,
                                                 bool partColsUpdated,
                                                 List *resultRelations, List *subpaths,
                                                 List *subroots,
+                                                List *updateColnosLists,
                                                 List *withCheckOptionLists, List *returningLists,
                                                 List *rowMarks, OnConflictExpr *onconflict,
                                                 int epqParam);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 2b68aef654..94e43c3410 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -545,25 +545,25 @@ create table some_tab_child () inherits (some_tab);
 insert into some_tab_child values(1,2);
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false;
-            QUERY PLAN
-----------------------------------
+           QUERY PLAN
+--------------------------------
  Update on public.some_tab
    Update on public.some_tab
    ->  Result
-         Output: (a + 1), b, ctid
+         Output: (a + 1), ctid
          One-Time Filter: false
 (5 rows)

 update some_tab set a = a + 1 where false;
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false returning b, a;
-            QUERY PLAN
-----------------------------------
+           QUERY PLAN
+--------------------------------
  Update on public.some_tab
    Output: b, a
    Update on public.some_tab
    ->  Result
-         Output: (a + 1), b, ctid
+         Output: (a + 1), ctid
          One-Time Filter: false
 (6 rows)

diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 24905332b1..770eab38b5 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1283,12 +1283,12 @@ SELECT * FROM rw_view1;
 (4 rows)

 EXPLAIN (verbose, costs off) UPDATE rw_view1 SET b = b + 1 RETURNING *;
-                         QUERY PLAN
--------------------------------------------------------------
+                   QUERY PLAN
+-------------------------------------------------
  Update on public.base_tbl
    Output: base_tbl.a, base_tbl.b
    ->  Seq Scan on public.base_tbl
-         Output: base_tbl.a, (base_tbl.b + 1), base_tbl.ctid
+         Output: (base_tbl.b + 1), base_tbl.ctid
 (4 rows)

 UPDATE rw_view1 SET b = b + 1 RETURNING *;
@@ -2340,7 +2340,7 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
    Update on public.t12 t1_2
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
-         Output: 100, t1.b, t1.c, t1.ctid
+         Output: 100, t1.ctid
          Index Cond: ((t1.a > 5) AND (t1.a < 7))
          Filter: ((t1.a <> 6) AND (SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2350,15 +2350,15 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
                  ->  Seq Scan on public.t111 t12_2
                        Filter: (t12_2.a = t1.a)
    ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: 100, t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Output: 100, t1_1.ctid
          Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
          Filter: ((t1_1.a <> 6) AND (SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: 100, t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Output: 100, t1_2.ctid
          Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
          Filter: ((t1_2.a <> 6) AND (SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: 100, t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Output: 100, t1_3.ctid
          Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
          Filter: ((t1_3.a <> 6) AND (SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
 (27 rows)
@@ -2376,15 +2376,15 @@ SELECT * FROM t1 WHERE a=100; -- Nothing should have been changed to 100

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
-                               QUERY PLAN
--------------------------------------------------------------------------
+                              QUERY PLAN
+-----------------------------------------------------------------------
  Update on public.t1
    Update on public.t1
    Update on public.t11 t1_1
    Update on public.t12 t1_2
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
-         Output: (t1.a + 1), t1.b, t1.c, t1.ctid
+         Output: (t1.a + 1), t1.ctid
          Index Cond: ((t1.a > 5) AND (t1.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2394,15 +2394,15 @@ UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
                  ->  Seq Scan on public.t111 t12_2
                        Filter: (t12_2.a = t1.a)
    ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: (t1_1.a + 1), t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Output: (t1_1.a + 1), t1_1.ctid
          Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: (t1_2.a + 1), t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Output: (t1_2.a + 1), t1_2.ctid
          Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: (t1_3.a + 1), t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Output: (t1_3.a + 1), t1_3.ctid
          Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
 (27 rows)
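The expected plans above now carry only the SET-clause expressions and ctid (for example `Output: 100, t1.ctid` instead of `Output: 100, t1.b, t1.c, t1.ctid`). ModifyTable therefore has to rebuild the rest of each new row itself, following the steps listed at the start of the thread: pick the result relation, fetch the old tuple by its tupleid, fill the unassigned columns from it, and then update. The toy model below sketches only those steps. ToyRel, ToyPlanRow and toy_modifytable_update() are hypothetical names, not PostgreSQL code. The child is chosen with an explicit integer junk column. The thread's design uses tableoid for this purpose; the constant 0/1 columns in this patch's postgres_fdw plans appear to play that role, but that is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define TOY_NATTS 3    /* columns f1, f2, f3 */
#define TOY_NROWS 4

/* Hypothetical child result relation; the row number stands in for ctid. */
typedef struct ToyRel
{
    const char *name;
    int         rows[TOY_NROWS][TOY_NATTS];
} ToyRel;

/*
 * Hypothetical tuple emitted by the single shared subplan for
 * "SET f2 = ...": only the assigned value plus two junk columns,
 * not the full row.
 */
typedef struct ToyPlanRow
{
    int new_f2;          /* value for the only assigned column, f2 */
    int ctid;            /* junk: row to update in the chosen relation */
    int resultrelindex;  /* junk: which child result relation (0, 1, ...) */
} ToyPlanRow;

/*
 * Per-row UPDATE work: choose the result relation from the junk column,
 * fetch the old tuple by ctid, fill the unassigned columns (f1, f3) from
 * it, then write the new tuple back (the ExecUpdate() step).
 */
static void
toy_modifytable_update(ToyRel *rels, const ToyPlanRow *row)
{
    ToyRel *rel = &rels[row->resultrelindex];
    int     newtup[TOY_NATTS];

    memcpy(newtup, rel->rows[row->ctid], sizeof(newtup)); /* old tuple */
    newtup[1] = row->new_f2;                               /* assigned f2 */
    memcpy(rel->rows[row->ctid], newtup, sizeof(newtup));
}
```

As a usage example, feeding {120, 1, 0} and {130, 0, 1} into two relations holding (1,10,100), (2,20,200) and (3,30,300) changes only f2 of row 1 in the first relation and of row 0 in the second. The old tuple supplies f1 and f3 in both cases. The extra fetch of the old tuple is the per-row cost the thread accepts in exchange for planning the subplan only once.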
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index bf939d79f6..dece036069 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -172,14 +172,14 @@ EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE update_test t
   SET (a, b) = (SELECT b, a FROM update_test s WHERE s.a = t.a)
   WHERE CURRENT_USER = SESSION_USER;
-                            QUERY PLAN
-------------------------------------------------------------------
+                         QUERY PLAN
+-------------------------------------------------------------
  Update on public.update_test t
    ->  Result
-         Output: $1, $2, t.c, (SubPlan 1 (returns $1,$2)), t.ctid
+         Output: $1, $2, (SubPlan 1 (returns $1,$2)), t.ctid
          One-Time Filter: (CURRENT_USER = SESSION_USER)
          ->  Seq Scan on public.update_test t
-               Output: t.c, t.a, t.ctid
+               Output: t.a, t.ctid
          SubPlan 1 (returns $1,$2)
            ->  Seq Scan on public.update_test s
                  Output: s.b, s.a
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index cff23b0211..83d81886cc 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -46,6 +46,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
+#include "optimizer/inherit.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/prep.h"
 #include "optimizer/tlist.h"
@@ -1275,7 +1276,7 @@ deparseLockingClause(deparse_expr_cxt *context)
          * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
          * before 8.3.
          */
-        if (relid == root->parse->resultRelation &&
+        if (is_result_relation(root, relid) &&
             (root->parse->commandType == CMD_UPDATE ||
              root->parse->commandType == CMD_DELETE))
         {
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b46e7e623f..a4cd127e0e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -6354,7 +6354,7 @@ UPDATE rw_view SET b = b + 5;
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: (parent_tbl_1.b + 5), parent_tbl_1.ctid, parent_tbl_1.*
+         Output: (parent_tbl_1.b + 5), parent_tbl_1.ctid, 0, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -6369,7 +6369,7 @@ UPDATE rw_view SET b = b + 15;
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: (parent_tbl_1.b + 15), parent_tbl_1.ctid, parent_tbl_1.*
+         Output: (parent_tbl_1.b + 15), parent_tbl_1.ctid, 0, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -7256,33 +7256,19 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                                       QUERY PLAN
 ---------------------------------------------------------------------------------------
  Update on public.bar
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
-         Output: (bar.f2 + 100), bar.ctid, foo.ctid, foo.*, foo.tableoid
+         Output: (bar.f2 + 100), bar.ctid, foo.ctid, (0), bar.*, foo.*, foo.tableoid
          Inner Unique: true
          Hash Cond: (bar.f1 = foo.f1)
-         ->  Seq Scan on public.bar
-               Output: bar.f2, bar.ctid, bar.f1
-         ->  Hash
-               Output: foo.ctid, foo.f1, foo.*, foo.tableoid
-               ->  HashAggregate
-                     Output: foo.ctid, foo.f1, foo.*, foo.tableoid
-                     Group Key: foo.f1
-                     ->  Append
-                           ->  Seq Scan on public.foo foo_1
-                                 Output: foo_1.ctid, foo_1.f1, foo_1.*, foo_1.tableoid
-                           ->  Foreign Scan on public.foo2 foo_2
-                                 Output: foo_2.ctid, foo_2.f1, foo_2.*, foo_2.tableoid
-                                 Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
-   ->  Hash Join
-         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, foo.ctid, foo.*, foo.tableoid
-         Inner Unique: true
-         Hash Cond: (bar_1.f1 = foo.f1)
-         ->  Foreign Scan on public.bar2 bar_1
-               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
-               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+         ->  Append
+               ->  Seq Scan on public.bar bar_1
+                     Output: bar_1.f2, bar_1.ctid, bar_1.f1, 0, bar_1.*
+               ->  Foreign Scan on public.bar2 bar_2
+                     Output: bar_2.f2, bar_2.ctid, bar_2.f1, 1, bar_2.*
+                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Hash
                Output: foo.ctid, foo.f1, foo.*, foo.tableoid
                ->  HashAggregate
@@ -7294,7 +7280,7 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                            ->  Foreign Scan on public.foo2 foo_2
                                  Output: foo_2.ctid, foo_2.f1, foo_2.*, foo_2.tableoid
                                  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
-(39 rows)
+(25 rows)

 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
 select tableoid::regclass, * from bar order by 1,2;
@@ -7314,39 +7300,24 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
-                                      QUERY PLAN
---------------------------------------------------------------------------------------
+                                         QUERY PLAN
+--------------------------------------------------------------------------------------------
  Update on public.bar
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
-   ->  Hash Join
-         Output: (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
-         Hash Cond: (foo.f1 = bar.f1)
-         ->  Append
-               ->  Seq Scan on public.foo
-                     Output: ROW(foo.f1), foo.f1
-               ->  Foreign Scan on public.foo2 foo_1
-                     Output: ROW(foo_1.f1), foo_1.f1
-                     Remote SQL: SELECT f1 FROM public.loct1
-               ->  Seq Scan on public.foo foo_2
-                     Output: ROW((foo_2.f1 + 3)), (foo_2.f1 + 3)
-               ->  Foreign Scan on public.foo2 foo_3
-                     Output: ROW((foo_3.f1 + 3)), (foo_3.f1 + 3)
-                     Remote SQL: SELECT f1 FROM public.loct1
-         ->  Hash
-               Output: bar.f2, bar.ctid, bar.f1
-               ->  Seq Scan on public.bar
-                     Output: bar.f2, bar.ctid, bar.f1
    ->  Merge Join
-         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, (ROW(foo.f1))
-         Merge Cond: (bar_1.f1 = foo.f1)
+         Output: (bar.f2 + 100), bar.ctid, (ROW(foo.f1)), (0), bar.*
+         Merge Cond: (bar.f1 = foo.f1)
          ->  Sort
-               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
-               Sort Key: bar_1.f1
-               ->  Foreign Scan on public.bar2 bar_1
-                     Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+               Output: bar.f2, bar.ctid, bar.f1, (0), bar.*
+               Sort Key: bar.f1
+               ->  Append
+                     ->  Seq Scan on public.bar bar_1
+                           Output: bar_1.f2, bar_1.ctid, bar_1.f1, 0, bar_1.*
+                     ->  Foreign Scan on public.bar2 bar_2
+                           Output: bar_2.f2, bar_2.ctid, bar_2.f1, 1, bar_2.*
+                           Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Sort
                Output: (ROW(foo.f1)), foo.f1
                Sort Key: foo.f1
@@ -7361,7 +7332,7 @@ where bar.f1 = ss.f1;
                      ->  Foreign Scan on public.foo2 foo_3
                            Output: ROW((foo_3.f1 + 3)), (foo_3.f1 + 3)
                            Remote SQL: SELECT f1 FROM public.loct1
-(45 rows)
+(30 rows)

 update bar set f2 = f2 + 100
 from
@@ -7487,18 +7458,19 @@ ERROR:  WHERE CURRENT OF is not supported for this table type
 rollback;
 explain (verbose, costs off)
 delete from foo where f1 < 5 returning *;
-                                   QUERY PLAN
---------------------------------------------------------------------------------
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
  Delete on public.foo
-   Output: foo.f1, foo.f2
-   Delete on public.foo
-   Foreign Delete on public.foo2 foo_1
-   ->  Index Scan using i_foo_f1 on public.foo
-         Output: foo.ctid
-         Index Cond: (foo.f1 < 5)
-   ->  Foreign Delete on public.foo2 foo_1
-         Remote SQL: DELETE FROM public.loct1 WHERE ((f1 < 5)) RETURNING f1, f2
-(9 rows)
+   Output: foo_1.f1, foo_1.f2
+   Delete on public.foo foo_1
+   Foreign Delete on public.foo2 foo_2
+   ->  Append
+         ->  Index Scan using i_foo_f1 on public.foo foo_1
+               Output: foo_1.ctid, 0
+               Index Cond: (foo_1.f1 < 5)
+         ->  Foreign Delete on public.foo2 foo_2
+               Remote SQL: DELETE FROM public.loct1 WHERE ((f1 < 5)) RETURNING f1, f2
+(10 rows)

 delete from foo where f1 < 5 returning *;
  f1 | f2
@@ -7512,17 +7484,20 @@ delete from foo where f1 < 5 returning *;

 explain (verbose, costs off)
 update bar set f2 = f2 + 100 returning *;
-                                  QUERY PLAN
-------------------------------------------------------------------------------
+                                        QUERY PLAN
+------------------------------------------------------------------------------------------
  Update on public.bar
-   Output: bar.f1, bar.f2
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
-   ->  Seq Scan on public.bar
-         Output: (bar.f2 + 100), bar.ctid
-   ->  Foreign Update on public.bar2 bar_1
-         Remote SQL: UPDATE public.loct2 SET f2 = (f2 + 100) RETURNING f1, f2
-(8 rows)
+   Output: bar_1.f1, bar_1.f2
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
+   ->  Result
+         Output: (bar.f2 + 100), bar.ctid, (0), bar.*
+         ->  Append
+               ->  Seq Scan on public.bar bar_1
+                     Output: bar_1.f2, bar_1.ctid, 0, bar_1.*
+               ->  Foreign Update on public.bar2 bar_2
+                     Remote SQL: UPDATE public.loct2 SET f2 = (f2 + 100) RETURNING f1, f2
+(11 rows)

 update bar set f2 = f2 + 100 returning *;
  f1 | f2
@@ -7547,15 +7522,18 @@ update bar set f2 = f2 + 100;
                                                QUERY PLAN
 --------------------------------------------------------------------------------------------------------
  Update on public.bar
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
      Remote SQL: UPDATE public.loct2 SET f1 = $2, f2 = $3, f3 = $4 WHERE ctid = $1 RETURNING f1, f2, f3
-   ->  Seq Scan on public.bar
-         Output: (bar.f2 + 100), bar.ctid
-   ->  Foreign Scan on public.bar2 bar_1
-         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*
-         Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
-(9 rows)
+   ->  Result
+         Output: (bar.f2 + 100), bar.ctid, (0), bar.*
+         ->  Append
+               ->  Seq Scan on public.bar bar_1
+                     Output: bar_1.f2, bar_1.ctid, 0, bar_1.*
+               ->  Foreign Scan on public.bar2 bar_2
+                     Output: bar_2.f2, bar_2.ctid, 1, bar_2.*
+                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+(12 rows)

 update bar set f2 = f2 + 100;
 NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
@@ -7572,19 +7550,20 @@ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
 NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
 explain (verbose, costs off)
 delete from bar where f2 < 400;
-                                         QUERY PLAN
----------------------------------------------------------------------------------------------
+                                            QUERY PLAN
+---------------------------------------------------------------------------------------------------
  Delete on public.bar
-   Delete on public.bar
-   Foreign Delete on public.bar2 bar_1
+   Delete on public.bar bar_1
+   Foreign Delete on public.bar2 bar_2
      Remote SQL: DELETE FROM public.loct2 WHERE ctid = $1 RETURNING f1, f2, f3
-   ->  Seq Scan on public.bar
-         Output: bar.ctid
-         Filter: (bar.f2 < 400)
-   ->  Foreign Scan on public.bar2 bar_1
-         Output: bar_1.ctid, bar_1.*
-         Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 400)) FOR UPDATE
-(10 rows)
+   ->  Append
+         ->  Seq Scan on public.bar bar_1
+               Output: bar_1.ctid, 0, bar_1.*
+               Filter: (bar_1.f2 < 400)
+         ->  Foreign Scan on public.bar2 bar_2
+               Output: bar_2.ctid, 1, bar_2.*
+               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 400)) FOR UPDATE
+(11 rows)

 delete from bar where f2 < 400;
 NOTICE:  trig_row_before(23, skidoo) BEFORE ROW DELETE ON bar2
@@ -7615,23 +7594,28 @@ analyze remt1;
 analyze remt2;
 explain (verbose, costs off)
 update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a returning *;
-                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
+                                          QUERY PLAN
+----------------------------------------------------------------------------------------------
  Update on public.parent
-   Output: parent.a, parent.b, remt2.a, remt2.b
-   Update on public.parent
-   Foreign Update on public.remt1 parent_1
+   Output: parent_1.a, parent_1.b, remt2.a, remt2.b
+   Update on public.parent parent_1
+   Foreign Update on public.remt1 parent_2
+     Remote SQL: UPDATE public.loct1 SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Nested Loop
-         Output: (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b
+         Output: (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b, (0), parent.*
          Join Filter: (parent.a = remt2.a)
-         ->  Seq Scan on public.parent
-               Output: parent.b, parent.ctid, parent.a
-         ->  Foreign Scan on public.remt2
+         ->  Append
+               ->  Seq Scan on public.parent parent_1
+                     Output: parent_1.b, parent_1.ctid, parent_1.a, 0, parent_1.*
+               ->  Foreign Scan on public.remt1 parent_2
+                     Output: parent_2.b, parent_2.ctid, parent_2.a, 1, parent_2.*
+                     Remote SQL: SELECT a, b, ctid FROM public.loct1 FOR UPDATE
+         ->  Materialize
                Output: remt2.b, remt2.*, remt2.a
-               Remote SQL: SELECT a, b FROM public.loct2
-   ->  Foreign Update
-         Remote SQL: UPDATE public.loct1 r4 SET b = (r4.b || r2.b) FROM public.loct2 r2 WHERE ((r4.a = r2.a)) RETURNING r4.a, r4.b, r2.a, r2.b
-(14 rows)
+               ->  Foreign Scan on public.remt2
+                     Output: remt2.b, remt2.*, remt2.a
+                     Remote SQL: SELECT a, b FROM public.loct2
+(19 rows)

 update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a returning *;
  a |   b    | a |  b
@@ -7642,23 +7626,28 @@ update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a re

 explain (verbose, costs off)
 delete from parent using remt2 where parent.a = remt2.a returning parent;
-                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
  Delete on public.parent
-   Output: parent.*
-   Delete on public.parent
-   Foreign Delete on public.remt1 parent_1
+   Output: parent_1.*
+   Delete on public.parent parent_1
+   Foreign Delete on public.remt1 parent_2
+     Remote SQL: DELETE FROM public.loct1 WHERE ctid = $1 RETURNING a, b
    ->  Nested Loop
-         Output: parent.ctid, remt2.*
+         Output: parent.ctid, remt2.*, (0)
          Join Filter: (parent.a = remt2.a)
-         ->  Seq Scan on public.parent
-               Output: parent.ctid, parent.a
-         ->  Foreign Scan on public.remt2
+         ->  Append
+               ->  Seq Scan on public.parent parent_1
+                     Output: parent_1.ctid, parent_1.a, 0
+               ->  Foreign Scan on public.remt1 parent_2
+                     Output: parent_2.ctid, parent_2.a, 1
+                     Remote SQL: SELECT a, ctid FROM public.loct1 FOR UPDATE
+         ->  Materialize
                Output: remt2.*, remt2.a
-               Remote SQL: SELECT a, b FROM public.loct2
-   ->  Foreign Delete
-         Remote SQL: DELETE FROM public.loct1 r4 USING public.loct2 r2 WHERE ((r4.a = r2.a)) RETURNING r4.a, r4.b
-(14 rows)
+               ->  Foreign Scan on public.remt2
+                     Output: remt2.*, remt2.a
+                     Remote SQL: SELECT a, b FROM public.loct2
+(19 rows)

 delete from parent using remt2 where parent.a = remt2.a returning parent;
    parent
@@ -7810,13 +7799,11 @@ create table locp (a int check (a in (2)), b text);
 alter table utrtest attach partition remp for values in (1);
 alter table utrtest attach partition locp for values in (2);
 insert into utrtest values (1, 'foo');
-insert into utrtest values (2, 'qux');
 select tableoid::regclass, * FROM utrtest;
  tableoid | a |  b
 ----------+---+-----
  remp     | 1 | foo
- locp     | 2 | qux
-(2 rows)
+(1 row)

 select tableoid::regclass, * FROM remp;
  tableoid | a |  b
@@ -7825,18 +7812,21 @@ select tableoid::regclass, * FROM remp;
 (1 row)

 select tableoid::regclass, * FROM locp;
- tableoid | a |  b
-----------+---+-----
- locp     | 2 | qux
-(1 row)
+ tableoid | a | b
+----------+---+---
+(0 rows)

 -- It's not allowed to move a row from a partition that is foreign to another
 update utrtest set a = 2 where b = 'foo' returning *;
 ERROR:  new row for relation "loct" violates check constraint "loct_a_check"
 DETAIL:  Failing row contains (2, foo).
 CONTEXT:  remote SQL command: UPDATE public.loct SET a = 2 WHERE ((b = 'foo'::text)) RETURNING a, b
--- But the reverse is allowed
+-- But the reverse is allowed provided the target foreign partition is itself
+-- not an UPDATE target
+insert into utrtest values (2, 'qux');
 update utrtest set a = 1 where b = 'qux' returning *;
+ERROR:  cannot route tuples into foreign table to be updated "remp"
+update utrtest set a = 1 where a = 2 returning *;
  a |  b
 ---+-----
  1 | qux
@@ -7868,32 +7858,6 @@ create trigger loct_br_insert_trigger before insert on loct
     for each row execute procedure br_insert_trigfunc();
 delete from utrtest;
 insert into utrtest values (2, 'qux');
--- Check case where the foreign partition is a subplan target rel
-explain (verbose, costs off)
-update utrtest set a = 1 where a = 1 or a = 2 returning *;
-                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
- Update on public.utrtest
-   Output: utrtest_1.a, utrtest_1.b
-   Foreign Update on public.remp utrtest_1
-   Update on public.locp utrtest_2
-   ->  Foreign Update on public.remp utrtest_1
-         Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
-   ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.ctid
-         Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
-(9 rows)
-
--- The new values are concatenated with ' triggered !'
-update utrtest set a = 1 where a = 1 or a = 2 returning *;
- a |        b
----+-----------------
- 1 | qux triggered !
-(1 row)
-
-delete from utrtest;
-insert into utrtest values (2, 'qux');
--- Check case where the foreign partition isn't a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 2 returning *;
                QUERY PLAN
@@ -7902,7 +7866,7 @@ update utrtest set a = 1 where a = 2 returning *;
    Output: utrtest_1.a, utrtest_1.b
    Update on public.locp utrtest_1
    ->  Seq Scan on public.locp utrtest_1
-         Output: 1, utrtest_1.ctid
+         Output: 1, utrtest_1.ctid, 0
          Filter: (utrtest_1.a = 2)
 (6 rows)

@@ -7914,132 +7878,6 @@ update utrtest set a = 1 where a = 2 returning *;
 (1 row)

 drop trigger loct_br_insert_trigger on loct;
--- We can move rows to a foreign partition that has been updated already,
--- but can't move rows to a foreign partition that hasn't been updated yet
-delete from utrtest;
-insert into utrtest values (1, 'foo');
-insert into utrtest values (2, 'qux');
--- Test the former case:
--- with a direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 1 returning *;
-                           QUERY PLAN
------------------------------------------------------------------
- Update on public.utrtest
-   Output: utrtest_1.a, utrtest_1.b
-   Foreign Update on public.remp utrtest_1
-   Update on public.locp utrtest_2
-   ->  Foreign Update on public.remp utrtest_1
-         Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
-   ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.ctid
-(8 rows)
-
-update utrtest set a = 1 returning *;
- a |  b
----+-----
- 1 | foo
- 1 | qux
-(2 rows)
-
-delete from utrtest;
-insert into utrtest values (1, 'foo');
-insert into utrtest values (2, 'qux');
--- with a non-direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
-                                    QUERY PLAN
-----------------------------------------------------------------------------------
- Update on public.utrtest
-   Output: utrtest_1.a, utrtest_1.b, "*VALUES*".column1
-   Foreign Update on public.remp utrtest_1
-     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
-   Update on public.locp utrtest_2
-   ->  Hash Join
-         Output: 1, utrtest_1.ctid, utrtest_1.*, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_1.a = "*VALUES*".column1)
-         ->  Foreign Scan on public.remp utrtest_1
-               Output: utrtest_1.ctid, utrtest_1.*, utrtest_1.a
-               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
-         ->  Hash
-               Output: "*VALUES*".*, "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     Output: "*VALUES*".*, "*VALUES*".column1
-   ->  Hash Join
-         Output: 1, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_2.a = "*VALUES*".column1)
-         ->  Seq Scan on public.locp utrtest_2
-               Output: utrtest_2.ctid, utrtest_2.a
-         ->  Hash
-               Output: "*VALUES*".*, "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     Output: "*VALUES*".*, "*VALUES*".column1
-(24 rows)
-
-update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
-ERROR:  invalid attribute number 5
--- Change the definition of utrtest so that the foreign partition get updated
--- after the local partition
-delete from utrtest;
-alter table utrtest detach partition remp;
-drop foreign table remp;
-alter table loct drop constraint loct_a_check;
-alter table loct add check (a in (3));
-create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
-alter table utrtest attach partition remp for values in (3);
-insert into utrtest values (2, 'qux');
-insert into utrtest values (3, 'xyzzy');
--- Test the latter case:
--- with a direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 3 returning *;
-                           QUERY PLAN
------------------------------------------------------------------
- Update on public.utrtest
-   Output: utrtest_1.a, utrtest_1.b
-   Update on public.locp utrtest_1
-   Foreign Update on public.remp utrtest_2
-   ->  Seq Scan on public.locp utrtest_1
-         Output: 3, utrtest_1.ctid
-   ->  Foreign Update on public.remp utrtest_2
-         Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
-(8 rows)
-
-update utrtest set a = 3 returning *; -- ERROR
-ERROR:  cannot route tuples into foreign table to be updated "remp"
--- with a non-direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
-                                    QUERY PLAN
-----------------------------------------------------------------------------------
- Update on public.utrtest
-   Output: utrtest_1.a, utrtest_1.b, "*VALUES*".column1
-   Update on public.locp utrtest_1
-   Foreign Update on public.remp utrtest_2
-     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
-   ->  Hash Join
-         Output: 3, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_1.a = "*VALUES*".column1)
-         ->  Seq Scan on public.locp utrtest_1
-               Output: utrtest_1.ctid, utrtest_1.a
-         ->  Hash
-               Output: "*VALUES*".*, "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     Output: "*VALUES*".*, "*VALUES*".column1
-   ->  Hash Join
-         Output: 3, utrtest_2.ctid, utrtest_2.*, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_2.a = "*VALUES*".column1)
-         ->  Foreign Scan on public.remp utrtest_2
-               Output: utrtest_2.ctid, utrtest_2.*, utrtest_2.a
-               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
-         ->  Hash
-               Output: "*VALUES*".*, "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     Output: "*VALUES*".*, "*VALUES*".column1
-(24 rows)
-
-update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
-ERROR:  cannot route tuples into foreign table to be updated "remp"
 drop table utrtest;
 drop table loct;
 -- Test copy tuple routing
@@ -9422,7 +9260,7 @@ CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
     FOR VALUES IN (2);
 INSERT INTO batch_cp_upd_test VALUES (1), (2);
 -- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND t.a = 2;
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;
        tableoid       | a
 ----------------------+---
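The postgres_fdw.c diff adds modifytable_result_subplan_pushable(). This helper looks through an optional Result node and an Append/MergeAppend to find the child subplan for a given result relation, and it reports success only when that child is a ForeignScan. The sketch below models that lookup on a toy plan tree. ToyPlan, ToyTag and toy_result_subplan_pushable() are hypothetical stand-ins, not PostgreSQL's Plan nodes. The patch calls list_nth(). The sketch differs in two ways: it bounds-checks the index, and it checks that a Result has an input before inspecting it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified plan node tags. */
typedef enum ToyTag
{
    TOY_FOREIGNSCAN, TOY_SEQSCAN, TOY_RESULT, TOY_APPEND, TOY_MERGEAPPEND
} ToyTag;

typedef struct ToyPlan
{
    ToyTag           tag;
    struct ToyPlan  *outer;     /* a Result's input (outerPlan); may be NULL */
    struct ToyPlan **children;  /* Append/MergeAppend subplans */
    int              nchildren;
} ToyPlan;

static bool
toy_is_append(const ToyPlan *p)
{
    return p != NULL && (p->tag == TOY_APPEND || p->tag == TOY_MERGEAPPEND);
}

/* Sets *subplan_p to the ForeignScan for the given result relation, or NULL. */
static bool
toy_result_subplan_pushable(ToyPlan *top, int subplan_index, ToyPlan **subplan_p)
{
    ToyPlan    *subplan = top;

    /* Non-inherited case: the top-level plan itself may be the ForeignScan. */
    if (subplan->tag == TOY_FOREIGNSCAN)
    {
        *subplan_p = subplan;
        return true;
    }

    /* Look through a Result placed directly above an Append/MergeAppend. */
    if (subplan->tag == TOY_RESULT && toy_is_append(subplan->outer))
        subplan = subplan->outer;

    /* Pick the child that scans the subplan_index'th result relation. */
    if (toy_is_append(subplan) &&
        subplan_index >= 0 && subplan_index < subplan->nchildren)
        subplan = subplan->children[subplan_index];

    if (subplan->tag == TOY_FOREIGNSCAN)
    {
        *subplan_p = subplan;
        return true;
    }

    *subplan_p = NULL;
    return false;
}
```

For an Append over a local SeqScan (index 0) and a foreign scan (index 1), only index 1 yields a pushable subplan. This matches the bar/bar2 plans above, where only bar2 gets a "Foreign Update".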
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6ba6786c8b..4ed583b35b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -28,6 +28,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
+#include "optimizer/inherit.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -1854,7 +1855,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
                                     rte,
                                     resultRelInfo,
                                     mtstate->operation,
-                                    mtstate->mt_plans[subplan_index]->plan,
+                                    outerPlanState(mtstate)->plan,
                                     query,
                                     target_attrs,
                                     values_end_len,
@@ -2054,8 +2055,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
      */
     if (plan && plan->operation == CMD_UPDATE &&
         (resultRelInfo->ri_usesFdwDirectModify ||
-         resultRelInfo->ri_FdwState) &&
-        resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
+         resultRelInfo->ri_FdwState))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot route tuples into foreign table to be updated \"%s\"",
@@ -2251,6 +2251,82 @@ postgresRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
     return true;
 }

+/*
+ * modifytable_result_subplan_pushable
+ *        Helper routine for postgresPlanDirectModify that finds the subplan
+ *        for the subplan_index'th result relation of the given ModifyTable
+ *        node and checks whether it's pushable; if so, sets *subplan_p to
+ *        that subplan and returns true.
+ *
+ * *subplan_p will be set to NULL if a pushable subplan can't be located.
+ */
+static bool
+modifytable_result_subplan_pushable(PlannerInfo *root,
+                                    ModifyTable *plan,
+                                    int subplan_index,
+                                    Plan **subplan_p)
+{
+    Plan   *subplan = outerPlan(plan);
+
+    /*
+     * In a non-inherited update, check the top-level plan itself.
+     */
+    if (IsA(subplan, ForeignScan))
+    {
+        *subplan_p = subplan;
+        return true;
+    }
+
+    /*
+     * In an inherited update, unless the result relation is joined to another
+     * relation, the top-level plan is an Append/MergeAppend with the result
+     * relation subplans underneath, sometimes with a Result node on top of
+     * the Append/MergeAppend.  These nodes above the result relation subplans
+     * can be treated as no-ops as far as determining whether the subplan can
+     * be pushed to the remote side is concerned, because their job is mostly
+     * to pass the tuples fetched from the subplans along to the ModifyTable
+     * node, which performs the actual update/delete operation.  The Result
+     * node isn't entirely a no-op, because it is added to compute the query's
+     * targetlist, but if the targetlist is pushable, it can safely be ignored
+     * too.
+     */
+    if (IsA(subplan, Append))
+    {
+        Append       *appendplan = (Append *) subplan;
+
+        subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+    }
+    else if (IsA(subplan, Result) && outerPlan(subplan) != NULL &&
+             IsA(outerPlan(subplan), Append))
+    {
+        Append       *appendplan = (Append *) outerPlan(subplan);
+        subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+    }
+    else if (IsA(subplan, MergeAppend))
+    {
+        MergeAppend       *maplan = (MergeAppend *) subplan;
+
+        subplan = (Plan *) list_nth(maplan->mergeplans, subplan_index);
+    }
+    else if (IsA(subplan, Result) && outerPlan(subplan) != NULL &&
+             IsA(outerPlan(subplan), MergeAppend))
+    {
+        MergeAppend       *maplan = (MergeAppend *) outerPlan(subplan);
+        subplan = (Plan *) list_nth(maplan->mergeplans, subplan_index);
+    }
+
+    if (IsA(subplan, ForeignScan))
+    {
+        *subplan_p = subplan;
+        return true;
+    }
+
+    /* Caller won't use it, but set anyway. */
+    *subplan_p = NULL;
+
+    return false;
+}
+
 /*
  * postgresPlanDirectModify
  *        Consider a direct foreign table modification
@@ -2272,6 +2348,7 @@ postgresPlanDirectModify(PlannerInfo *root,
     Relation    rel;
     StringInfoData sql;
     ForeignScan *fscan;
+    List       *processed_tlist = NIL;
     List       *targetAttrs = NIL;
     List       *remote_exprs;
     List       *params_list = NIL;
@@ -2289,12 +2366,14 @@ postgresPlanDirectModify(PlannerInfo *root,
         return false;

     /*
-     * It's unsafe to modify a foreign table directly if there are any local
-     * joins needed.
+     * Check whether the subplan corresponding to this result relation is
+     * pushable and, if so, fetch its ForeignScan node.  (It's unsafe to
+     * modify a foreign table directly if any local joins are needed.)
      */
-    subplan = (Plan *) list_nth(plan->plans, subplan_index);
-    if (!IsA(subplan, ForeignScan))
+    if (!modifytable_result_subplan_pushable(root, plan, subplan_index,
+                                             &subplan))
         return false;
+    Assert(IsA(subplan, ForeignScan));
     fscan = (ForeignScan *) subplan;

     /*
@@ -2313,6 +2392,11 @@ postgresPlanDirectModify(PlannerInfo *root,
     }
     else
         foreignrel = root->simple_rel_array[resultRelation];
+
+    /* Sanity check. */
+    if (!bms_is_member(resultRelation, foreignrel->relids))
+        elog(ERROR, "invalid subplan for result relation %u", resultRelation);
+
     rte = root->simple_rte_array[resultRelation];
     fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;

@@ -2325,11 +2409,12 @@ postgresPlanDirectModify(PlannerInfo *root,
         ListCell *lc, *lc2;

         /*
-         * The expressions of concern are the first N columns of the subplan
-         * targetlist, where N is the length of root->update_colnos.
+         * The expressions of concern are the first N columns of the processed
+         * targetlist, where N is the length of the rel's update_colnos.
          */
-        targetAttrs = root->update_colnos;
-        forboth(lc, subplan->targetlist, lc2, targetAttrs)
+        get_result_update_info(root, resultRelation,
+                               &processed_tlist, &targetAttrs);
+        forboth(lc, processed_tlist, lc2, targetAttrs)
         {
             TargetEntry *tle = lfirst_node(TargetEntry, lc);
             AttrNumber attno = lfirst_int(lc2);
@@ -2392,7 +2477,7 @@ postgresPlanDirectModify(PlannerInfo *root,
         case CMD_UPDATE:
             deparseDirectUpdateSql(&sql, root, resultRelation, rel,
                                    foreignrel,
-                                   ((Plan *) fscan)->targetlist,
+                                   processed_tlist,
                                    targetAttrs,
                                    remote_exprs, ¶ms_list,
                                    returningList, &retrieved_attrs);
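The descent performed by modifytable_result_subplan_pushable() can be illustrated outside the server. The following is a minimal sketch built on hypothetical toy types (ToyPlan, ToyNodeTag, the TOY_* tags and toy_find_pushable_subplan are invented for illustration and are not PostgreSQL's Plan/List API): look through a Result, if it has a child, to an Append/MergeAppend, pick the subplan_index'th member, and accept it only if it is a ForeignScan. The sketch also bounds-checks subplan_index and never dereferences a missing child plan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner node types. */
typedef enum ToyNodeTag
{
    TOY_FOREIGNSCAN,
    TOY_SEQSCAN,
    TOY_APPEND,
    TOY_MERGEAPPEND,
    TOY_RESULT
} ToyNodeTag;

typedef struct ToyPlan
{
    ToyNodeTag  tag;
    struct ToyPlan *lefttree;   /* outer (child) plan, NULL if none */
    struct ToyPlan **members;   /* Append/MergeAppend children */
    int         nmembers;
} ToyPlan;

/*
 * Return the plan that produces rows for the subplan_index'th result
 * relation if it is a ForeignScan (a pushdown candidate), else NULL.
 */
static ToyPlan *
toy_find_pushable_subplan(ToyPlan *top, int subplan_index)
{
    ToyPlan    *subplan = top;
    ToyPlan    *container = top;

    assert(top != NULL && subplan_index >= 0);

    /* Look through a Result that sits atop the Append/MergeAppend. */
    if (container->tag == TOY_RESULT && container->lefttree != NULL)
        container = container->lefttree;

    /* Pick the per-relation member, if the index is in range. */
    if ((container->tag == TOY_APPEND || container->tag == TOY_MERGEAPPEND) &&
        subplan_index < container->nmembers)
        subplan = container->members[subplan_index];

    /* A bare ForeignScan on top (non-inherited case) passes straight through. */
    return subplan->tag == TOY_FOREIGNSCAN ? subplan : NULL;
}
```

With a top-level Append whose members are a SeqScan and a ForeignScan, index 1 resolves to the ForeignScan while index 0 yields NULL, meaning that result relation would fall back to a non-direct modification.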
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2b525ea44a..46bf4411f8 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2079,7 +2079,6 @@ alter table utrtest attach partition remp for values in (1);
 alter table utrtest attach partition locp for values in (2);

 insert into utrtest values (1, 'foo');
-insert into utrtest values (2, 'qux');

 select tableoid::regclass, * FROM utrtest;
 select tableoid::regclass, * FROM remp;
@@ -2088,8 +2087,11 @@ select tableoid::regclass, * FROM locp;
 -- It's not allowed to move a row from a partition that is foreign to another
 update utrtest set a = 2 where b = 'foo' returning *;

--- But the reverse is allowed
+-- But the reverse is allowed provided the target foreign partition is itself
+-- not an UPDATE target
+insert into utrtest values (2, 'qux');
 update utrtest set a = 1 where b = 'qux' returning *;
+update utrtest set a = 1 where a = 2 returning *;

 select tableoid::regclass, * FROM utrtest;
 select tableoid::regclass, * FROM remp;
@@ -2104,17 +2106,6 @@ create trigger loct_br_insert_trigger before insert on loct

 delete from utrtest;
 insert into utrtest values (2, 'qux');
-
--- Check case where the foreign partition is a subplan target rel
-explain (verbose, costs off)
-update utrtest set a = 1 where a = 1 or a = 2 returning *;
--- The new values are concatenated with ' triggered !'
-update utrtest set a = 1 where a = 1 or a = 2 returning *;
-
-delete from utrtest;
-insert into utrtest values (2, 'qux');
-
--- Check case where the foreign partition isn't a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 2 returning *;
 -- The new values are concatenated with ' triggered !'
@@ -2122,51 +2113,6 @@ update utrtest set a = 1 where a = 2 returning *;

 drop trigger loct_br_insert_trigger on loct;

--- We can move rows to a foreign partition that has been updated already,
--- but can't move rows to a foreign partition that hasn't been updated yet
-
-delete from utrtest;
-insert into utrtest values (1, 'foo');
-insert into utrtest values (2, 'qux');
-
--- Test the former case:
--- with a direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 1 returning *;
-update utrtest set a = 1 returning *;
-
-delete from utrtest;
-insert into utrtest values (1, 'foo');
-insert into utrtest values (2, 'qux');
-
--- with a non-direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
-update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
-
--- Change the definition of utrtest so that the foreign partition get updated
--- after the local partition
-delete from utrtest;
-alter table utrtest detach partition remp;
-drop foreign table remp;
-alter table loct drop constraint loct_a_check;
-alter table loct add check (a in (3));
-create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
-alter table utrtest attach partition remp for values in (3);
-insert into utrtest values (2, 'qux');
-insert into utrtest values (3, 'xyzzy');
-
--- Test the latter case:
--- with a direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 3 returning *;
-update utrtest set a = 3 returning *; -- ERROR
-
--- with a non-direct modification plan
-explain (verbose, costs off)
-update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
-update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
-
 drop table utrtest;
 drop table loct;

@@ -2923,7 +2869,7 @@ CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
 INSERT INTO batch_cp_upd_test VALUES (1), (2);

 -- The following moves a row from the local partition to the foreign one
-UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a AND t.a = 2;
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;

 -- Clean up
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6989957d50..351ad4c4c9 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -445,7 +445,9 @@ AddForeignUpdateTargets(Query *parsetree,
      extra values to be fetched.  Each such entry must be marked
      <structfield>resjunk</structfield> = <literal>true</literal>, and must have a distinct
      <structfield>resname</structfield> that will identify it at execution time.
-     Avoid using names matching <literal>ctid<replaceable>N</replaceable></literal>,
+     Avoid using names matching <literal>resultrelindex</literal>,
+     <literal>ctid</literal>,
+     <literal>ctid<replaceable>N</replaceable></literal>,
      <literal>wholerow</literal>, or
      <literal>wholerow<replaceable>N</replaceable></literal>, as the core system can
      generate junk columns of these names.
@@ -495,8 +497,8 @@ PlanForeignModify(PlannerInfo *root,
      <literal>resultRelation</literal> identifies the target foreign table by its
      range table index.  <literal>subplan_index</literal> identifies which target of
      the <structname>ModifyTable</structname> plan node this is, counting from zero;
-     use this if you want to index into <literal>plan->plans</literal> or other
-     substructure of the <literal>plan</literal> node.
+     use this if you want to index into per-target-relation substructures of the
+     <literal>plan</literal> node.
     </para>

     <para>
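The resultrelindex junk column listed above is what lets a single subplan feed several result relations: each output row carries the index of the target it belongs to, and ModifyTable uses it to pick the matching ResultRelInfo. Below is a minimal sketch assuming hypothetical toy types (ToyDatum, ToyRow, ToyResultRel and toy_select_result_rel are invented here and are not the executor's slot API). Where the patch's ExecModifyTable raises an error for a NULL index, the sketch simply returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not PostgreSQL's Datum/TupleTableSlot API. */
typedef struct ToyDatum
{
    long        value;
    bool        isnull;
} ToyDatum;

typedef struct ToyRow
{
    const ToyDatum *cols;       /* plan output, junk columns included */
    int         ncols;
} ToyRow;

typedef struct ToyResultRel
{
    const char *relname;
} ToyResultRel;

/*
 * Choose the target for one plan row.  resultIndexAttno is the 1-based
 * position of the "resultrelindex" junk column, or 0 when there is a single
 * result relation and hence no such column.  Returns NULL for a NULL or
 * out-of-range index (the patch reports a NULL index with elog(ERROR) and
 * only Asserts the range).
 */
static ToyResultRel *
toy_select_result_rel(const ToyRow *row, int resultIndexAttno,
                      ToyResultRel *rels, int nrels)
{
    const ToyDatum *d;

    assert(nrels > 0);
    if (resultIndexAttno == 0)
        return &rels[0];        /* nothing to dispatch on */

    assert(resultIndexAttno >= 1 && resultIndexAttno <= row->ncols);
    d = &row->cols[resultIndexAttno - 1];
    if (d->isnull || d->value < 0 || d->value >= nrels)
        return NULL;
    return &rels[d->value];
}
```

With resultIndexAttno set to 0 every row goes to the first (only) target, mirroring the patch's fallback to node->resultRelInfo when mt_resultIndexAttno is not a valid attribute number.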
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ed696d429..74dbb709fe 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -666,6 +666,7 @@ CopyFrom(CopyFromState cstate)
     mtstate->ps.plan = NULL;
     mtstate->ps.state = estate;
     mtstate->operation = CMD_INSERT;
+    mtstate->mt_nrels = 1;
     mtstate->resultRelInfo = resultRelInfo;
     mtstate->rootResultRelInfo = resultRelInfo;

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..0b1808d503 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2078,7 +2078,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
     haschildren = planstate->initPlan ||
         outerPlanState(planstate) ||
         innerPlanState(planstate) ||
-        IsA(plan, ModifyTable) ||
         IsA(plan, Append) ||
         IsA(plan, MergeAppend) ||
         IsA(plan, BitmapAnd) ||
@@ -2111,11 +2110,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
     /* special child plans */
     switch (nodeTag(plan))
     {
-        case T_ModifyTable:
-            ExplainMemberNodes(((ModifyTableState *) planstate)->mt_plans,
-                               ((ModifyTableState *) planstate)->mt_nplans,
-                               ancestors, es);
-            break;
         case T_Append:
             ExplainMemberNodes(((AppendState *) planstate)->appendplans,
                                ((AppendState *) planstate)->as_nplans,
@@ -3715,14 +3709,14 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
     }

     /* Should we explicitly label target relations? */
-    labeltargets = (mtstate->mt_nplans > 1 ||
-                    (mtstate->mt_nplans == 1 &&
+    labeltargets = (mtstate->mt_nrels > 1 ||
+                    (mtstate->mt_nrels == 1 &&
                      mtstate->resultRelInfo[0].ri_RangeTableIndex != node->nominalRelation));

     if (labeltargets)
         ExplainOpenGroup("Target Tables", "Target Tables", false, es);

-    for (j = 0; j < mtstate->mt_nplans; j++)
+    for (j = 0; j < mtstate->mt_nrels; j++)
     {
         ResultRelInfo *resultRelInfo = mtstate->resultRelInfo + j;
         FdwRoutine *fdwroutine = resultRelInfo->ri_FdwRoutine;
@@ -3817,10 +3811,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
             double        insert_path;
             double        other_path;

-            InstrEndLoop(mtstate->mt_plans[0]->instrument);
+            InstrEndLoop(outerPlanState(mtstate)->instrument);

             /* count the number of source rows */
-            total = mtstate->mt_plans[0]->instrument->ntuples;
+            total = outerPlanState(mtstate)->instrument->ntuples;
             other_path = mtstate->ps.instrument->ntuples2;
             insert_path = total - other_path;

@@ -3836,7 +3830,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 }

 /*
- * Explain the constituent plans of a ModifyTable, Append, MergeAppend,
+ * Explain the constituent plans of an Append, MergeAppend,
  * BitmapAnd, or BitmapOr node.
  *
  * The ancestors list should already contain the immediate parent of these
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 18b2ac1865..4958452730 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -32,10 +32,14 @@ includes a RETURNING clause, the ModifyTable node delivers the computed
 RETURNING rows as output, otherwise it returns nothing.  Handling INSERT
 is pretty straightforward: the tuples returned from the plan tree below
 ModifyTable are inserted into the correct result relation.  For UPDATE,
-the plan tree returns the computed tuples to be updated, plus a "junk"
-(hidden) CTID column identifying which table row is to be replaced by each
-one.  For DELETE, the plan tree need only deliver a CTID column, and the
-ModifyTable node visits each of those rows and marks the row deleted.
+the plan tree returns the new values of the updated columns, plus "junk"
+(hidden) column(s) identifying which table row is to be updated.  The
+ModifyTable node must fetch that row to extract values for the unchanged
+columns, combine the values into a new row, and apply the update.  (For a
+heap table, the row-identity junk column is a CTID, but other things may
+be used for other table types.)  For DELETE, the plan tree need only deliver
+junk row-identity column(s), and the ModifyTable node visits each of those
+rows and marks the row deleted.

 XXX a great deal more documentation needs to be written here...

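The UPDATE flow described in the README can be sketched with a toy table: the plan supplies only the SET columns' new values plus a junk row identifier, and the node fetches the old row and overlays the changed columns. Everything here (ToyTable, ToyUpdateInput, toy_apply_update, TOY_NATTS, TOY_MAXROWS) is hypothetical; in particular the toy overwrites the row in place, whereas heap tables write a new row version.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NATTS 4             /* hypothetical table width */
#define TOY_MAXROWS 8

/* Hypothetical table whose rows are addressed by a row id (cf. a CTID). */
typedef struct ToyTable
{
    int         rows[TOY_MAXROWS][TOY_NATTS];
    int         nrows;
} ToyTable;

/* What the plan delivers for UPDATE: SET-column values plus a junk row id. */
typedef struct ToyUpdateInput
{
    int         rowid;                      /* junk row-identity column */
    int         nupdated;                   /* number of SET columns */
    int         update_colnos[TOY_NATTS];   /* 0-based target columns */
    int         new_values[TOY_NATTS];      /* their new values */
} ToyUpdateInput;

/*
 * Fetch the old row, overlay the changed columns, and store the result.
 * Overwriting in place keeps the toy short.
 */
static bool
toy_apply_update(ToyTable *tab, const ToyUpdateInput *in)
{
    int         newrow[TOY_NATTS];

    assert(in->nupdated >= 0 && in->nupdated <= TOY_NATTS);
    if (in->rowid < 0 || in->rowid >= tab->nrows)
        return false;           /* cf. "failed to fetch tuple being updated" */

    /* Unchanged columns come from the old row ... */
    memcpy(newrow, tab->rows[in->rowid], sizeof(newrow));

    /* ... and changed ones from the plan's output. */
    for (int i = 0; i < in->nupdated; i++)
    {
        assert(in->update_colnos[i] >= 0 && in->update_colnos[i] < TOY_NATTS);
        newrow[in->update_colnos[i]] = in->new_values[i];
    }

    memcpy(tab->rows[in->rowid], newrow, sizeof(newrow));
    return true;
}
```

A DELETE needs none of the combining step: per the README, its plan only supplies the row-identity column(s).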
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ea1530e032..163242f54e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2416,7 +2416,8 @@ EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
 /*
  * EvalPlanQualSetPlan -- set or change subplan of an EPQState.
  *
- * We need this so that ModifyTable can deal with multiple subplans.
+ * We used to need this so that ModifyTable could deal with multiple subplans.
+ * It could now be refactored out of existence.
  */
 void
 EvalPlanQualSetPlan(EPQState *epqstate, Plan *subplan, List *auxrowmarks)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 619aaffae4..558060e080 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -82,7 +82,7 @@
  *
  * subplan_resultrel_htab
  *        Hash table to store subplan ResultRelInfos by Oid.  This is used to
- *        cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ *        cache ResultRelInfos from targets of an UPDATE ModifyTable node;
  *        NULL in other cases.  Some of these may be useful for tuple routing
  *        to save having to build duplicates.
  *
@@ -527,12 +527,12 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
     ctl.entrysize = sizeof(SubplanResultRelHashElem);
     ctl.hcxt = CurrentMemoryContext;

-    htab = hash_create("PartitionTupleRouting table", mtstate->mt_nplans,
+    htab = hash_create("PartitionTupleRouting table", mtstate->mt_nrels,
                        &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
     proute->subplan_resultrel_htab = htab;

     /* Hash all subplans by their Oid */
-    for (i = 0; i < mtstate->mt_nplans; i++)
+    for (i = 0; i < mtstate->mt_nrels; i++)
     {
         ResultRelInfo *rri = &mtstate->resultRelInfo[i];
         bool        found;
@@ -628,10 +628,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
          */
         Assert((node->operation == CMD_INSERT &&
                 list_length(node->withCheckOptionLists) == 1 &&
-                list_length(node->plans) == 1) ||
+                list_length(node->resultRelations) == 1) ||
                (node->operation == CMD_UPDATE &&
                 list_length(node->withCheckOptionLists) ==
-                list_length(node->plans)));
+                list_length(node->resultRelations)));

         /*
          * Use the WCO list of the first plan as a reference to calculate
@@ -687,10 +687,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
         /* See the comment above for WCO lists. */
         Assert((node->operation == CMD_INSERT &&
                 list_length(node->returningLists) == 1 &&
-                list_length(node->plans) == 1) ||
+                list_length(node->resultRelations) == 1) ||
                (node->operation == CMD_UPDATE &&
                 list_length(node->returningLists) ==
-                list_length(node->plans)));
+                list_length(node->resultRelations)));

         /*
          * Use the RETURNING list of the first plan as a reference to
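subplan_resultrel_htab, built above with one entry per result relation (sized by mt_nrels), maps a relation's OID to its already-initialized ResultRelInfo so that tuple routing can reuse it rather than build a duplicate. The patch uses dynahash (hash_create with HASH_BLOBS); the following is only a stand-in sketch using a small open-addressing table with linear probing, and every name in it (ToyOid, ToyRelInfo, ToyRelHash, toy_hash_build, toy_hash_lookup, toy_hash_free) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int ToyOid;

/* Hypothetical stand-in for a per-target ResultRelInfo. */
typedef struct ToyRelInfo
{
    ToyOid      relid;
} ToyRelInfo;

/* Open-addressing map from relation OID to ToyRelInfo. */
typedef struct ToyRelHash
{
    ToyRelInfo **slots;
    size_t      nslots;         /* power of two, at least 2 * entries */
} ToyRelHash;

/* Insert every target relation once, keyed by its OID. */
static void
toy_hash_build(ToyRelHash *h, ToyRelInfo *rels, int nrels)
{
    h->nslots = 8;
    while (h->nslots < 2 * (size_t) nrels)
        h->nslots *= 2;
    h->slots = calloc(h->nslots, sizeof(ToyRelInfo *));
    assert(h->slots != NULL);

    for (int i = 0; i < nrels; i++)
    {
        size_t      pos = rels[i].relid & (h->nslots - 1);

        while (h->slots[pos] != NULL)   /* linear probing */
            pos = (pos + 1) & (h->nslots - 1);
        h->slots[pos] = &rels[i];
    }
}

/* Return the cached entry for relid, or NULL so the caller builds a new one. */
static ToyRelInfo *
toy_hash_lookup(const ToyRelHash *h, ToyOid relid)
{
    size_t      pos = relid & (h->nslots - 1);

    while (h->slots[pos] != NULL)
    {
        if (h->slots[pos]->relid == relid)
            return h->slots[pos];
        pos = (pos + 1) & (h->nslots - 1);
    }
    return NULL;
}

static void
toy_hash_free(ToyRelHash *h)
{
    free(h->slots);
    h->slots = NULL;
    h->nslots = 0;
}
```

The load factor is kept at or below one half, so every probe sequence reaches an empty slot and toy_hash_lookup always terminates.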
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index b9064bfe66..3042d14747 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -19,14 +19,10 @@
  *        ExecReScanModifyTable - rescan the ModifyTable node
  *
  *     NOTES
- *        Each ModifyTable node contains a list of one or more subplans,
- *        much like an Append node.  There is one subplan per result relation.
- *        The key reason for this is that in an inherited UPDATE command, each
- *        result relation could have a different schema (more or different
- *        columns) requiring a different plan tree to produce it.  In an
- *        inherited DELETE, all the subplans should produce the same output
- *        rowtype, but we might still find that different plans are appropriate
- *        for different child relations.
+ *        The ModifyTable node receives input from its outerPlan, which is
+ *        the data to insert for INSERT cases, or the changed columns' new
+ *        values plus row-locating info for UPDATE cases, or just the
+ *        row-locating info for DELETE cases.
  *
  *        If the query specifies RETURNING, then the ModifyTable returns a
  *        RETURNING tuple after completing each row insert, update, or delete.
@@ -372,10 +368,8 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
 /*
  * ExecGetInsertNewTuple
  *        This prepares a "new" tuple ready to be inserted into given result
- *        relation by removing any junk columns of the plan's output tuple.
- *
- * Note: currently, this is really dead code, because INSERT cases don't
- * receive any junk columns so there's never a projection to be done.
+ *        relation, by removing any junk columns of the plan's output tuple
+ *        and (if necessary) coercing the tuple to the right tuple format.
  */
 static TupleTableSlot *
 ExecGetInsertNewTuple(ResultRelInfo *relinfo,
@@ -384,9 +378,29 @@ ExecGetInsertNewTuple(ResultRelInfo *relinfo,
     ProjectionInfo *newProj = relinfo->ri_projectNew;
     ExprContext   *econtext;

+    /*
+     * If there's no projection to be done, just make sure the slot is of the
+     * right type for the target rel.  If the planSlot is the right type we
+     * can use it as-is, else copy the data into ri_newTupleSlot.
+     */
     if (newProj == NULL)
-        return planSlot;
+    {
+        if (relinfo->ri_newTupleSlot->tts_ops != planSlot->tts_ops)
+        {
+            ExecCopySlot(relinfo->ri_newTupleSlot, planSlot);
+            return relinfo->ri_newTupleSlot;
+        }
+        else
+            return planSlot;
+    }

+    /*
+     * Else project; since the projection output slot is ri_newTupleSlot,
+     * this will also fix any slot-type problem.
+     *
+     * Note: currently, this is dead code, because INSERT cases don't receive
+     * any junk columns so there's never a projection to be done.
+     */
     econtext = newProj->pi_exprContext;
     econtext->ecxt_outertuple = planSlot;
     return ExecProject(newProj);
@@ -396,8 +410,10 @@ ExecGetInsertNewTuple(ResultRelInfo *relinfo,
  * ExecGetUpdateNewTuple
  *        This prepares a "new" tuple by combining an UPDATE subplan's output
  *        tuple (which contains values of changed columns) with unchanged
- *        columns taken from the old tuple.  The subplan tuple might also
- *        contain junk columns, which are ignored.
+ *        columns taken from the old tuple.
+ *
+ * The subplan tuple might also contain junk columns, which are ignored.
+ * Note that the projection also ensures we have a slot of the right type.
  */
 TupleTableSlot *
 ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
@@ -407,7 +423,6 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
     ProjectionInfo *newProj = relinfo->ri_projectNew;
     ExprContext   *econtext;

-    Assert(newProj != NULL);
     Assert(planSlot != NULL && !TTS_EMPTY(planSlot));
     Assert(oldSlot != NULL && !TTS_EMPTY(oldSlot));

@@ -1249,9 +1264,7 @@ static bool
 ExecCrossPartitionUpdate(ModifyTableState *mtstate,
                          ResultRelInfo *resultRelInfo,
                          ItemPointer tupleid, HeapTuple oldtuple,
-                         TupleTableSlot *slot,
-                         TupleTableSlot *oldSlot,
-                         TupleTableSlot *planSlot,
+                         TupleTableSlot *slot, TupleTableSlot *planSlot,
                          EPQState *epqstate, bool canSetTag,
                          TupleTableSlot **retry_slot,
                          TupleTableSlot **inserted_tuple)
@@ -1327,7 +1340,8 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
         else
         {
             /* Fetch the most recent version of old tuple. */
-            ExecClearTuple(oldSlot);
+            TupleTableSlot *oldSlot = resultRelInfo->ri_oldTupleSlot;
+
             if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
                                                tupleid,
                                                SnapshotAny,
@@ -1340,7 +1354,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
     }

     /*
-     * resultRelInfo is one of the per-subplan resultRelInfos.  So we should
+     * resultRelInfo is one of the per-relation resultRelInfos.  So we should
      * convert the tuple into root's tuple descriptor if needed, since
      * ExecInsert() starts the search from root.
      */
@@ -1384,10 +1398,10 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
  *        foreign table triggers; it is NULL when the foreign table has
  *        no relevant triggers.
  *
- *        slot contains the new tuple value to be stored, while oldSlot
- *        contains the old tuple being replaced.  planSlot is the output
- *        of the ModifyTable's subplan; we use it to access values from
- *        other input tables (for RETURNING), row-ID junk columns, etc.
+ *        slot contains the new tuple value to be stored.
+ *        planSlot is the output of the ModifyTable's subplan; we use it
+ *        to access values from other input tables (for RETURNING),
+ *        row-ID junk columns, etc.
  *
  *        Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
@@ -1398,7 +1412,6 @@ ExecUpdate(ModifyTableState *mtstate,
            ItemPointer tupleid,
            HeapTuple oldtuple,
            TupleTableSlot *slot,
-           TupleTableSlot *oldSlot,
            TupleTableSlot *planSlot,
            EPQState *epqstate,
            EState *estate,
@@ -1536,8 +1549,8 @@ lreplace:;
              * the tuple we're trying to move has been concurrently updated.
              */
             retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
-                                              oldtuple, slot, oldSlot,
-                                              planSlot, epqstate, canSetTag,
+                                              oldtuple, slot, planSlot,
+                                              epqstate, canSetTag,
                                               &retry_slot, &inserted_tuple);
             if (retry)
             {
@@ -1616,6 +1629,7 @@ lreplace:;
                 {
                     TupleTableSlot *inputslot;
                     TupleTableSlot *epqslot;
+                    TupleTableSlot *oldSlot;

                     if (IsolationUsesXactSnapshot())
                         ereport(ERROR,
@@ -1650,7 +1664,7 @@ lreplace:;
                                 return NULL;

                             /* Fetch the most recent version of old tuple. */
-                            ExecClearTuple(oldSlot);
+                            oldSlot = resultRelInfo->ri_oldTupleSlot;
                             if (!table_tuple_fetch_row_version(resultRelationDesc,
                                                                tupleid,
                                                                SnapshotAny,
@@ -1953,7 +1967,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     /* Execute UPDATE with projection */
     *returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
                             resultRelInfo->ri_onConflict->oc_ProjSlot,
-                            existing, planSlot,
+                            planSlot,
                             &mtstate->mt_epqstate, mtstate->ps.state,
                             canSetTag);

@@ -2132,6 +2146,7 @@ ExecModifyTable(PlanState *pstate)
     PlanState  *subplanstate;
     TupleTableSlot *slot;
     TupleTableSlot *planSlot;
+    TupleTableSlot *oldSlot;
     ItemPointer tupleid;
     ItemPointerData tuple_ctid;
     HeapTupleData oldtupdata;
@@ -2173,11 +2188,11 @@ ExecModifyTable(PlanState *pstate)
     }

     /* Preload local variables */
-    resultRelInfo = node->resultRelInfo + node->mt_whichplan;
-    subplanstate = node->mt_plans[node->mt_whichplan];
+    resultRelInfo = node->resultRelInfo;
+    subplanstate = outerPlanState(node);

     /*
-     * Fetch rows from subplan(s), and execute the required table modification
+     * Fetch rows from subplan, and execute the required table modification
      * for each row.
      */
     for (;;)
@@ -2200,29 +2215,28 @@ ExecModifyTable(PlanState *pstate)

         planSlot = ExecProcNode(subplanstate);

+        /* No more tuples to process? */
         if (TupIsNull(planSlot))
-        {
-            /* advance to next subplan if any */
-            node->mt_whichplan++;
-            if (node->mt_whichplan < node->mt_nplans)
-            {
-                resultRelInfo++;
-                subplanstate = node->mt_plans[node->mt_whichplan];
-                EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
-                                    node->mt_arowmarks[node->mt_whichplan]);
-                continue;
-            }
-            else
-                break;
-        }
+            break;

         /*
-         * Ensure input tuple is the right format for the target relation.
+         * When there are multiple result relations, each tuple contains a
+         * junk column that gives the index of the rel from which it came.
+         * Extract it and select the correct result relation.
          */
-        if (node->mt_scans[node->mt_whichplan]->tts_ops != planSlot->tts_ops)
+        if (AttributeNumberIsValid(node->mt_resultIndexAttno))
         {
-            ExecCopySlot(node->mt_scans[node->mt_whichplan], planSlot);
-            planSlot = node->mt_scans[node->mt_whichplan];
+            Datum    datum;
+            bool    isNull;
+            int            resultindex;
+
+            datum = ExecGetJunkAttribute(planSlot, node->mt_resultIndexAttno,
+                                         &isNull);
+            if (isNull)
+                elog(ERROR, "resultrelindex is NULL");
+            resultindex = DatumGetInt32(datum);
+            Assert(resultindex >= 0 && resultindex < node->mt_nrels);
+            resultRelInfo = node->resultRelInfo + resultindex;
         }

         /*
@@ -2333,39 +2347,34 @@ ExecModifyTable(PlanState *pstate)
                                   estate, node->canSetTag);
                 break;
             case CMD_UPDATE:
+                /*
+                 * Make the new tuple by combining plan's output tuple with
+                 * the old tuple being updated.
+                 */
+                oldSlot = resultRelInfo->ri_oldTupleSlot;
+                if (oldtuple != NULL)
                 {
-                    TupleTableSlot *oldSlot = resultRelInfo->ri_oldTupleSlot;
-
-                    /*
-                     * Make the new tuple by combining plan's output tuple
-                     * with the old tuple being updated.
-                     */
-                    ExecClearTuple(oldSlot);
-                    if (oldtuple != NULL)
-                    {
-                        /* Foreign table update, store the wholerow attr. */
-                        ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
-                    }
-                    else
-                    {
-                        /* Fetch the most recent version of old tuple. */
-                        Relation    relation = resultRelInfo->ri_RelationDesc;
-
-                        Assert(tupleid != NULL);
-                        if (!table_tuple_fetch_row_version(relation, tupleid,
-                                                           SnapshotAny,
-                                                           oldSlot))
-                            elog(ERROR, "failed to fetch tuple being updated");
-                    }
-                    slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
-                                                 oldSlot);
-
-                    /* Now apply the update. */
-                    slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple,
-                                      slot, oldSlot, planSlot,
-                                      &node->mt_epqstate, estate,
-                                      node->canSetTag);
+                    /* Use the wholerow junk attr as the old tuple. */
+                    ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
                 }
+                else
+                {
+                    /* Fetch the most recent version of old tuple. */
+                    Relation    relation = resultRelInfo->ri_RelationDesc;
+
+                    Assert(tupleid != NULL);
+                    if (!table_tuple_fetch_row_version(relation, tupleid,
+                                                       SnapshotAny,
+                                                       oldSlot))
+                        elog(ERROR, "failed to fetch tuple being updated");
+                }
+                slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
+                                             oldSlot);
+
+                /* Now apply the update. */
+                slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+                                  planSlot, &node->mt_epqstate, estate,
+                                  node->canSetTag);
                 break;
             case CMD_DELETE:
                 slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
@@ -2425,12 +2434,12 @@ ModifyTableState *
 ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 {
     ModifyTableState *mtstate;
+    Plan       *subplan = outerPlan(node);
     CmdType        operation = node->operation;
-    int            nplans = list_length(node->plans);
+    int            nrels = list_length(node->resultRelations);
     ResultRelInfo *resultRelInfo;
-    Plan       *subplan;
-    ListCell   *l,
-               *l1;
+    List       *arowmarks;
+    ListCell   *l;
     int            i;
     Relation    rel;
     bool        update_tuple_routing_needed = node->partColsUpdated;
@@ -2450,10 +2459,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     mtstate->canSetTag = node->canSetTag;
     mtstate->mt_done = false;

-    mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
+    mtstate->mt_nrels = nrels;
     mtstate->resultRelInfo = (ResultRelInfo *)
-        palloc(nplans * sizeof(ResultRelInfo));
-    mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
+        palloc(nrels * sizeof(ResultRelInfo));

     /*----------
      * Resolve the target relation. This is the same as:
@@ -2482,9 +2490,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                                linitial_int(node->resultRelations));
     }

-    mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
-    mtstate->mt_nplans = nplans;
-
     /* set up epqstate with dummy subplan data for the moment */
     EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
     mtstate->fireBSTriggers = true;
@@ -2497,23 +2502,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         ExecSetupTransitionCaptureState(mtstate, estate);

     /*
-     * call ExecInitNode on each of the plans to be executed and save the
-     * results into the array "mt_plans".  This is also a convenient place to
-     * verify that the proposed target relations are valid and open their
-     * indexes for insertion of new index entries.
+     * Open all the result relations and initialize the ResultRelInfo structs.
+     * (But root relation was initialized above, if it's part of the array.)
+     * We must do this before initializing the subplan, because direct-modify
+     * FDWs expect their ResultRelInfos to be available.
      */
     resultRelInfo = mtstate->resultRelInfo;
     i = 0;
-    forboth(l, node->resultRelations, l1, node->plans)
+    foreach(l, node->resultRelations)
     {
         Index        resultRelation = lfirst_int(l);

-        subplan = (Plan *) lfirst(l1);
-
-        /*
-         * This opens result relation and fills ResultRelInfo. (root relation
-         * was initialized already.)
-         */
         if (resultRelInfo != mtstate->rootResultRelInfo)
             ExecInitResultRelation(estate, resultRelInfo, resultRelation);

@@ -2526,6 +2525,22 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
          */
         CheckValidResultRel(resultRelInfo, operation);

+        resultRelInfo++;
+        i++;
+    }
+
+    /*
+     * Now we may initialize the subplan.
+     */
+    outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+
+    /*
+     * Do additional per-result-relation initialization.
+     */
+    for (i = 0; i < nrels; i++)
+    {
+        resultRelInfo = &mtstate->resultRelInfo[i];
+
         /*
          * If there are indices on the result relation, open them and save
          * descriptors in the result relation info, so that we can add new
@@ -2551,12 +2566,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
             operation == CMD_UPDATE)
             update_tuple_routing_needed = true;

-        /* Now init the plan for this result rel */
-        mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
-        mtstate->mt_scans[i] =
-            ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
-                                   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
         /* Also let FDWs init themselves for foreign-table result rels */
         if (!resultRelInfo->ri_usesFdwDirectModify &&
             resultRelInfo->ri_FdwRoutine != NULL &&
@@ -2588,11 +2597,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
             resultRelInfo->ri_ChildToRootMap =
                 convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
                                        RelationGetDescr(mtstate->rootResultRelInfo->ri_RelationDesc));
-        resultRelInfo++;
-        i++;
     }

-    /* Get the target relation */
+    /* Get the root target relation */
     rel = mtstate->rootResultRelInfo->ri_RelationDesc;

     /*
@@ -2708,8 +2715,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         TupleDesc    relationDesc;
         TupleDesc    tupDesc;

-        /* insert may only have one plan, inheritance is not expanded */
-        Assert(nplans == 1);
+        /* insert may only have one relation, inheritance is not expanded */
+        Assert(nrels == 1);

         /* already exists if created by RETURNING processing above */
         if (mtstate->ps.ps_ExprContext == NULL)
@@ -2761,34 +2768,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
      * EvalPlanQual mechanism needs to be told about them.  Locate the
      * relevant ExecRowMarks.
      */
+    arowmarks = NIL;
     foreach(l, node->rowMarks)
     {
         PlanRowMark *rc = lfirst_node(PlanRowMark, l);
         ExecRowMark *erm;
+        ExecAuxRowMark *aerm;

         /* ignore "parent" rowmarks; they are irrelevant at runtime */
         if (rc->isParent)
             continue;

-        /* find ExecRowMark (same for all subplans) */
+        /* Find ExecRowMark and build ExecAuxRowMark */
         erm = ExecFindRowMark(estate, rc->rti, false);
-
-        /* build ExecAuxRowMark for each subplan */
-        for (i = 0; i < nplans; i++)
-        {
-            ExecAuxRowMark *aerm;
-
-            subplan = mtstate->mt_plans[i]->plan;
-            aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
-            mtstate->mt_arowmarks[i] = lappend(mtstate->mt_arowmarks[i], aerm);
-        }
+        aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
+        arowmarks = lappend(arowmarks, aerm);
     }

-    /* select first subplan */
-    mtstate->mt_whichplan = 0;
-    subplan = (Plan *) linitial(node->plans);
-    EvalPlanQualSetPlan(&mtstate->mt_epqstate, subplan,
-                        mtstate->mt_arowmarks[0]);
+    EvalPlanQualSetPlan(&mtstate->mt_epqstate, subplan, arowmarks);

     /*
      * Initialize projection(s) to create tuples suitable for result rel(s).
@@ -2801,15 +2798,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
      *
      * If there are multiple result relations, each one needs its own
      * projection.  Note multiple rels are only possible for UPDATE/DELETE, so
-     * we can't be fooled by some needing a filter and some not.
+     * we can't be fooled by some needing a projection and some not.
      *
      * This section of code is also a convenient place to verify that the
      * output of an INSERT or UPDATE matches the target table(s).
      */
-    for (i = 0; i < nplans; i++)
+    for (i = 0; i < nrels; i++)
     {
         resultRelInfo = &mtstate->resultRelInfo[i];
-        subplan = mtstate->mt_plans[i]->plan;

         /*
          * Prepare to generate tuples suitable for the target relation.
@@ -2818,6 +2814,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         {
             List       *insertTargetList = NIL;
             bool        need_projection = false;
+
             foreach(l, subplan->targetlist)
             {
                 TargetEntry *tle = (TargetEntry *) lfirst(l);
@@ -2827,14 +2824,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                 else
                     need_projection = true;
             }
+
+            /*
+             * The junk-free list must produce a tuple suitable for the result
+             * relation.
+             */
+            ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+                                insertTargetList);
+
+            /* We'll need a slot matching the table's format. */
+            resultRelInfo->ri_newTupleSlot =
+                table_slot_create(resultRelInfo->ri_RelationDesc,
+                                  &mtstate->ps.state->es_tupleTable);
+
+            /* Build ProjectionInfo if needed (it probably isn't). */
             if (need_projection)
             {
                 TupleDesc    relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);

-                resultRelInfo->ri_newTupleSlot =
-                    table_slot_create(resultRelInfo->ri_RelationDesc,
-                                      &mtstate->ps.state->es_tupleTable);
-
                 /* need an expression context to do the projection */
                 if (mtstate->ps.ps_ExprContext == NULL)
                     ExecAssignExprContext(estate, &mtstate->ps);
@@ -2846,13 +2853,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                                             &mtstate->ps,
                                             relDesc);
             }
-
-            /*
-             * The junk-free list must produce a tuple suitable for the result
-             * relation.
-             */
-            ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-                                insertTargetList);
         }
         else if (operation == CMD_UPDATE)
         {
@@ -2863,7 +2863,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)

             /*
              * For UPDATE, we use the old tuple to fill up missing values in
-             * the tuple produced by the plan to get the new tuple.
+             * the tuple produced by the plan to get the new tuple.  We need
+             * two slots, both matching the table's desired format.
              */
             resultRelInfo->ri_oldTupleSlot =
                 table_slot_create(resultRelInfo->ri_RelationDesc,
@@ -2931,6 +2932,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         }
     }

+    /*
+     * If this is an inherited update/delete, there will be a junk attribute
+     * named "resultrelindex" present in the subplan's targetlist.  It will be
+     * used to identify the result relation for a given tuple to be updated/
+     * deleted.
+     */
+    mtstate->mt_resultIndexAttno =
+        ExecFindJunkAttributeInTlist(subplan->targetlist, "resultrelindex");
+    Assert(AttributeNumberIsValid(mtstate->mt_resultIndexAttno) || nrels == 1);
+
     /*
      * Determine if the FDW supports batch insert and determine the batch
      * size (a FDW may support batching, but it may be disabled for the
@@ -2942,7 +2953,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     if (operation == CMD_INSERT)
     {
         resultRelInfo = mtstate->resultRelInfo;
-        for (i = 0; i < nplans; i++)
+        for (i = 0; i < nrels; i++)
         {
             if (!resultRelInfo->ri_usesFdwDirectModify &&
                 resultRelInfo->ri_FdwRoutine != NULL &&
@@ -2991,7 +3002,7 @@ ExecEndModifyTable(ModifyTableState *node)
     /*
      * Allow any FDWs to shut down
      */
-    for (i = 0; i < node->mt_nplans; i++)
+    for (i = 0; i < node->mt_nrels; i++)
     {
         ResultRelInfo *resultRelInfo = node->resultRelInfo + i;

@@ -3031,10 +3042,9 @@ ExecEndModifyTable(ModifyTableState *node)
     EvalPlanQualEnd(&node->mt_epqstate);

     /*
-     * shut down subplans
+     * shut down subplan
      */
-    for (i = 0; i < node->mt_nplans; i++)
-        ExecEndNode(node->mt_plans[i]);
+    ExecEndNode(outerPlanState(node));
 }

 void
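To make the new per-row flow in ModifyTable concrete, here is a minimal, self-contained C sketch of what happens for an inherited UPDATE once there is only one subplan. The junk "resultrelindex" column picks the result relation. The new tuple is then built by starting from the old tuple and overwriting only the SET-clause columns, placed at that child's own attribute numbers. All struct and function names below are hypothetical simplifications, not executor APIs. The sketch assumes "resultrelindex" is a 0-based index into the result relation array, and it models tuples as int arrays whose old version has already been obtained. In the patch, the old version comes either from the wholerow junk attribute or from table_tuple_fetch_row_version().

```c
#include <assert.h>
#include <string.h>

#define MAX_COLS 8

/*
 * Hypothetical, simplified stand-ins for a ResultRelInfo and for one output
 * row of the single subplan; these are NOT the real executor structs.
 */
typedef struct SketchResultRel
{
    int ncols;                   /* number of columns in this child table */
    int update_colnos[MAX_COLS]; /* child attno (1-based) for each SET item */
    int old_tuple[MAX_COLS];     /* stands in for the row located by tupleid */
} SketchResultRel;

typedef struct SketchPlanRow
{
    int resultrelindex;          /* junk column: index into the result rels */
    int nset;                    /* number of SET-clause values in the row */
    int set_values[MAX_COLS];    /* values in SET-clause order */
} SketchPlanRow;

/* Select the result relation to operate on using the junk column. */
static SketchResultRel *
sketch_select_result_rel(SketchResultRel *rels, int nrels,
                         const SketchPlanRow *row)
{
    assert(row->resultrelindex >= 0 && row->resultrelindex < nrels);
    return &rels[row->resultrelindex];
}

/*
 * Build the new tuple: unassigned columns are copied from the old tuple, and
 * each assigned value is stored at this child's attribute number, which may
 * differ from child to child even though the plan row is the same.
 */
static void
sketch_build_new_tuple(const SketchResultRel *rel, const SketchPlanRow *row,
                       int *new_tuple)
{
    memcpy(new_tuple, rel->old_tuple, sizeof(int) * (size_t) rel->ncols);
    for (int i = 0; i < row->nset; i++)
        new_tuple[rel->update_colnos[i] - 1] = row->set_values[i];
}
```

For example, if column b is attribute 2 in one child and attribute 3 in another (say, because the second child has a dropped column), a single plan row carrying SET b = 42 lands in the right position for whichever child its "resultrelindex" names. This is why the subplan's targetlist no longer needs to be expanded per child.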
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1ec586729b..832bfb1095 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -206,7 +206,6 @@ _copyModifyTable(const ModifyTable *from)
     COPY_SCALAR_FIELD(rootRelation);
     COPY_SCALAR_FIELD(partColsUpdated);
     COPY_NODE_FIELD(resultRelations);
-    COPY_NODE_FIELD(plans);
     COPY_NODE_FIELD(updateColnosLists);
     COPY_NODE_FIELD(withCheckOptionLists);
     COPY_NODE_FIELD(returningLists);
@@ -2393,6 +2392,7 @@ _copyAppendRelInfo(const AppendRelInfo *from)
     COPY_SCALAR_FIELD(parent_reltype);
     COPY_SCALAR_FIELD(child_reltype);
     COPY_NODE_FIELD(translated_vars);
+    COPY_NODE_FIELD(translated_fake_vars);
     COPY_SCALAR_FIELD(num_child_cols);
     COPY_POINTER_FIELD(parent_colnos, from->num_child_cols * sizeof(AttrNumber));
     COPY_SCALAR_FIELD(parent_reloid);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3292dda342..643a8a73e3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -907,6 +907,7 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
     COMPARE_SCALAR_FIELD(parent_reltype);
     COMPARE_SCALAR_FIELD(child_reltype);
     COMPARE_NODE_FIELD(translated_vars);
+    COMPARE_NODE_FIELD(translated_fake_vars);
     COMPARE_SCALAR_FIELD(num_child_cols);
     COMPARE_POINTER_FIELD(parent_colnos, a->num_child_cols * sizeof(AttrNumber));
     COMPARE_SCALAR_FIELD(parent_reloid);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 38226530c6..5f7b2fae27 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2276,6 +2276,9 @@ expression_tree_walker(Node *node,
                 if (expression_tree_walker((Node *) appinfo->translated_vars,
                                            walker, context))
                     return true;
+                if (expression_tree_walker((Node *) appinfo->translated_fake_vars,
+                                           walker, context))
+                    return true;
             }
             break;
         case T_PlaceHolderInfo:
@@ -3197,6 +3200,7 @@ expression_tree_mutator(Node *node,

                 FLATCOPY(newnode, appinfo, AppendRelInfo);
                 MUTATE(newnode->translated_vars, appinfo->translated_vars, List *);
+                MUTATE(newnode->translated_fake_vars, appinfo->translated_fake_vars, List *);
                 /* Assume nothing need be done with parent_colnos[] */
                 return (Node *) newnode;
             }
@@ -4002,12 +4006,6 @@ planstate_tree_walker(PlanState *planstate,
     /* special child plans */
     switch (nodeTag(plan))
     {
-        case T_ModifyTable:
-            if (planstate_walk_members(((ModifyTableState *) planstate)->mt_plans,
-                                       ((ModifyTableState *) planstate)->mt_nplans,
-                                       walker, context))
-                return true;
-            break;
         case T_Append:
             if (planstate_walk_members(((AppendState *) planstate)->appendplans,
                                        ((AppendState *) planstate)->as_nplans,
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 99fb38c05a..83adf9d82b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -407,7 +407,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
     WRITE_UINT_FIELD(rootRelation);
     WRITE_BOOL_FIELD(partColsUpdated);
     WRITE_NODE_FIELD(resultRelations);
-    WRITE_NODE_FIELD(plans);
     WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
@@ -2136,14 +2135,13 @@ _outModifyTablePath(StringInfo str, const ModifyTablePath *node)

     _outPathInfo(str, (const Path *) node);

+    WRITE_NODE_FIELD(subpath);
     WRITE_ENUM_FIELD(operation, CmdType);
     WRITE_BOOL_FIELD(canSetTag);
     WRITE_UINT_FIELD(nominalRelation);
     WRITE_UINT_FIELD(rootRelation);
     WRITE_BOOL_FIELD(partColsUpdated);
     WRITE_NODE_FIELD(resultRelations);
-    WRITE_NODE_FIELD(subpaths);
-    WRITE_NODE_FIELD(subroots);
     WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
@@ -2261,6 +2259,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_NODE_FIELD(full_join_clauses);
     WRITE_NODE_FIELD(join_info_list);
     WRITE_NODE_FIELD(append_rel_list);
+    WRITE_NODE_FIELD(inherit_result_rels);
     WRITE_NODE_FIELD(rowMarks);
     WRITE_NODE_FIELD(placeholder_list);
     WRITE_NODE_FIELD(fkey_list);
@@ -2271,6 +2270,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_NODE_FIELD(sort_pathkeys);
     WRITE_NODE_FIELD(processed_tlist);
     WRITE_NODE_FIELD(update_colnos);
+    WRITE_NODE_FIELD(inherit_junk_tlist);
     WRITE_NODE_FIELD(minmax_aggs);
     WRITE_FLOAT_FIELD(total_table_pages, "%.0f");
     WRITE_FLOAT_FIELD(tuple_fraction, "%.4f");
@@ -2286,6 +2286,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_BITMAPSET_FIELD(curOuterRels);
     WRITE_NODE_FIELD(curOuterParams);
     WRITE_BOOL_FIELD(partColsUpdated);
+    WRITE_INT_FIELD(lastResultRelIndex);
 }

 static void
@@ -2568,11 +2569,24 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
     WRITE_OID_FIELD(parent_reltype);
     WRITE_OID_FIELD(child_reltype);
     WRITE_NODE_FIELD(translated_vars);
+    WRITE_NODE_FIELD(translated_fake_vars);
     WRITE_INT_FIELD(num_child_cols);
     WRITE_ATTRNUMBER_ARRAY(parent_colnos, node->num_child_cols);
     WRITE_OID_FIELD(parent_reloid);
 }

+static void
+_outInheritResultRelInfo(StringInfo str, const InheritResultRelInfo *node)
+{
+    WRITE_NODE_TYPE("INHERITRESULTRELINFO");
+
+    WRITE_UINT_FIELD(resultRelation);
+    WRITE_NODE_FIELD(withCheckOptions);
+    WRITE_NODE_FIELD(returningList);
+    WRITE_NODE_FIELD(processed_tlist);
+    WRITE_NODE_FIELD(update_colnos);
+}
+
 static void
 _outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
 {
@@ -4222,6 +4236,9 @@ outNode(StringInfo str, const void *obj)
             case T_AppendRelInfo:
                 _outAppendRelInfo(str, obj);
                 break;
+            case T_InheritResultRelInfo:
+                _outInheritResultRelInfo(str, obj);
+                break;
             case T_PlaceHolderInfo:
                 _outPlaceHolderInfo(str, obj);
                 break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 0b6331d3da..feb64db702 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1407,6 +1407,7 @@ _readAppendRelInfo(void)
     READ_OID_FIELD(parent_reltype);
     READ_OID_FIELD(child_reltype);
     READ_NODE_FIELD(translated_vars);
+    READ_NODE_FIELD(translated_fake_vars);
     READ_INT_FIELD(num_child_cols);
     READ_ATTRNUMBER_ARRAY(parent_colnos, local_node->num_child_cols);
     READ_OID_FIELD(parent_reloid);
@@ -1682,7 +1683,6 @@ _readModifyTable(void)
     READ_UINT_FIELD(rootRelation);
     READ_BOOL_FIELD(partColsUpdated);
     READ_NODE_FIELD(resultRelations);
-    READ_NODE_FIELD(plans);
     READ_NODE_FIELD(updateColnosLists);
     READ_NODE_FIELD(withCheckOptionLists);
     READ_NODE_FIELD(returningLists);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..de765eb709 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1049,10 +1049,25 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
             adjust_appendrel_attrs(root,
                                    (Node *) rel->joininfo,
                                    1, &appinfo);
-        childrel->reltarget->exprs = (List *)
-            adjust_appendrel_attrs(root,
-                                   (Node *) rel->reltarget->exprs,
-                                   1, &appinfo);
+
+        /*
+         * If the child is a result relation, the executor expects that any
+         * wholerow Vars in the targetlist are of its reltype, not the
+         * parent's reltype.  So use adjust_target_appendrel_attrs() to
+         * translate the reltarget expressions, because it does not wrap a
+         * translated wholerow Var in a ConvertRowtypeExpr to convert it back
+         * to the parent's reltype.
+         */
+        if (is_result_relation(root, childRTindex))
+            childrel->reltarget->exprs = (List *)
+                adjust_target_appendrel_attrs(root,
+                                              (Node *) rel->reltarget->exprs,
+                                              appinfo);
+        else
+            childrel->reltarget->exprs = (List *)
+                adjust_appendrel_attrs(root,
+                                       (Node *) rel->reltarget->exprs,
+                                       1, &appinfo);

         /*
          * We have to make child entries in the EquivalenceClass data
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..8a65de783c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -27,6 +27,7 @@
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
+#include "optimizer/inherit.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -3397,7 +3398,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
      * and pass them through to EvalPlanQual via a side channel; but for now,
      * we just don't remove implied quals at all for target relations.
      */
-    is_target_rel = (rel->relid == root->parse->resultRelation ||
+    is_target_rel = (is_result_relation(root, rel->relid) ||
                      get_plan_rowmark(root->rowMarks, rel->relid) != NULL);

     /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4bb482879f..291636b5cf 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -301,7 +301,7 @@ static ModifyTable *make_modifytable(PlannerInfo *root,
                                      CmdType operation, bool canSetTag,
                                      Index nominalRelation, Index rootRelation,
                                      bool partColsUpdated,
-                                     List *resultRelations, List *subplans, List *subroots,
+                                     List *resultRelations, Plan *subplan,
                                      List *updateColnosLists,
                                      List *withCheckOptionLists, List *returningLists,
                                      List *rowMarks, OnConflictExpr *onconflict, int epqParam);
@@ -2257,7 +2257,6 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path)
      * create_modifytable_plan).  Fortunately we can't be because there would
      * never be grouping in an UPDATE/DELETE; but let's Assert that.
      */
-    Assert(root->inhTargetKind == INHKIND_NONE);
     Assert(root->grouping_map == NULL);
     root->grouping_map = grouping_map;

@@ -2419,12 +2418,7 @@ create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path)
      * with InitPlan output params.  (We can't just do that locally in the
      * MinMaxAgg node, because path nodes above here may have Agg references
      * as well.)  Save the mmaggregates list to tell setrefs.c to do that.
-     *
-     * This doesn't work if we're in an inheritance subtree (see notes in
-     * create_modifytable_plan).  Fortunately we can't be because there would
-     * never be aggregates in an UPDATE/DELETE; but let's Assert that.
      */
-    Assert(root->inhTargetKind == INHKIND_NONE);
     Assert(root->minmax_aggs == NIL);
     root->minmax_aggs = best_path->mmaggregates;

@@ -2641,34 +2635,11 @@ static ModifyTable *
 create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
 {
     ModifyTable *plan;
-    List       *subplans = NIL;
-    ListCell   *subpaths,
-               *subroots,
-               *lc;
-
-    /* Build the plan for each input path */
-    forboth(subpaths, best_path->subpaths,
-            subroots, best_path->subroots)
-    {
-        Path       *subpath = (Path *) lfirst(subpaths);
-        PlannerInfo *subroot = (PlannerInfo *) lfirst(subroots);
-        Plan       *subplan;
-
-        /*
-         * In an inherited UPDATE/DELETE, reference the per-child modified
-         * subroot while creating Plans from Paths for the child rel.  This is
-         * a kluge, but otherwise it's too hard to ensure that Plan creation
-         * functions (particularly in FDWs) don't depend on the contents of
-         * "root" matching what they saw at Path creation time.  The main
-         * downside is that creation functions for Plans that might appear
-         * below a ModifyTable cannot expect to modify the contents of "root"
-         * and have it "stick" for subsequent processing such as setrefs.c.
-         * That's not great, but it seems better than the alternative.
-         */
-        subplan = create_plan_recurse(subroot, subpath, CP_EXACT_TLIST);
+    Path       *subpath = best_path->subpath;
+    Plan       *subplan;

-        subplans = lappend(subplans, subplan);
-    }
+    /* Build the plan. */
+    subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);

     plan = make_modifytable(root,
                             best_path->operation,
@@ -2677,8 +2648,7 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
                             best_path->rootRelation,
                             best_path->partColsUpdated,
                             best_path->resultRelations,
-                            subplans,
-                            best_path->subroots,
+                            subplan,
                             best_path->updateColnosLists,
                             best_path->withCheckOptionLists,
                             best_path->returningLists,
@@ -2688,11 +2658,10 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)

     copy_generic_path_info(&plan->plan, &best_path->path);

-    forboth(lc, subplans,
-            subroots, best_path->subroots)
+    if (plan->operation == CMD_UPDATE)
     {
-        Plan       *subplan = (Plan *) lfirst(lc);
-        PlannerInfo *subroot = (PlannerInfo *) lfirst(subroots);
+        ListCell   *l;
+        AttrNumber    resno = 1;

         /*
          * Fix up the resnos of query's TLEs to make them match their ordinal
@@ -2704,25 +2673,19 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
          * resnos in processed_tlist and resnos in subplan targetlist are
          * exactly same, but maybe we can just remove the assert?
          */
-        if (plan->operation == CMD_UPDATE)
+        foreach(l, root->processed_tlist)
         {
-            ListCell   *l;
-            AttrNumber    resno = 1;
+            TargetEntry *tle = lfirst(l);

-            foreach(l, subroot->processed_tlist)
-            {
-                TargetEntry *tle = lfirst(l);
-
-                tle = flatCopyTargetEntry(tle);
-                tle->resno = resno++;
-                lfirst(l) = tle;
-            }
+            tle = flatCopyTargetEntry(tle);
+            tle->resno = resno++;
+            lfirst(l) = tle;
         }
-
-        /* Transfer resname/resjunk labeling, too, to keep executor happy */
-        apply_tlist_labeling(subplan->targetlist, subroot->processed_tlist);
     }

+    /* Transfer resname/resjunk labeling, too, to keep executor happy */
+    apply_tlist_labeling(subplan->targetlist, root->processed_tlist);
+
     return plan;
 }

@@ -6914,7 +6877,7 @@ make_modifytable(PlannerInfo *root,
                  CmdType operation, bool canSetTag,
                  Index nominalRelation, Index rootRelation,
                  bool partColsUpdated,
-                 List *resultRelations, List *subplans, List *subroots,
+                 List *resultRelations, Plan *subplan,
                  List *updateColnosLists,
                  List *withCheckOptionLists, List *returningLists,
                  List *rowMarks, OnConflictExpr *onconflict, int epqParam)
@@ -6923,11 +6886,8 @@ make_modifytable(PlannerInfo *root,
     List       *fdw_private_list;
     Bitmapset  *direct_modify_plans;
     ListCell   *lc;
-    ListCell   *lc2;
     int            i;

-    Assert(list_length(resultRelations) == list_length(subplans));
-    Assert(list_length(resultRelations) == list_length(subroots));
     Assert(operation == CMD_UPDATE ?
            list_length(resultRelations) == list_length(updateColnosLists) :
            updateColnosLists == NIL);
@@ -6936,7 +6896,7 @@ make_modifytable(PlannerInfo *root,
     Assert(returningLists == NIL ||
            list_length(resultRelations) == list_length(returningLists));

-    node->plan.lefttree = NULL;
+    node->plan.lefttree = subplan;
     node->plan.righttree = NULL;
     node->plan.qual = NIL;
     /* setrefs.c will fill in the targetlist, if needed */
@@ -6948,7 +6908,6 @@ make_modifytable(PlannerInfo *root,
     node->rootRelation = rootRelation;
     node->partColsUpdated = partColsUpdated;
     node->resultRelations = resultRelations;
-    node->plans = subplans;
     if (!onconflict)
     {
         node->onConflictAction = ONCONFLICT_NONE;
@@ -6988,10 +6947,9 @@ make_modifytable(PlannerInfo *root,
     fdw_private_list = NIL;
     direct_modify_plans = NULL;
     i = 0;
-    forboth(lc, resultRelations, lc2, subroots)
+    foreach(lc, resultRelations)
     {
         Index        rti = lfirst_int(lc);
-        PlannerInfo *subroot = lfirst_node(PlannerInfo, lc2);
         FdwRoutine *fdwroutine;
         List       *fdw_private;
         bool        direct_modify;
@@ -7003,16 +6961,16 @@ make_modifytable(PlannerInfo *root,
          * so it's not a baserel; and there are also corner cases for
          * updatable views where the target rel isn't a baserel.)
          */
-        if (rti < subroot->simple_rel_array_size &&
-            subroot->simple_rel_array[rti] != NULL)
+        if (rti < root->simple_rel_array_size &&
+            root->simple_rel_array[rti] != NULL)
         {
-            RelOptInfo *resultRel = subroot->simple_rel_array[rti];
+            RelOptInfo *resultRel = root->simple_rel_array[rti];

             fdwroutine = resultRel->fdwroutine;
         }
         else
         {
-            RangeTblEntry *rte = planner_rt_fetch(rti, subroot);
+            RangeTblEntry *rte = planner_rt_fetch(rti, root);

             Assert(rte->rtekind == RTE_RELATION);
             if (rte->relkind == RELKIND_FOREIGN_TABLE)
@@ -7035,16 +6993,16 @@ make_modifytable(PlannerInfo *root,
             fdwroutine->IterateDirectModify != NULL &&
             fdwroutine->EndDirectModify != NULL &&
             withCheckOptionLists == NIL &&
-            !has_row_triggers(subroot, rti, operation) &&
-            !has_stored_generated_columns(subroot, rti))
-            direct_modify = fdwroutine->PlanDirectModify(subroot, node, rti, i);
+            !has_row_triggers(root, rti, operation) &&
+            !has_stored_generated_columns(root, rti))
+            direct_modify = fdwroutine->PlanDirectModify(root, node, rti, i);
         if (direct_modify)
             direct_modify_plans = bms_add_member(direct_modify_plans, i);

         if (!direct_modify &&
             fdwroutine != NULL &&
             fdwroutine->PlanForeignModify != NULL)
-            fdw_private = fdwroutine->PlanForeignModify(subroot, node, rti, i);
+            fdw_private = fdwroutine->PlanForeignModify(root, node, rti, i);
         else
             fdw_private = NIL;
         fdw_private_list = lappend(fdw_private_list, fdw_private);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ccb9166a8e..2bd7842b45 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -129,9 +129,7 @@ typedef struct
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
-static void inheritance_planner(PlannerInfo *root);
-static void grouping_planner(PlannerInfo *root, bool inheritance_update,
-                             double tuple_fraction);
+static void grouping_planner(PlannerInfo *root, double tuple_fraction);
 static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
 static List *remap_to_groupclause_idx(List *groupClause, List *gsets,
                                       int *tleref_to_colnum_map);
@@ -616,15 +614,16 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     root->eq_classes = NIL;
     root->ec_merging_done = false;
     root->append_rel_list = NIL;
+    root->inherit_result_rels = NIL;
     root->rowMarks = NIL;
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
     root->processed_tlist = NIL;
     root->update_colnos = NIL;
+    root->inherit_junk_tlist = NIL;
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
     root->qual_security_level = 0;
-    root->inhTargetKind = INHKIND_NONE;
     root->hasPseudoConstantQuals = false;
     root->hasAlternativeSubPlans = false;
     root->hasRecursion = hasRecursion;
@@ -634,6 +633,8 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
         root->wt_param_id = -1;
     root->non_recursive_path = NULL;
     root->partColsUpdated = false;
+    root->inherit_result_rels = NIL;
+    root->lastResultRelIndex = 0;

     /*
      * If there is a WITH list, process each WITH query and either convert it
@@ -832,6 +833,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     root->append_rel_list = (List *)
         preprocess_expression(root, (Node *) root->append_rel_list,
                               EXPRKIND_APPINFO);
+    /* We assume we don't need to preprocess inherit_result_rels contents */

     /* Also need to preprocess expressions within RTEs */
     foreach(l, parse->rtable)
@@ -999,15 +1001,8 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     if (hasResultRTEs)
         remove_useless_result_rtes(root);

-    /*
-     * Do the main planning.  If we have an inherited target relation, that
-     * needs special processing, else go straight to grouping_planner.
-     */
-    if (parse->resultRelation &&
-        rt_fetch(parse->resultRelation, parse->rtable)->inh)
-        inheritance_planner(root);
-    else
-        grouping_planner(root, false, tuple_fraction);
+    /* Do the main planning. */
+    grouping_planner(root, tuple_fraction);

     /*
      * Capture the set of outer-level param IDs we have access to, for use in
@@ -1181,631 +1176,6 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
     return (Expr *) preprocess_expression(root, (Node *) expr, EXPRKIND_PHV);
 }

-/*
- * inheritance_planner
- *      Generate Paths in the case where the result relation is an
- *      inheritance set.
- *
- * We have to handle this case differently from cases where a source relation
- * is an inheritance set. Source inheritance is expanded at the bottom of the
- * plan tree (see allpaths.c), but target inheritance has to be expanded at
- * the top.  The reason is that for UPDATE, each target relation needs a
- * different targetlist matching its own column set.  Fortunately,
- * the UPDATE/DELETE target can never be the nullable side of an outer join,
- * so it's OK to generate the plan this way.
- *
- * Returns nothing; the useful output is in the Paths we attach to
- * the (UPPERREL_FINAL, NULL) upperrel stored in *root.
- *
- * Note that we have not done set_cheapest() on the final rel; it's convenient
- * to leave this to the caller.
- */
-static void
-inheritance_planner(PlannerInfo *root)
-{
-    Query       *parse = root->parse;
-    int            top_parentRTindex = parse->resultRelation;
-    List       *select_rtable;
-    List       *select_appinfos;
-    List       *child_appinfos;
-    List       *old_child_rtis;
-    List       *new_child_rtis;
-    Bitmapset  *subqueryRTindexes;
-    Index        next_subquery_rti;
-    int            nominalRelation = -1;
-    Index        rootRelation = 0;
-    List       *final_rtable = NIL;
-    List       *final_rowmarks = NIL;
-    List       *final_appendrels = NIL;
-    int            save_rel_array_size = 0;
-    RelOptInfo **save_rel_array = NULL;
-    AppendRelInfo **save_append_rel_array = NULL;
-    List       *subpaths = NIL;
-    List       *subroots = NIL;
-    List       *resultRelations = NIL;
-    List       *updateColnosLists = NIL;
-    List       *withCheckOptionLists = NIL;
-    List       *returningLists = NIL;
-    List       *rowMarks;
-    RelOptInfo *final_rel;
-    ListCell   *lc;
-    ListCell   *lc2;
-    Index        rti;
-    RangeTblEntry *parent_rte;
-    Bitmapset  *parent_relids;
-    Query      **parent_parses;
-
-    /* Should only get here for UPDATE or DELETE */
-    Assert(parse->commandType == CMD_UPDATE ||
-           parse->commandType == CMD_DELETE);
-
-    /*
-     * We generate a modified instance of the original Query for each target
-     * relation, plan that, and put all the plans into a list that will be
-     * controlled by a single ModifyTable node.  All the instances share the
-     * same rangetable, but each instance must have its own set of subquery
-     * RTEs within the finished rangetable because (1) they are likely to get
-     * scribbled on during planning, and (2) it's not inconceivable that
-     * subqueries could get planned differently in different cases.  We need
-     * not create duplicate copies of other RTE kinds, in particular not the
-     * target relations, because they don't have either of those issues.  Not
-     * having to duplicate the target relations is important because doing so
-     * (1) would result in a rangetable of length O(N^2) for N targets, with
-     * at least O(N^3) work expended here; and (2) would greatly complicate
-     * management of the rowMarks list.
-     *
-     * To begin with, generate a bitmapset of the relids of the subquery RTEs.
-     */
-    subqueryRTindexes = NULL;
-    rti = 1;
-    foreach(lc, parse->rtable)
-    {
-        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-        if (rte->rtekind == RTE_SUBQUERY)
-            subqueryRTindexes = bms_add_member(subqueryRTindexes, rti);
-        rti++;
-    }
-
-    /*
-     * If the parent RTE is a partitioned table, we should use that as the
-     * nominal target relation, because the RTEs added for partitioned tables
-     * (including the root parent) as child members of the inheritance set do
-     * not appear anywhere else in the plan, so the confusion explained below
-     * for non-partitioning inheritance cases is not possible.
-     */
-    parent_rte = rt_fetch(top_parentRTindex, parse->rtable);
-    Assert(parent_rte->inh);
-    if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
-    {
-        nominalRelation = top_parentRTindex;
-        rootRelation = top_parentRTindex;
-    }
-
-    /*
-     * Before generating the real per-child-relation plans, do a cycle of
-     * planning as though the query were a SELECT.  The objective here is to
-     * find out which child relations need to be processed, using the same
-     * expansion and pruning logic as for a SELECT.  We'll then pull out the
-     * RangeTblEntry-s generated for the child rels, and make use of the
-     * AppendRelInfo entries for them to guide the real planning.  (This is
-     * rather inefficient; we could perhaps stop short of making a full Path
-     * tree.  But this whole function is inefficient and slated for
-     * destruction, so let's not contort query_planner for that.)
-     */
-    {
-        PlannerInfo *subroot;
-
-        /*
-         * Flat-copy the PlannerInfo to prevent modification of the original.
-         */
-        subroot = makeNode(PlannerInfo);
-        memcpy(subroot, root, sizeof(PlannerInfo));
-
-        /*
-         * Make a deep copy of the parsetree for this planning cycle to mess
-         * around with, and change it to look like a SELECT.  (Hack alert: the
-         * target RTE still has updatedCols set if this is an UPDATE, so that
-         * expand_partitioned_rtentry will correctly update
-         * subroot->partColsUpdated.)
-         */
-        subroot->parse = copyObject(root->parse);
-
-        subroot->parse->commandType = CMD_SELECT;
-        subroot->parse->resultRelation = 0;
-
-        /*
-         * Ensure the subroot has its own copy of the original
-         * append_rel_list, since it'll be scribbled on.  (Note that at this
-         * point, the list only contains AppendRelInfos for flattened UNION
-         * ALL subqueries.)
-         */
-        subroot->append_rel_list = copyObject(root->append_rel_list);
-
-        /*
-         * Better make a private copy of the rowMarks, too.
-         */
-        subroot->rowMarks = copyObject(root->rowMarks);
-
-        /* There shouldn't be any OJ info to translate, as yet */
-        Assert(subroot->join_info_list == NIL);
-        /* and we haven't created PlaceHolderInfos, either */
-        Assert(subroot->placeholder_list == NIL);
-
-        /* Generate Path(s) for accessing this result relation */
-        grouping_planner(subroot, true, 0.0 /* retrieve all tuples */ );
-
-        /* Extract the info we need. */
-        select_rtable = subroot->parse->rtable;
-        select_appinfos = subroot->append_rel_list;
-
-        /*
-         * We need to propagate partColsUpdated back, too.  (The later
-         * planning cycles will not set this because they won't run
-         * expand_partitioned_rtentry for the UPDATE target.)
-         */
-        root->partColsUpdated = subroot->partColsUpdated;
-    }
-
-    /*----------
-     * Since only one rangetable can exist in the final plan, we need to make
-     * sure that it contains all the RTEs needed for any child plan.  This is
-     * complicated by the need to use separate subquery RTEs for each child.
-     * We arrange the final rtable as follows:
-     * 1. All original rtable entries (with their original RT indexes).
-     * 2. All the relation RTEs generated for children of the target table.
-     * 3. Subquery RTEs for children after the first.  We need N * (K - 1)
-     *    RT slots for this, if there are N subqueries and K child tables.
-     * 4. Additional RTEs generated during the child planning runs, such as
-     *    children of inheritable RTEs other than the target table.
-     * We assume that each child planning run will create an identical set
-     * of type-4 RTEs.
-     *
-     * So the next thing to do is append the type-2 RTEs (the target table's
-     * children) to the original rtable.  We look through select_appinfos
-     * to find them.
-     *
-     * To identify which AppendRelInfos are relevant as we thumb through
-     * select_appinfos, we need to look for both direct and indirect children
-     * of top_parentRTindex, so we use a bitmap of known parent relids.
-     * expand_inherited_rtentry() always processes a parent before any of that
-     * parent's children, so we should see an intermediate parent before its
-     * children.
-     *----------
-     */
-    child_appinfos = NIL;
-    old_child_rtis = NIL;
-    new_child_rtis = NIL;
-    parent_relids = bms_make_singleton(top_parentRTindex);
-    foreach(lc, select_appinfos)
-    {
-        AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
-        RangeTblEntry *child_rte;
-
-        /* append_rel_list contains all append rels; ignore others */
-        if (!bms_is_member(appinfo->parent_relid, parent_relids))
-            continue;
-
-        /* remember relevant AppendRelInfos for use below */
-        child_appinfos = lappend(child_appinfos, appinfo);
-
-        /* extract RTE for this child rel */
-        child_rte = rt_fetch(appinfo->child_relid, select_rtable);
-
-        /* and append it to the original rtable */
-        parse->rtable = lappend(parse->rtable, child_rte);
-
-        /* remember child's index in the SELECT rtable */
-        old_child_rtis = lappend_int(old_child_rtis, appinfo->child_relid);
-
-        /* and its new index in the final rtable */
-        new_child_rtis = lappend_int(new_child_rtis, list_length(parse->rtable));
-
-        /* if child is itself partitioned, update parent_relids */
-        if (child_rte->inh)
-        {
-            Assert(child_rte->relkind == RELKIND_PARTITIONED_TABLE);
-            parent_relids = bms_add_member(parent_relids, appinfo->child_relid);
-        }
-    }
-
-    /*
-     * It's possible that the RTIs we just assigned for the child rels in the
-     * final rtable are different from what they were in the SELECT query.
-     * Adjust the AppendRelInfos so that they will correctly map RT indexes to
-     * the final indexes.  We can do this left-to-right since no child rel's
-     * final RT index could be greater than what it had in the SELECT query.
-     */
-    forboth(lc, old_child_rtis, lc2, new_child_rtis)
-    {
-        int            old_child_rti = lfirst_int(lc);
-        int            new_child_rti = lfirst_int(lc2);
-
-        if (old_child_rti == new_child_rti)
-            continue;            /* nothing to do */
-
-        Assert(old_child_rti > new_child_rti);
-
-        ChangeVarNodes((Node *) child_appinfos,
-                       old_child_rti, new_child_rti, 0);
-    }
-
-    /*
-     * Now set up rangetable entries for subqueries for additional children
-     * (the first child will just use the original ones).  These all have to
-     * look more or less real, or EXPLAIN will get unhappy; so we just make
-     * them all clones of the original subqueries.
-     */
-    next_subquery_rti = list_length(parse->rtable) + 1;
-    if (subqueryRTindexes != NULL)
-    {
-        int            n_children = list_length(child_appinfos);
-
-        while (n_children-- > 1)
-        {
-            int            oldrti = -1;
-
-            while ((oldrti = bms_next_member(subqueryRTindexes, oldrti)) >= 0)
-            {
-                RangeTblEntry *subqrte;
-
-                subqrte = rt_fetch(oldrti, parse->rtable);
-                parse->rtable = lappend(parse->rtable, copyObject(subqrte));
-            }
-        }
-    }
-
-    /*
-     * The query for each child is obtained by translating the query for its
-     * immediate parent, since the AppendRelInfo data we have shows deltas
-     * between parents and children.  We use the parent_parses array to
-     * remember the appropriate query trees.  This is indexed by parent relid.
-     * Since the maximum number of parents is limited by the number of RTEs in
-     * the SELECT query, we use that number to allocate the array.  An extra
-     * entry is needed since relids start from 1.
-     */
-    parent_parses = (Query **) palloc0((list_length(select_rtable) + 1) *
-                                       sizeof(Query *));
-    parent_parses[top_parentRTindex] = parse;
-
-    /*
-     * And now we can get on with generating a plan for each child table.
-     */
-    foreach(lc, child_appinfos)
-    {
-        AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
-        Index        this_subquery_rti = next_subquery_rti;
-        Query       *parent_parse;
-        PlannerInfo *subroot;
-        RangeTblEntry *child_rte;
-        RelOptInfo *sub_final_rel;
-        Path       *subpath;
-
-        /*
-         * expand_inherited_rtentry() always processes a parent before any of
-         * that parent's children, so the parent query for this relation
-         * should already be available.
-         */
-        parent_parse = parent_parses[appinfo->parent_relid];
-        Assert(parent_parse != NULL);
-
-        /*
-         * We need a working copy of the PlannerInfo so that we can control
-         * propagation of information back to the main copy.
-         */
-        subroot = makeNode(PlannerInfo);
-        memcpy(subroot, root, sizeof(PlannerInfo));
-
-        /*
-         * Generate modified query with this rel as target.  We first apply
-         * adjust_appendrel_attrs, which copies the Query and changes
-         * references to the parent RTE to refer to the current child RTE,
-         * then fool around with subquery RTEs.
-         */
-        subroot->parse = (Query *)
-            adjust_appendrel_attrs(subroot,
-                                   (Node *) parent_parse,
-                                   1, &appinfo);
-
-        /*
-         * If there are securityQuals attached to the parent, move them to the
-         * child rel (they've already been transformed properly for that).
-         */
-        parent_rte = rt_fetch(appinfo->parent_relid, subroot->parse->rtable);
-        child_rte = rt_fetch(appinfo->child_relid, subroot->parse->rtable);
-        child_rte->securityQuals = parent_rte->securityQuals;
-        parent_rte->securityQuals = NIL;
-
-        /*
-         * HACK: setting this to a value other than INHKIND_NONE signals to
-         * relation_excluded_by_constraints() to treat the result relation as
-         * being an appendrel member.
-         */
-        subroot->inhTargetKind =
-            (rootRelation != 0) ? INHKIND_PARTITIONED : INHKIND_INHERITED;
-
-        /*
-         * If this child is further partitioned, remember it as a parent.
-         * Since a partitioned table does not have any data, we don't need to
-         * create a plan for it, and we can stop processing it here.  We do,
-         * however, need to remember its modified PlannerInfo for use when
-         * processing its children, since we'll update their varnos based on
-         * the delta from immediate parent to child, not from top to child.
-         *
-         * Note: a very non-obvious point is that we have not yet added
-         * duplicate subquery RTEs to the subroot's rtable.  We mustn't,
-         * because then its children would have two sets of duplicates,
-         * confusing matters.
-         */
-        if (child_rte->inh)
-        {
-            Assert(child_rte->relkind == RELKIND_PARTITIONED_TABLE);
-            parent_parses[appinfo->child_relid] = subroot->parse;
-            continue;
-        }
-
-        /*
-         * Set the nominal target relation of the ModifyTable node if not
-         * already done.  If the target is a partitioned table, we already set
-         * nominalRelation to refer to the partition root, above.  For
-         * non-partitioned inheritance cases, we'll use the first child
-         * relation (even if it's excluded) as the nominal target relation.
-         * Because of the way expand_inherited_rtentry works, that should be
-         * the RTE representing the parent table in its role as a simple
-         * member of the inheritance set.
-         *
-         * It would be logically cleaner to *always* use the inheritance
-         * parent RTE as the nominal relation; but that RTE is not otherwise
-         * referenced in the plan in the non-partitioned inheritance case.
-         * Instead the duplicate child RTE created by expand_inherited_rtentry
-         * is used elsewhere in the plan, so using the original parent RTE
-         * would give rise to confusing use of multiple aliases in EXPLAIN
-         * output for what the user will think is the "same" table.  OTOH,
-         * it's not a problem in the partitioned inheritance case, because
-         * there is no duplicate RTE for the parent.
-         */
-        if (nominalRelation < 0)
-            nominalRelation = appinfo->child_relid;
-
-        /*
-         * As above, each child plan run needs its own append_rel_list and
-         * rowmarks, which should start out as pristine copies of the
-         * originals.  There can't be any references to UPDATE/DELETE target
-         * rels in them; but there could be subquery references, which we'll
-         * fix up in a moment.
-         */
-        subroot->append_rel_list = copyObject(root->append_rel_list);
-        subroot->rowMarks = copyObject(root->rowMarks);
-
-        /*
-         * If this isn't the first child Query, adjust Vars and jointree
-         * entries to reference the appropriate set of subquery RTEs.
-         */
-        if (final_rtable != NIL && subqueryRTindexes != NULL)
-        {
-            int            oldrti = -1;
-
-            while ((oldrti = bms_next_member(subqueryRTindexes, oldrti)) >= 0)
-            {
-                Index        newrti = next_subquery_rti++;
-
-                ChangeVarNodes((Node *) subroot->parse, oldrti, newrti, 0);
-                ChangeVarNodes((Node *) subroot->append_rel_list,
-                               oldrti, newrti, 0);
-                ChangeVarNodes((Node *) subroot->rowMarks, oldrti, newrti, 0);
-            }
-        }
-
-        /* There shouldn't be any OJ info to translate, as yet */
-        Assert(subroot->join_info_list == NIL);
-        /* and we haven't created PlaceHolderInfos, either */
-        Assert(subroot->placeholder_list == NIL);
-
-        /* Generate Path(s) for accessing this result relation */
-        grouping_planner(subroot, true, 0.0 /* retrieve all tuples */ );
-
-        /*
-         * Select cheapest path in case there's more than one.  We always run
-         * modification queries to conclusion, so we care only for the
-         * cheapest-total path.
-         */
-        sub_final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
-        set_cheapest(sub_final_rel);
-        subpath = sub_final_rel->cheapest_total_path;
-
-        /*
-         * If this child rel was excluded by constraint exclusion, exclude it
-         * from the result plan.
-         */
-        if (IS_DUMMY_REL(sub_final_rel))
-            continue;
-
-        /*
-         * If this is the first non-excluded child, its post-planning rtable
-         * becomes the initial contents of final_rtable; otherwise, copy its
-         * modified subquery RTEs into final_rtable, to ensure we have sane
-         * copies of those.  Also save the first non-excluded child's version
-         * of the rowmarks list; we assume all children will end up with
-         * equivalent versions of that.  Likewise for append_rel_list.
-         */
-        if (final_rtable == NIL)
-        {
-            final_rtable = subroot->parse->rtable;
-            final_rowmarks = subroot->rowMarks;
-            final_appendrels = subroot->append_rel_list;
-        }
-        else
-        {
-            Assert(list_length(final_rtable) ==
-                   list_length(subroot->parse->rtable));
-            if (subqueryRTindexes != NULL)
-            {
-                int            oldrti = -1;
-
-                while ((oldrti = bms_next_member(subqueryRTindexes, oldrti)) >= 0)
-                {
-                    Index        newrti = this_subquery_rti++;
-                    RangeTblEntry *subqrte;
-                    ListCell   *newrticell;
-
-                    subqrte = rt_fetch(newrti, subroot->parse->rtable);
-                    newrticell = list_nth_cell(final_rtable, newrti - 1);
-                    lfirst(newrticell) = subqrte;
-                }
-            }
-        }
-
-        /*
-         * We need to collect all the RelOptInfos from all child plans into
-         * the main PlannerInfo, since setrefs.c will need them.  We use the
-         * last child's simple_rel_array, so we have to propagate forward the
-         * RelOptInfos that were already built in previous children.
-         */
-        Assert(subroot->simple_rel_array_size >= save_rel_array_size);
-        for (rti = 1; rti < save_rel_array_size; rti++)
-        {
-            RelOptInfo *brel = save_rel_array[rti];
-
-            if (brel)
-                subroot->simple_rel_array[rti] = brel;
-        }
-        save_rel_array_size = subroot->simple_rel_array_size;
-        save_rel_array = subroot->simple_rel_array;
-        save_append_rel_array = subroot->append_rel_array;
-
-        /*
-         * Make sure any initplans from this rel get into the outer list. Note
-         * we're effectively assuming all children generate the same
-         * init_plans.
-         */
-        root->init_plans = subroot->init_plans;
-
-        /* Build list of sub-paths */
-        subpaths = lappend(subpaths, subpath);
-
-        /* Build list of modified subroots, too */
-        subroots = lappend(subroots, subroot);
-
-        /* Build list of target-relation RT indexes */
-        resultRelations = lappend_int(resultRelations, appinfo->child_relid);
-
-        /* Accumulate lists of UPDATE target columns */
-        if (parse->commandType == CMD_UPDATE)
-            updateColnosLists = lappend(updateColnosLists,
-                                        subroot->update_colnos);
-
-        /* Build lists of per-relation WCO and RETURNING targetlists */
-        if (parse->withCheckOptions)
-            withCheckOptionLists = lappend(withCheckOptionLists,
-                                           subroot->parse->withCheckOptions);
-        if (parse->returningList)
-            returningLists = lappend(returningLists,
-                                     subroot->parse->returningList);
-
-        Assert(!parse->onConflict);
-    }
-
-    /* Result path must go into outer query's FINAL upperrel */
-    final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
-
-    /*
-     * We don't currently worry about setting final_rel's consider_parallel
-     * flag in this case, nor about allowing FDWs or create_upper_paths_hook
-     * to get control here.
-     */
-
-    if (subpaths == NIL)
-    {
-        /*
-         * We managed to exclude every child rel, so generate a dummy path
-         * representing the empty set.  Although it's clear that no data will
-         * be updated or deleted, we will still need to have a ModifyTable
-         * node so that any statement triggers are executed.  (This could be
-         * cleaner if we fixed nodeModifyTable.c to support zero child nodes,
-         * but that probably wouldn't be a net win.)
-         */
-        Path       *dummy_path;
-
-        /* tlist processing never got done, either */
-        root->processed_tlist = preprocess_targetlist(root);
-        final_rel->reltarget = create_pathtarget(root, root->processed_tlist);
-
-        /* Make a dummy path, cf set_dummy_rel_pathlist() */
-        dummy_path = (Path *) create_append_path(NULL, final_rel, NIL, NIL,
-                                                 NIL, NULL, 0, false,
-                                                 -1);
-
-        /* These lists must be nonempty to make a valid ModifyTable node */
-        subpaths = list_make1(dummy_path);
-        subroots = list_make1(root);
-        resultRelations = list_make1_int(parse->resultRelation);
-        if (parse->commandType == CMD_UPDATE)
-            updateColnosLists = lappend(updateColnosLists,
-                                        root->update_colnos);
-        if (parse->withCheckOptions)
-            withCheckOptionLists = list_make1(parse->withCheckOptions);
-        if (parse->returningList)
-            returningLists = list_make1(parse->returningList);
-        /* Disable tuple routing, too, just to be safe */
-        root->partColsUpdated = false;
-    }
-    else
-    {
-        /*
-         * Put back the final adjusted rtable into the original copy of the
-         * Query.  (We mustn't do this if we found no non-excluded children,
-         * since we never saved an adjusted rtable at all.)
-         */
-        parse->rtable = final_rtable;
-        root->simple_rel_array_size = save_rel_array_size;
-        root->simple_rel_array = save_rel_array;
-        root->append_rel_array = save_append_rel_array;
-
-        /* Must reconstruct original's simple_rte_array, too */
-        root->simple_rte_array = (RangeTblEntry **)
-            palloc0((list_length(final_rtable) + 1) * sizeof(RangeTblEntry *));
-        rti = 1;
-        foreach(lc, final_rtable)
-        {
-            RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-            root->simple_rte_array[rti++] = rte;
-        }
-
-        /* Put back adjusted rowmarks and appendrels, too */
-        root->rowMarks = final_rowmarks;
-        root->append_rel_list = final_appendrels;
-    }
-
-    /*
-     * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node will
-     * have dealt with fetching non-locked marked rows, else we need to have
-     * ModifyTable do that.
-     */
-    if (parse->rowMarks)
-        rowMarks = NIL;
-    else
-        rowMarks = root->rowMarks;
-
-    /* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
-    add_path(final_rel, (Path *)
-             create_modifytable_path(root, final_rel,
-                                     parse->commandType,
-                                     parse->canSetTag,
-                                     nominalRelation,
-                                     rootRelation,
-                                     root->partColsUpdated,
-                                     resultRelations,
-                                     subpaths,
-                                     subroots,
-                                     updateColnosLists,
-                                     withCheckOptionLists,
-                                     returningLists,
-                                     rowMarks,
-                                     NULL,
-                                     assign_special_exec_param(root)));
-}
-
 /*--------------------
  * grouping_planner
  *      Perform planning steps related to grouping, aggregation, etc.
@@ -1813,11 +1183,6 @@ inheritance_planner(PlannerInfo *root)
  * This function adds all required top-level processing to the scan/join
  * Path(s) produced by query_planner.
  *
- * If inheritance_update is true, we're being called from inheritance_planner
- * and should not include a ModifyTable step in the resulting Path(s).
- * (inheritance_planner will create a single ModifyTable node covering all the
- * target tables.)
- *
  * tuple_fraction is the fraction of tuples we expect will be retrieved.
  * tuple_fraction is interpreted as follows:
  *      0: expect all tuples to be retrieved (normal case)
@@ -1835,8 +1200,7 @@ inheritance_planner(PlannerInfo *root)
  *--------------------
  */
 static void
-grouping_planner(PlannerInfo *root, bool inheritance_update,
-                 double tuple_fraction)
+grouping_planner(PlannerInfo *root, double tuple_fraction)
 {
     Query       *parse = root->parse;
     int64        offset_est = 0;
@@ -2317,17 +1681,112 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
                                               offset_est, count_est);
         }

-        /*
-         * If this is an INSERT/UPDATE/DELETE, and we're not being called from
-         * inheritance_planner, add the ModifyTable node.
-         */
-        if (parse->commandType != CMD_SELECT && !inheritance_update)
+        /* If this is an INSERT/UPDATE/DELETE, add the ModifyTable node. */
+        if (parse->commandType != CMD_SELECT)
         {
             Index        rootRelation;
-            List *updateColnosLists;
-            List       *withCheckOptionLists;
-            List       *returningLists;
+            List       *resultRelations = NIL;
+            List       *updateColnosLists = NIL;
+            List       *withCheckOptionLists = NIL;
+            List       *returningLists = NIL;
             List       *rowMarks;
+            ListCell   *l;
+
+            if (root->inherit_result_rels)
+            {
+                /* Inherited UPDATE/DELETE */
+                foreach(l, root->inherit_result_rels)
+                {
+                    InheritResultRelInfo *resultInfo = lfirst(l);
+                    Index    resultRelation = resultInfo->resultRelation;
+
+                    /* Add only leaf children to ModifyTable. */
+                    if (planner_rt_fetch(resultInfo->resultRelation,
+                                         root)->inh)
+                        continue;
+
+                    /*
+                     * Also exclude any leaf rels that have turned dummy since
+                     * being added to the list, for example, by being excluded
+                     * by constraint exclusion.
+                     */
+                    if (IS_DUMMY_REL(find_base_rel(root, resultRelation)))
+                        continue;
+
+                    resultRelations = lappend_int(resultRelations,
+                                                  resultInfo->resultRelation);
+                    if (parse->commandType == CMD_UPDATE)
+                        updateColnosLists = lappend(updateColnosLists,
+                                                    resultInfo->update_colnos);
+                    if (resultInfo->withCheckOptions)
+                        withCheckOptionLists = lappend(withCheckOptionLists,
+                                                       resultInfo->withCheckOptions);
+                    if (resultInfo->returningList)
+                        returningLists = lappend(returningLists,
+                                                 resultInfo->returningList);
+                }
+
+                /*
+                 * We managed to exclude every child rel, so generate a dummy
+                 * path representing the empty set.  Although it's clear that
+                 * no data will be updated or deleted, we will still need to
+                 * have a ModifyTable node so that any statement triggers are
+                 * executed.  (This could be cleaner if we fixed
+                 * nodeModifyTable.c to support zero target relations, but
+                 * that probably wouldn't be a net win.)
+                 */
+                if (resultRelations == NIL)
+                {
+                    InheritResultRelInfo *resultInfo = linitial(root->inherit_result_rels);
+                    RelOptInfo *rel = find_base_rel(root, resultInfo->resultRelation);
+                    List       *newlist;
+
+                    resultRelations = list_make1_int(resultInfo->resultRelation);
+                    if (parse->commandType == CMD_UPDATE)
+                        updateColnosLists = list_make1(resultInfo->update_colnos);
+                    if (resultInfo->withCheckOptions)
+                        withCheckOptionLists = list_make1(resultInfo->withCheckOptions);
+                    if (resultInfo->returningList)
+                        returningLists = list_make1(resultInfo->returningList);
+
+                    /*
+                     * Must remove special junk attributes from the targetlist
+                     * that were added for child relations, because they are
+                     * no longer necessary and in fact may not even be
+                     * computable using the root parent relation.
+                     */
+                    newlist = NIL;
+                    foreach(l, root->processed_tlist)
+                    {
+                        TargetEntry *tle = lfirst(l);
+
+                        if (!list_member(root->inherit_junk_tlist, tle))
+                            newlist = lappend(newlist, tle);
+                    }
+                    root->processed_tlist = newlist;
+                    rel->reltarget = create_pathtarget(root,
+                                                       root->processed_tlist);
+                    /*
+                     * Override the existing path with a dummy Append path,
+                     * because the old path still references the old
+                     * reltarget.
+                     */
+                    path = (Path *) create_append_path(NULL, rel, NIL, NIL,
+                                                       NIL, NULL, 0, false,
+                                                       -1);
+                }
+            }
+            else
+            {
+                /* Single-relation UPDATE/DELETE or INSERT. */
+                resultRelations = list_make1_int(parse->resultRelation);
+                if (parse->commandType == CMD_UPDATE)
+                    updateColnosLists = list_make1(root->update_colnos);
+                if (parse->withCheckOptions)
+                    withCheckOptionLists = list_make1(parse->withCheckOptions);
+                if (parse->returningList)
+                    returningLists = list_make1(parse->returningList);
+            }

             /*
              * If target is a partition root table, we need to mark the
@@ -2339,26 +1798,6 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
             else
                 rootRelation = 0;

-            /* Set up the UPDATE target columns list-of-lists, if needed. */
-            if (parse->commandType == CMD_UPDATE)
-                updateColnosLists = list_make1(root->update_colnos);
-            else
-                updateColnosLists = NIL;
-
-            /*
-             * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if
-             * needed.
-             */
-            if (parse->withCheckOptions)
-                withCheckOptionLists = list_make1(parse->withCheckOptions);
-            else
-                withCheckOptionLists = NIL;
-
-            if (parse->returningList)
-                returningLists = list_make1(parse->returningList);
-            else
-                returningLists = NIL;
-
             /*
              * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node
              * will have dealt with fetching non-locked marked rows, else we
@@ -2371,14 +1810,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,

             path = (Path *)
                 create_modifytable_path(root, final_rel,
+                                        path,
                                         parse->commandType,
                                         parse->canSetTag,
                                         parse->resultRelation,
                                         rootRelation,
-                                        false,
-                                        list_make1_int(parse->resultRelation),
-                                        list_make1(path),
-                                        list_make1(root),
+                                        root->partColsUpdated,
+                                        resultRelations,
                                         updateColnosLists,
                                         withCheckOptionLists,
                                         returningLists,
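The leaf-selection loop in grouping_planner() above can be modeled outside the planner as shown here. This is a simplified sketch, not PostgreSQL code. SketchResultRel and collect_result_relations() are hypothetical names, plain booleans stand in for the rte->inh test and IS_DUMMY_REL(), and the sketch assumes, as the dummy-path branch does, that the first list entry is the root parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one InheritResultRelInfo entry. */
typedef struct SketchResultRel
{
    int     rti;        /* range table index of the relation */
    bool    is_parent;  /* stands in for rte->inh: has children, not a leaf */
    bool    is_dummy;   /* stands in for IS_DUMMY_REL(): proven empty */
} SketchResultRel;

/*
 * Copy the RT indexes of leaf, non-dummy result relations into out[] in
 * list order and return how many were copied.  If every child was
 * excluded, fall back to the first entry (assumed to be the root parent)
 * so that a ModifyTable node still exists to fire statement triggers.
 * out[] must have room for at least max(nrels, 1) entries.
 */
static size_t
collect_result_relations(const SketchResultRel *rels, size_t nrels, int *out)
{
    size_t  n = 0;

    for (size_t i = 0; i < nrels; i++)
    {
        if (rels[i].is_parent)      /* only leaf children are targets */
            continue;
        if (rels[i].is_dummy)       /* excluded after being listed */
            continue;
        out[n++] = rels[i].rti;
    }

    if (n == 0 && nrels > 0)
        out[n++] = rels[0].rti;     /* all excluded: keep the root */

    return n;
}
```

Falling back to a single entry rather than returning an empty list mirrors the patch's choice of keeping a ModifyTable node with one dummy target instead of teaching nodeModifyTable.c about zero target relations.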
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..76f07aebbd 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -297,6 +297,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
          * Neither the executor nor EXPLAIN currently need that data.
          */
         appinfo->translated_vars = NIL;
+        appinfo->translated_fake_vars = NIL;

         glob->appendRelations = lappend(glob->appendRelations, appinfo);
     }
@@ -897,26 +898,21 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                 {
                     List       *newRL = NIL;
                     ListCell   *lcrl,
-                               *lcrr,
-                               *lcp;
+                               *lcrr;

                     /*
-                     * Pass each per-subplan returningList through
+                     * Pass each per-resultrel returningList through
                      * set_returning_clause_references().
                      */
                     Assert(list_length(splan->returningLists) == list_length(splan->resultRelations));
-                    Assert(list_length(splan->returningLists) == list_length(splan->plans));
-                    forthree(lcrl, splan->returningLists,
-                             lcrr, splan->resultRelations,
-                             lcp, splan->plans)
+                    forboth(lcrl, splan->returningLists, lcrr, splan->resultRelations)
                     {
                         List       *rlist = (List *) lfirst(lcrl);
                         Index        resultrel = lfirst_int(lcrr);
-                        Plan       *subplan = (Plan *) lfirst(lcp);

                         rlist = set_returning_clause_references(root,
                                                                 rlist,
-                                                                subplan,
+                                                                outerPlan(splan),
                                                                 resultrel,
                                                                 rtoffset);
                         newRL = lappend(newRL, rlist);
@@ -982,12 +978,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                     rc->rti += rtoffset;
                     rc->prti += rtoffset;
                 }
-                foreach(l, splan->plans)
-                {
-                    lfirst(l) = set_plan_refs(root,
-                                              (Plan *) lfirst(l),
-                                              rtoffset);
-                }

                 /*
                  * Append this ModifyTable node's final result relation RT
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..b12ab7de2d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2533,7 +2533,6 @@ finalize_plan(PlannerInfo *root, Plan *plan,
         case T_ModifyTable:
             {
                 ModifyTable *mtplan = (ModifyTable *) plan;
-                ListCell   *l;

                 /* Force descendant scan nodes to reference epqParam */
                 locally_added_param = mtplan->epqParam;
@@ -2548,16 +2547,6 @@ finalize_plan(PlannerInfo *root, Plan *plan,
                 finalize_primnode((Node *) mtplan->onConflictWhere,
                                   &context);
                 /* exclRelTlist contains only Vars, doesn't need examination */
-                foreach(l, mtplan->plans)
-                {
-                    context.paramids =
-                        bms_add_members(context.paramids,
-                                        finalize_plan(root,
-                                                      (Plan *) lfirst(l),
-                                                      gather_param,
-                                                      valid_params,
-                                                      scan_params));
-                }
             }
             break;

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index e18553ac7c..0cf5f6d0d6 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -921,15 +921,16 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     subroot->eq_classes = NIL;
     subroot->ec_merging_done = false;
     subroot->append_rel_list = NIL;
+    subroot->inherit_result_rels = NIL;
     subroot->rowMarks = NIL;
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
     subroot->processed_tlist = NIL;
     subroot->update_colnos = NIL;
+    subroot->inherit_junk_tlist = NIL;
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
     subroot->qual_security_level = 0;
-    subroot->inhTargetKind = INHKIND_NONE;
     subroot->hasRecursion = false;
     subroot->wt_param_id = -1;
     subroot->non_recursive_path = NULL;
@@ -1014,6 +1015,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     rtoffset = list_length(parse->rtable);
     OffsetVarNodes((Node *) subquery, rtoffset, 0);
     OffsetVarNodes((Node *) subroot->append_rel_list, rtoffset, 0);
+    Assert(subroot->inherit_result_rels == NIL);

     /*
      * Upper-level vars in subquery are now one level closer to their parent
@@ -2057,6 +2059,8 @@ perform_pullup_replace_vars(PlannerInfo *root,
      * parent appendrel --- there isn't any outer join between.  Elsewhere,
      * use PHVs for safety.  (This analysis could be made tighter but it seems
      * unlikely to be worth much trouble.)
+     *
+     * XXX what of translated_fake_vars?
      */
     foreach(lc, root->append_rel_list)
     {
@@ -3514,6 +3518,8 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
         /* Also fix up any PHVs in its translated vars */
         substitute_phv_relids((Node *) appinfo->translated_vars,
                               varno, subrelids);
+
+        /* XXX what of translated_fake_vars? */
     }
 }

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 488e8cfd4d..91f2d54da5 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -111,6 +111,10 @@ preprocess_targetlist(PlannerInfo *root)
      * to identify the rows to be updated or deleted.  Note that this step
      * scribbles on parse->targetList, which is not very desirable, but we
      * keep it that way to avoid changing APIs used by FDWs.
+     *
+     * If the target relation has inheritance children, the junk column(s)
+     * needed by the individual leaf child relations are added by
+     * add_child_junk_attrs() in inherit.c.
      */
     if (command_type == CMD_UPDATE || command_type == CMD_DELETE)
         rewriteTargetListUD(parse, target_rte, target_relation);
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 86922a273c..fb7cf149b1 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -29,6 +29,7 @@ typedef struct
     PlannerInfo *root;
     int            nappinfos;
     AppendRelInfo **appinfos;
+    bool        need_parent_wholerow;
 } adjust_appendrel_attrs_context;

 static void make_inh_translation_list(Relation oldrelation,
@@ -37,8 +38,6 @@ static void make_inh_translation_list(Relation oldrelation,
                                       AppendRelInfo *appinfo);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
                                             adjust_appendrel_attrs_context *context);
-static List *adjust_inherited_tlist(List *tlist,
-                                    AppendRelInfo *context);


 /*
@@ -200,42 +199,42 @@ adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos,
     context.root = root;
     context.nappinfos = nappinfos;
     context.appinfos = appinfos;
+    context.need_parent_wholerow = true;

     /* If there's nothing to adjust, don't call this function. */
     Assert(nappinfos >= 1 && appinfos != NULL);

-    /*
-     * Must be prepared to start with a Query or a bare expression tree.
-     */
-    if (node && IsA(node, Query))
-    {
-        Query       *newnode;
-        int            cnt;
+    /* Should never be translating a Query tree. */
+    Assert(node == NULL || !IsA(node, Query));
+    result = adjust_appendrel_attrs_mutator(node, &context);

-        newnode = query_tree_mutator((Query *) node,
-                                     adjust_appendrel_attrs_mutator,
-                                     (void *) &context,
-                                     QTW_IGNORE_RC_SUBQUERIES);
-        for (cnt = 0; cnt < nappinfos; cnt++)
-        {
-            AppendRelInfo *appinfo = appinfos[cnt];
+    return result;
+}

-            if (newnode->resultRelation == appinfo->parent_relid)
-            {
-                newnode->resultRelation = appinfo->child_relid;
-                /* Fix tlist resnos too, if it's inherited UPDATE */
-                if (newnode->commandType == CMD_UPDATE)
-                    newnode->targetList =
-                        adjust_inherited_tlist(newnode->targetList,
-                                               appinfo);
-                break;
-            }
-        }
+/*
+ * adjust_target_appendrel_attrs
+ *        like adjust_appendrel_attrs, but treats wholerow Vars a bit
+ *        differently in that it doesn't convert any child table
+ *        wholerows contained in 'node' back to the parent reltype.
+ */
+Node *
+adjust_target_appendrel_attrs(PlannerInfo *root, Node *node,
+                              AppendRelInfo *appinfo)
+{
+    Node       *result;
+    adjust_appendrel_attrs_context context;

-        result = (Node *) newnode;
-    }
-    else
-        result = adjust_appendrel_attrs_mutator(node, &context);
+    context.root = root;
+    context.nappinfos = 1;
+    context.appinfos = &appinfo;
+    context.need_parent_wholerow = false;
+
+    /* If there's nothing to adjust, don't call this function. */
+    Assert(appinfo != NULL);
+
+    /* Should never be translating a Query tree. */
+    Assert(node == NULL || !IsA(node, Query));
+    result = adjust_appendrel_attrs_mutator(node, &context);

     return result;
 }
@@ -277,11 +276,16 @@ adjust_appendrel_attrs_mutator(Node *node,
             {
                 Node       *newnode;

+                /*
+                 * If this Var appears to have an unusual attno assigned, it
+                 * must be one of the "fake" vars added to a parent target
+                 * relation's reltarget; see add_inherit_junk_var().
+                 */
                 if (var->varattno > list_length(appinfo->translated_vars))
-                    elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-                         var->varattno, get_rel_name(appinfo->parent_reloid));
-                newnode = copyObject(list_nth(appinfo->translated_vars,
-                                              var->varattno - 1));
+                    newnode = translate_fake_parent_var(var, appinfo);
+                else
+                    newnode = copyObject(list_nth(appinfo->translated_vars,
+                                                  var->varattno - 1));
                 if (newnode == NULL)
                     elog(ERROR, "attribute %d of relation \"%s\" does not exist",
                          var->varattno, get_rel_name(appinfo->parent_reloid));
@@ -298,7 +302,10 @@ adjust_appendrel_attrs_mutator(Node *node,
                 if (OidIsValid(appinfo->child_reltype))
                 {
                     Assert(var->vartype == appinfo->parent_reltype);
-                    if (appinfo->parent_reltype != appinfo->child_reltype)
+                    /* Make sure the Var node has the right type ID, too */
+                    var->vartype = appinfo->child_reltype;
+                    if (appinfo->parent_reltype != appinfo->child_reltype &&
+                        context->need_parent_wholerow)
                     {
                         ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr);

@@ -306,8 +313,6 @@ adjust_appendrel_attrs_mutator(Node *node,
                         r->resulttype = appinfo->parent_reltype;
                         r->convertformat = COERCE_IMPLICIT_CAST;
                         r->location = -1;
-                        /* Make sure the Var node has the right type ID, too */
-                        var->vartype = appinfo->child_reltype;
                         return (Node *) r;
                     }
                 }
@@ -361,44 +366,6 @@ adjust_appendrel_attrs_mutator(Node *node,
         }
         return (Node *) cexpr;
     }
-    if (IsA(node, RangeTblRef))
-    {
-        RangeTblRef *rtr = (RangeTblRef *) copyObject(node);
-
-        for (cnt = 0; cnt < nappinfos; cnt++)
-        {
-            AppendRelInfo *appinfo = appinfos[cnt];
-
-            if (rtr->rtindex == appinfo->parent_relid)
-            {
-                rtr->rtindex = appinfo->child_relid;
-                break;
-            }
-        }
-        return (Node *) rtr;
-    }
-    if (IsA(node, JoinExpr))
-    {
-        /* Copy the JoinExpr node with correct mutation of subnodes */
-        JoinExpr   *j;
-        AppendRelInfo *appinfo;
-
-        j = (JoinExpr *) expression_tree_mutator(node,
-                                                 adjust_appendrel_attrs_mutator,
-                                                 (void *) context);
-        /* now fix JoinExpr's rtindex (probably never happens) */
-        for (cnt = 0; cnt < nappinfos; cnt++)
-        {
-            appinfo = appinfos[cnt];
-
-            if (j->rtindex == appinfo->parent_relid)
-            {
-                j->rtindex = appinfo->child_relid;
-                break;
-            }
-        }
-        return (Node *) j;
-    }
     if (IsA(node, PlaceHolderVar))
     {
         /* Copy the PlaceHolderVar node with correct mutation of subnodes */
@@ -487,6 +454,10 @@ adjust_appendrel_attrs_mutator(Node *node,
     Assert(!IsA(node, SubLink));
     Assert(!IsA(node, Query));

+    /* We should never see these Query substructures. */
+    Assert(!IsA(node, RangeTblRef));
+    Assert(!IsA(node, JoinExpr));
+
     return expression_tree_mutator(node, adjust_appendrel_attrs_mutator,
                                    (void *) context);
 }
@@ -620,103 +591,6 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
     return result;
 }

-/*
- * Adjust the targetlist entries of an inherited UPDATE operation
- *
- * The expressions have already been fixed, but we have to make sure that
- * the target resnos match the child table (they may not, in the case of
- * a column that was added after-the-fact by ALTER TABLE).  In some cases
- * this can force us to re-order the tlist to preserve resno ordering.
- * (We do all this work in special cases so that preptlist.c is fast for
- * the typical case.)
- *
- * The given tlist has already been through expression_tree_mutator;
- * therefore the TargetEntry nodes are fresh copies that it's okay to
- * scribble on.
- *
- * Note that this is not needed for INSERT because INSERT isn't inheritable.
- */
-static List *
-adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
-{
-    bool        changed_it = false;
-    ListCell   *tl;
-    List       *new_tlist;
-    bool        more;
-    int            attrno;
-
-    /* This should only happen for an inheritance case, not UNION ALL */
-    Assert(OidIsValid(context->parent_reloid));
-
-    /* Scan tlist and update resnos to match attnums of child rel */
-    foreach(tl, tlist)
-    {
-        TargetEntry *tle = (TargetEntry *) lfirst(tl);
-        Var           *childvar;
-
-        if (tle->resjunk)
-            continue;            /* ignore junk items */
-
-        /* Look up the translation of this column: it must be a Var */
-        if (tle->resno <= 0 ||
-            tle->resno > list_length(context->translated_vars))
-            elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-                 tle->resno, get_rel_name(context->parent_reloid));
-        childvar = (Var *) list_nth(context->translated_vars, tle->resno - 1);
-        if (childvar == NULL || !IsA(childvar, Var))
-            elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-                 tle->resno, get_rel_name(context->parent_reloid));
-
-        if (tle->resno != childvar->varattno)
-        {
-            tle->resno = childvar->varattno;
-            changed_it = true;
-        }
-    }
-
-    /*
-     * If we changed anything, re-sort the tlist by resno, and make sure
-     * resjunk entries have resnos above the last real resno.  The sort
-     * algorithm is a bit stupid, but for such a seldom-taken path, small is
-     * probably better than fast.
-     */
-    if (!changed_it)
-        return tlist;
-
-    new_tlist = NIL;
-    more = true;
-    for (attrno = 1; more; attrno++)
-    {
-        more = false;
-        foreach(tl, tlist)
-        {
-            TargetEntry *tle = (TargetEntry *) lfirst(tl);
-
-            if (tle->resjunk)
-                continue;        /* ignore junk items */
-
-            if (tle->resno == attrno)
-                new_tlist = lappend(new_tlist, tle);
-            else if (tle->resno > attrno)
-                more = true;
-        }
-    }
-
-    foreach(tl, tlist)
-    {
-        TargetEntry *tle = (TargetEntry *) lfirst(tl);
-
-        if (!tle->resjunk)
-            continue;            /* here, ignore non-junk items */
-
-        tle->resno = attrno;
-        new_tlist = lappend(new_tlist, tle);
-        attrno++;
-    }
-
-    return new_tlist;
-}
-
 /*
  * find_appinfos_by_relids
  *         Find AppendRelInfo structures for all relations specified by relids.
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index be1c9ddd96..50d2deb2a0 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -21,6 +21,7 @@
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/inherit.h"
 #include "optimizer/optimizer.h"
@@ -29,9 +30,12 @@
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
+#include "optimizer/tlist.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
 #include "partitioning/partprune.h"
+#include "rewrite/rewriteHandler.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"


@@ -49,6 +53,16 @@ static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
                                       List *translated_vars);
 static void expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
                                       RangeTblEntry *rte, Index rti);
+static void add_inherit_result_relation_info(PlannerInfo *root, Index rti,
+                            Relation relation,
+                            InheritResultRelInfo *parentInfo);
+static void adjust_inherited_tlist(List *tlist, AppendRelInfo *context);
+static void add_child_junk_attrs(PlannerInfo *root,
+                    Index childRTindex, Relation childrelation,
+                    RelOptInfo *rel, Relation relation);
+static void add_inherit_junk_var(PlannerInfo *root, char *attrname, Node *child_expr,
+                       AppendRelInfo *appinfo,
+                       RelOptInfo *parentrelinfo, Relation parentrelation);


 /*
@@ -85,6 +99,8 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
     PlanRowMark *oldrc;
     bool        old_isParent = false;
     int            old_allMarkTypes = 0;
+    ListCell   *l;
+    List       *newvars = NIL;

     Assert(rte->inh);            /* else caller error */

@@ -128,6 +144,22 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
         old_allMarkTypes = oldrc->allMarkTypes;
     }

+    /*
+     * Make an InheritResultRelInfo for the root parent if it's an
+     * UPDATE/DELETE result relation.
+     */
+    if (rti == root->parse->resultRelation &&
+        root->parse->commandType != CMD_INSERT)
+    {
+        /* Make an array indexable by RT indexes for easy lookup. */
+        root->inherit_result_rel_array = (InheritResultRelInfo **)
+            palloc0(root->simple_rel_array_size *
+                    sizeof(InheritResultRelInfo *));
+
+        add_inherit_result_relation_info(root, root->parse->resultRelation,
+                                         oldrelation, NULL);
+    }
+
     /* Scan the inheritance set and expand it */
     if (oldrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
     {
@@ -151,7 +183,6 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
          * children, so it's not possible for both cases to apply.)
          */
         List       *inhOIDs;
-        ListCell   *l;

         /* Scan for all members of inheritance set, acquire needed locks */
         inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
@@ -226,7 +257,6 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
         Var           *var;
         TargetEntry *tle;
         char        resname[32];
-        List       *newvars = NIL;

         /* The old PlanRowMark should already have necessitated adding TID */
         Assert(old_allMarkTypes & ~(1 << ROW_MARK_COPY));
@@ -265,13 +295,24 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
             root->processed_tlist = lappend(root->processed_tlist, tle);
             newvars = lappend(newvars, var);
         }
+    }

-        /*
-         * Add the newly added Vars to parent's reltarget.  We needn't worry
-         * about the children's reltargets, they'll be made later.
-         */
+    /*
+     * Also pull any appendrel parent junk vars added due to child result
+     * relations.
+     */
+    if (rti == root->parse->resultRelation &&
+        list_length(root->inherit_junk_tlist) > 0)
+        newvars = list_concat(newvars,
+                              pull_var_clause((Node *)
+                                              root->inherit_junk_tlist, 0));
+
+    /*
+     * Add the newly added Vars to parent's reltarget.  We needn't worry
+     * about the children's reltargets, they'll be made later.
+     */
+    if (newvars != NIL)
         add_vars_to_targetlist(root, newvars, bms_make_singleton(0), false);
-    }

     table_close(oldrelation, NoLock);
 }
@@ -381,10 +422,23 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,

         /* If this child is itself partitioned, recurse */
         if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+        {
             expand_partitioned_rtentry(root, childrelinfo,
                                        childrte, childRTindex,
                                        childrel, top_parentrc, lockmode);

+            /*
+             * Add junk attributes needed by this child relation or really by
+             * its children.  We must do this after having added all the leaf
+             * children of this relation, because the add_child_junk_attrs()
+             * call below simply propagates their junk attributes that are in
+             * the form of this child relation's vars up to its own parent.
+             */
+            if (is_result_relation(root, childRTindex))
+                add_child_junk_attrs(root, childRTindex, childrel,
+                                     relinfo, parentrel);
+        }
+
         /* Close child relation, but keep locks */
         table_close(childrel, NoLock);
     }
@@ -585,6 +639,27 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,

         root->rowMarks = lappend(root->rowMarks, childrc);
     }
+
+    /*
+     * If this appears to be a child of an UPDATE/DELETE result relation, we
+     * need to remember some additional information.
+     */
+    if (is_result_relation(root, parentRTindex))
+    {
+        InheritResultRelInfo *parentInfo = root->inherit_result_rel_array[parentRTindex];
+        RelOptInfo *parentrelinfo = root->simple_rel_array[parentRTindex];
+
+        add_inherit_result_relation_info(root, childRTindex, childrel,
+                                         parentInfo);
+
+        /*
+         * If this child result relation is a leaf, add the junk attributes
+         * it needs.
+         */
+        if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
+            add_child_junk_attrs(root, childRTindex, childrel, parentrelinfo,
+                                 parentrel);
+    }
 }

 /*
@@ -805,3 +880,581 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,

     return true;
 }
+
+/*
+ * add_inherit_result_relation_info
+ *        Adds information to PlannerInfo about an inherited UPDATE/DELETE
+ *        result relation
+ */
+static void
+add_inherit_result_relation_info(PlannerInfo *root, Index rti,
+                                 Relation relation,
+                                 InheritResultRelInfo *parentInfo)
+{
+    InheritResultRelInfo *resultInfo = makeNode(InheritResultRelInfo);
+
+    if (parentInfo == NULL)
+    {
+        /* Root result relation. */
+        resultInfo->resultRelation = rti;
+        resultInfo->withCheckOptions = root->parse->withCheckOptions;
+        resultInfo->returningList = root->parse->returningList;
+        if (root->parse->commandType == CMD_UPDATE)
+        {
+            resultInfo->processed_tlist = root->processed_tlist;
+            resultInfo->update_colnos = root->update_colnos;
+        }
+    }
+    else
+    {
+        /* Child result relation. */
+        AppendRelInfo *appinfo = root->append_rel_array[rti];
+
+        Assert(appinfo != NULL);
+
+        resultInfo->resultRelation = rti;
+
+        if (parentInfo->withCheckOptions)
+            resultInfo->withCheckOptions = (List *)
+                adjust_appendrel_attrs(root,
+                                       (Node *) parentInfo->withCheckOptions,
+                                       1, &appinfo);
+        if (parentInfo->returningList)
+            resultInfo->returningList = (List *)
+                adjust_appendrel_attrs(root,
+                                       (Node *) parentInfo->returningList,
+                                       1, &appinfo);
+
+        /* Build UPDATE targetlist for this child. */
+        if (root->parse->commandType == CMD_UPDATE)
+        {
+            List       *update_colnos = NIL;
+            ListCell   *lc;
+
+            /*
+             * First fix up any Vars in the parent's version of the top-level
+             * targetlist.
+             */
+            resultInfo->processed_tlist = (List *)
+                adjust_appendrel_attrs(root,
+                                       (Node *) parentInfo->processed_tlist,
+                                       1, &appinfo);
+
+            /*
+             * adjust_appendrel_attrs() doesn't modify TLE resnos, so we need
+             * to do that here to make processed_tlist's resnos match the
+             * child.  Then we can extract update_colnos.
+             */
+            adjust_inherited_tlist(resultInfo->processed_tlist, appinfo);
+
+            /* XXX this duplicates make_update_colnos() */
+            foreach(lc, resultInfo->processed_tlist)
+            {
+                TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+                if (!tle->resjunk)
+                    update_colnos = lappend_int(update_colnos, tle->resno);
+            }
+            resultInfo->update_colnos = update_colnos;
+        }
+    }
+
+    root->inherit_result_rels = lappend(root->inherit_result_rels, resultInfo);
+    Assert(root->inherit_result_rel_array);
+    Assert(root->inherit_result_rel_array[rti] == NULL);
+    root->inherit_result_rel_array[rti] = resultInfo;
+}
+
+/*
+ * Adjust the targetlist entries of an inherited UPDATE operation
+ *
+ * The input tlist is that of an UPDATE targeting the given parent table.
+ * Expressions of the individual target entries have already been fixed to
+ * convert any parent table Vars in them into child table Vars, but the
+ * target resnos still match the parent attnos, which we fix here to match
+ * the corresponding child table attnos.
+ *
+ * Note the list is modified in-place.
+ */
+static void
+adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
+{
+    ListCell   *tl;
+
+    /* This should only happen for an inheritance case, not UNION ALL */
+    Assert(OidIsValid(context->parent_reloid));
+
+    /* Scan tlist and update resnos to match attnums of child rel */
+    foreach(tl, tlist)
+    {
+        TargetEntry *tle = (TargetEntry *) lfirst(tl);
+        Var           *childvar;
+
+        if (tle->resjunk)
+            continue;            /* ignore junk items */
+
+        /* Look up the translation of this column: it must be a Var */
+        if (tle->resno <= 0 ||
+            tle->resno > list_length(context->translated_vars))
+            elog(ERROR, "attribute %d of relation \"%s\" does not exist",
+                 tle->resno, get_rel_name(context->parent_reloid));
+        childvar = (Var *) list_nth(context->translated_vars, tle->resno - 1);
+        if (childvar == NULL || !IsA(childvar, Var))
+            elog(ERROR, "attribute %d of relation \"%s\" does not exist",
+                 tle->resno, get_rel_name(context->parent_reloid));
+
+        tle->resno = childvar->varattno;
+    }
+}
+
+/*
+ * add_child_junk_attrs
+ *        Adds junk attributes needed by leaf child result relations to
+ *        identify tuples to be updated/deleted, and for each tuple also
+ *        the result relation to perform the operation on
+ *
+ * While preprocess_targetlist() would have added junk attributes needed to
+ * identify rows to be updated/deleted based on whatever the root parent says
+ * they are, not all leaf result relations may be able to use the same junk
+ * attributes.  For example, in the case of leaf result relations that are
+ * foreign tables, junk attributes to use are determined by their FDW's
+ * AddForeignUpdateTargets().
+ *
+ * Even though leaf result relations are scanned at the bottom of the plan
+ * tree, any junk attributes needed must be present in the top-level tlist,
+ * so adding a junk attribute for a given leaf result relation really means
+ * adding the corresponding column of the top parent relation to the
+ * top-level targetlist, from where it will be propagated back down to the
+ * leaf result relation.  In some cases, a leaf relation's junk attribute may
+ * be such that no column of the root parent can be mapped to it, in which
+ * case we must add "fake" parent columns to the targetlist and set things up
+ * to map those columns' vars to the desired junk attribute expressions in
+ * the reltargets of the leaf result relations that need them.  The logic
+ * that maps leaf-level junk attributes to top-level vars and back is in
+ * add_inherit_junk_var().
+ *
+ * The leaf-level junk attribute that is added to identify the leaf result
+ * relation for each tuple to be updated/deleted is really a Const node
+ * containing an integer value that gives the index of the leaf result
+ * relation in the subquery's list of result relations.  This adds an
+ * entry named "resultrelindex" to the top-level tlist which wraps a fake
+ * parent var that maps back to the Const node for each leaf result
+ * relation.
+ */
+static void
+add_child_junk_attrs(PlannerInfo *root,
+                     Index childRTindex, Relation childrelation,
+                     RelOptInfo *parentrelinfo, Relation parentrelation)
+{
+    AppendRelInfo  *appinfo = root->append_rel_array[childRTindex];
+    ListCell       *lc;
+    List           *child_junk_attrs = NIL;
+
+    /*
+     * For a non-leaf child relation, we simply need to bubble up to its
+     * parent any entries containing its vars that would be added for junk
+     * attributes of its own children.
+     */
+    if (childrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+    {
+        foreach(lc, root->inherit_junk_tlist)
+        {
+            TargetEntry *tle = lfirst(lc);
+            Var   *var = castNode(Var, tle->expr);
+
+            if (var->varno == childRTindex)
+                child_junk_attrs = lappend(child_junk_attrs, tle);
+        }
+    }
+    else
+    {
+        /* Leaf child case. */
+        RangeTblEntry  *childrte = root->simple_rte_array[childRTindex];
+        Query            parsetree;
+        Node           *childexpr;
+        TargetEntry       *tle;
+
+        /* The "resultrelindex" column. */
+        childexpr = (Node *) makeConst(INT4OID, -1, InvalidOid, sizeof(int32),
+                                       Int32GetDatum(root->lastResultRelIndex++),
+                                       false, true);
+        /* XXX resno in this is wrong */
+        tle = makeTargetEntry((Expr *) childexpr, 1, pstrdup("resultrelindex"), true);
+        child_junk_attrs = lappend(child_junk_attrs, tle);
+
+        /*
+         * Now call rewriteTargetListUD() to add junk attributes into the
+         * parsetree.  We pass a slightly altered version of the original
+         * parsetree to show the child result relation as the main target
+         * relation.  It is assumed here that rewriteTargetListUD and any
+         * code downstream of it do not inspect the parsetree except to
+         * figure out the varno to assign to the Vars that will be added
+         * to the targetlist.
+         *
+         * XXX instead of this, should refactor rewriteTargetListUD to pull
+         * out whatever behavior is actually needed
+         */
+        memcpy(&parsetree, root->parse, sizeof(Query));
+        parsetree.resultRelation = childRTindex;
+        parsetree.targetList = NIL;
+        rewriteTargetListUD(&parsetree, childrte, childrelation);
+        child_junk_attrs = list_concat(child_junk_attrs,
+                                       parsetree.targetList);
+    }
+
+    /* Add parent vars for each of the child junk attributes. */
+    foreach(lc, child_junk_attrs)
+    {
+        TargetEntry *tle = lfirst(lc);
+
+        Assert(tle->resjunk);
+
+        add_inherit_junk_var(root, tle->resname, (Node *) tle->expr,
+                             appinfo, parentrelinfo, parentrelation);
+    }
+}
+
+/*
+ * add_inherit_junk_var
+ *        Checks if the query's top-level tlist (root->processed_tlist) or
+ *        root->inherit_junk_tlist contains an entry for a junk attribute
+ *        with given name and if the parent var therein translates to
+ *        given child junk expression
+ *
+ * If not, add the parent var to appropriate list -- top-level tlist if parent
+ * is top-level parent, root->inherit_junk_tlist otherwise.
+ *
+ * If the parent var found or added is a "fake" var, which will be the case
+ * if no real column of the parent translates to the provided child
+ * expression, then add mapping information to the provided AppendRelInfo
+ * so that the fake parent var is translated to the provided child
+ * expression.
+ */
+static void
+add_inherit_junk_var(PlannerInfo *root, char *attrname, Node *child_expr,
+                     AppendRelInfo *appinfo,
+                     RelOptInfo *parentrelinfo, Relation parentrelation)
+{
+    AttrNumber    max_parent_attno = RelationGetNumberOfAttributes(parentrelation);
+    AttrNumber    max_child_attno = appinfo->num_child_cols;
+    ListCell   *lc;
+    Var           *parent_var = NULL;
+    Index        parent_varno = parentrelinfo->relid;
+    AttrNumber    parent_attno;
+
+    Assert(appinfo && parent_varno == appinfo->parent_relid);
+
+    /*
+     * The way we decide if a given parent var found in the targetlist is the
+     * one that will give the desired child var back upon translation is to
+     * check whether the child var refers to the same inherited user column
+     * or system column as the one that the parent var refers to.  If the
+     * child var refers to a fake column, the parent var must likewise refer
+     * to a fake column.
+     *
+     * There is a special case where the desired child expression is a Const
+     * node wrapped in an entry named "resultrelindex", in which case, simply
+     * finding an entry with that name containing a parent's var suffices.
+     *
+     * If no such parent var is found, we will add one.
+     */
+    foreach(lc, list_concat_copy(root->inherit_junk_tlist,
+                                 root->processed_tlist))
+    {
+        TargetEntry *tle = lfirst(lc);
+        Var       *var = (Var *) tle->expr;
+        Var       *child_var = (Var *) child_expr;
+
+        if (!tle->resjunk)
+            continue;
+
+        /* Ignore RETURNING expressions in the top-level tlist. */
+        if (tle->resname == NULL)
+            continue;
+
+        if (strcmp(attrname, tle->resname) != 0)
+            continue;
+
+        if (!IsA(var, Var))
+            elog(ERROR, "junk column \"%s\" is not a Var", attrname);
+
+        /* Ignore junk vars of other relations. */
+        if (var->varno != parent_varno)
+            continue;
+
+        /* special case */
+        if (strcmp(attrname, "resultrelindex") == 0)
+        {
+            /* The parent var had better not be a normal user column. */
+            Assert(var->varattno > max_parent_attno);
+            parent_var = var;
+            break;
+        }
+
+        if (!IsA(child_expr, Var))
+            elog(ERROR, "junk column \"%s\" of child relation %u is not a Var",
+                 attrname, appinfo->child_relid);
+
+        /*
+         * So we found parent var referring to the column that the child wants
+         * added, but check if that's really the case.
+         */
+        if (var->vartype == child_var->vartype &&
+            var->vartypmod == child_var->vartypmod &&
+
+            (/* child var refers to same system column as parent var */
+             (child_var->varattno <= 0 &&
+              child_var->varattno == var->varattno) ||
+
+             /* child var refers to same user column as parent var */
+             (child_var->varattno > 0 &&
+              child_var->varattno <= max_child_attno &&
+              var->varattno == appinfo->parent_colnos[child_var->varattno - 1]) ||
+
+             /* both child var and parent var refer to "fake" column */
+             (child_var->varattno > max_child_attno &&
+              var->varattno > max_parent_attno)))
+        {
+            parent_var = var;
+            break;
+        }
+
+        /*
+         * Getting here means that we did find a parent column with the given
+         * name, but it's not equivalent to the child column we're trying
+         * to add to the targetlist.  Adding a second var with the child's
+         * type would not be correct.
+         */
+        elog(ERROR, "junk column \"%s\" of child relation %u conflicts with parent junk column with same name",
+             attrname, appinfo->child_relid);
+    }
+
+    /*
+     * If no parent column matching the child column was found in the
+     * targetlist, add one.
+     */
+    if (parent_var == NULL)
+    {
+        TargetEntry *tle;
+        bool        fake_column = true;
+        AttrNumber    resno;
+        Oid            parent_vartype = exprType((Node *) child_expr);
+        int32        parent_vartypmod = exprTypmod((Node *) child_expr);
+        Oid            parent_varcollid = exprCollation((Node *) child_expr);
+
+        /*
+         * If the child expression is either an inherited user column, or
+         * wholerow, or ctid, it can be mapped to a parent var.  If the child
+         * expression does not refer to a column, or refers to a column the
+         * parent does not contain, then we must make a "fake" parent column to
+         * stand for the child expression.  We will set things up below using
+         * the child's AppendRelInfo such that when translated, the fake parent
+         * column becomes the child expression.  Note that these fake columns
+         * don't leave the planner, because the parent's reltarget is never
+         * actually computed during execution (see set_dummy_tlist_references()
+         * and how it applies to Append and similar plan nodes).
+         */
+        if (IsA(child_expr, Var))
+        {
+            Var   *child_var = (Var *) child_expr;
+
+            if (child_var->varattno > 0 &&
+                child_var->varattno <= appinfo->num_child_cols &&
+                appinfo->parent_colnos[child_var->varattno - 1] > 0)
+            {
+                /* A user-defined parent column. */
+                parent_attno = appinfo->parent_colnos[child_var->varattno - 1];
+                fake_column = false;
+            }
+            else if (child_var->varattno == 0)
+            {
+                /* wholerow */
+                parent_attno = 0;
+                parent_vartype = parentrelation->rd_rel->reltype;
+                fake_column = false;
+            }
+            else if (child_var->varattno == SelfItemPointerAttributeNumber)
+            {
+                /* ctid */
+                parent_attno = SelfItemPointerAttributeNumber;
+                fake_column = false;
+            }
+        }
+
+        /*
+         * A fake parent column is represented by a Var with fake varattno.
+         * We use attribute numbers starting from parent's max_attr + 1.
+         */
+        if (fake_column)
+        {
+            int        array_size;
+
+            parent_attno = parentrelinfo->max_attr + 1;
+
+            /* Must expand attr_needed array for the new fake Var. */
+            array_size = parentrelinfo->max_attr - parentrelinfo->min_attr + 1;
+            parentrelinfo->attr_needed = (Relids *)
+                    repalloc(parentrelinfo->attr_needed,
+                             (array_size + 1) * sizeof(Relids));
+            parentrelinfo->attr_widths = (int32 *)
+                    repalloc(parentrelinfo->attr_widths,
+                             (array_size + 1) * sizeof(int32));
+            parentrelinfo->attr_needed[array_size] = NULL;
+            parentrelinfo->attr_widths[array_size] = 0;
+            parentrelinfo->max_attr += 1;
+        }
+
+        parent_var = makeVar(parent_varno, parent_attno, parent_vartype,
+                             parent_vartypmod, parent_varcollid, 0);
+
+        /*
+         * Only the top-level parent's vars will make it into the top-level
+         * tlist, so choose resno likewise.  Other TLEs containing vars of
+         * intermediate parents only serve as placeholders for remembering
+         * child junk attribute names and expressions, so that the loop at the
+         * start of this function can avoid re-adding duplicates; hence
+         * their resnos don't need to be correct.
+         */
+        if (parent_varno == root->parse->resultRelation)
+            resno = list_length(root->processed_tlist) + 1;
+        else
+            resno = 1;
+        tle = makeTargetEntry((Expr *) parent_var, resno, attrname, true);
+
+        root->inherit_junk_tlist = lappend(root->inherit_junk_tlist, tle);
+        if (parent_varno == root->parse->resultRelation)
+            root->processed_tlist = lappend(root->processed_tlist, tle);
+    }
+
+    /*
+     * While appinfo->translated_vars contains child column vars mapped from
+     * real parent column vars, we maintain a list of child expressions that
+     * are mapped from fake parent vars in appinfo->translated_fake_vars.
+     */
+    parent_attno = parent_var->varattno;
+    if (parent_attno > max_parent_attno)
+    {
+        int        fake_var_offset = parent_attno - max_parent_attno - 1;
+
+        /*
+         * Any of the parent's fake columns with an attribute number smaller
+         * than the current fake attno that have no slot yet in this child's
+         * map are assumed not to be mapped to any expression of this child;
+         * that is indicated by having a NULL in the map, and such a slot may
+         * be filled in later.
+         */
+        while (list_length(appinfo->translated_fake_vars) < fake_var_offset)
+            appinfo->translated_fake_vars =
+                lappend(appinfo->translated_fake_vars, NULL);
+
+        if (list_length(appinfo->translated_fake_vars) > fake_var_offset)
+        {
+            ListCell   *cell = list_nth_cell(appinfo->translated_fake_vars,
+                                             fake_var_offset);
+
+            /*
+             * A NULL slot was padded in by an earlier call; anything else
+             * means this child has already mapped this fake attno.
+             */
+            if (lfirst(cell) != NULL)
+                elog(ERROR, "fake attno %d of parent relation %u already mapped",
+                     parent_attno, parent_varno);
+            lfirst(cell) = child_expr;
+        }
+        else
+        {
+            /* This child's slot for the fake attno is the next one. */
+            appinfo->translated_fake_vars =
+                lappend(appinfo->translated_fake_vars, child_expr);
+        }
+    }
+}
+
+/*
+ * translate_fake_parent_var
+ *         For a "fake" parent var, return corresponding child expression in
+ *         appinfo->translated_fake_vars if one has been added, NULL const node
+ *         otherwise
+ */
+Node *
+translate_fake_parent_var(Var *var, AppendRelInfo *appinfo)
+{
+    int        max_parent_attno = list_length(appinfo->translated_vars);
+    int        offset = var->varattno - max_parent_attno - 1;
+    Node   *result = NULL;
+
+    if (offset < list_length(appinfo->translated_fake_vars))
+        result = (Node *) list_nth(appinfo->translated_fake_vars, offset);
+
+    /*
+     * It's possible for some fake parent vars to map to a valid expression
+     * in only some child relations but not in others.  In that case, we
+     * return a NULL const node for those other relations.
+     */
+    if (result == NULL)
+        return (Node *) makeNullConst(var->vartype, var->vartypmod,
+                                      var->varcollid);
+
+    return result;
+}
+
+/*
+ * is_result_relation
+ *         Is passed-in relation a result relation of this query?
+ *
+ * While root->parse->resultRelation gives the query's original target
+ * relation, other target relations resulting from adding inheritance child
+ * relations of the main target relation are tracked elsewhere.  This
+ * function will return true for the RT index of any target relation.
+ */
+bool
+is_result_relation(PlannerInfo *root, Index relid)
+{
+    InheritResultRelInfo *resultInfo;
+
+    if (relid == root->parse->resultRelation)
+        return true;
+
+    /*
+     * There can be only one result relation in a given subquery before
+     * inheritance child relations are added.
+     */
+    if (root->inherit_result_rel_array == NULL)
+        return false;
+
+    resultInfo = root->inherit_result_rel_array[relid];
+    if (resultInfo == NULL)
+        return false;
+
+    return (relid == resultInfo->resultRelation);
+}
+
+/*
+ * get_result_update_info
+ *        Returns update targetlist and column numbers of a given result relation
+ *
+ * Note: Don't call for a relation that is not certainly a result relation!
+ */
+void
+get_result_update_info(PlannerInfo *root, Index result_relation,
+                       List **processed_tlist,
+                       List **update_colnos)
+{
+    Assert(is_result_relation(root, result_relation));
+
+    if (result_relation == root->parse->resultRelation)
+    {
+        *processed_tlist = root->processed_tlist;
+        *update_colnos = root->update_colnos;
+    }
+    else
+    {
+        InheritResultRelInfo *resultInfo;
+
+        Assert(root->inherit_result_rel_array != NULL);
+        resultInfo = root->inherit_result_rel_array[result_relation];
+        Assert(resultInfo != NULL);
+        *processed_tlist = resultInfo->processed_tlist;
+        *update_colnos = resultInfo->update_colnos;
+    }
+}
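The slot arithmetic behind appinfo->translated_fake_vars, modeled on translate_fake_parent_var() above, can be exercised outside the planner. This is a standalone sketch, not planner code: the FakeVarMap type and the fake_map_* functions are hypothetical, strings stand in for child expression nodes, and NULL_CONST stands in for the Const that makeNullConst() would build. A fake parent attno maps to slot attno - max_parent_attno - 1, and slots for lower fake attnos that a child has not mapped are padded with NULL and may be filled in later.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the NULL Const that makeNullConst() would build. */
static const char NULL_CONST[] = "NULL::const";

/* Hypothetical model of one child's appinfo->translated_fake_vars. */
typedef struct FakeVarMap
{
    const char **exprs;         /* slot i: child expr for fake attno
                                 * max_parent_attno + 1 + i, or NULL */
    int         len;
    int         cap;
} FakeVarMap;

static void
fake_map_append(FakeVarMap *map, const char *expr)
{
    if (map->len == map->cap)
    {
        int         newcap = map->cap > 0 ? map->cap * 2 : 4;
        const char **grown = realloc(map->exprs, newcap * sizeof(*grown));

        assert(grown != NULL);
        map->exprs = grown;
        map->cap = newcap;
    }
    map->exprs[map->len++] = expr;
}

/*
 * Map a fake parent attno to child_expr.  Returns -1 if this child already
 * maps that attno, else 0.
 */
static int
fake_map_set(FakeVarMap *map, int max_parent_attno, int parent_attno,
             const char *child_expr)
{
    int         offset = parent_attno - max_parent_attno - 1;

    assert(offset >= 0);
    /* Pad with NULL for lower fake attnos this child hasn't mapped yet. */
    while (map->len < offset)
        fake_map_append(map, NULL);

    if (map->len == offset)
        fake_map_append(map, child_expr);
    else if (map->exprs[offset] != NULL)
        return -1;
    else
        map->exprs[offset] = child_expr;    /* fill a padded slot */
    return 0;
}

/* Translate a fake parent attno; unmapped attnos yield the NULL constant. */
static const char *
fake_map_get(const FakeVarMap *map, int max_parent_attno, int parent_attno)
{
    int         offset = parent_attno - max_parent_attno - 1;

    if (offset >= 0 && offset < map->len && map->exprs[offset] != NULL)
        return map->exprs[offset];
    return NULL_CONST;
}
```

Mapping fake attno 5 before fake attno 4 for the same child (with three real parent columns) exercises the padding path: slot 0 is first padded with NULL, reads back as the NULL constant, and is filled in by the later call.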
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a97929c13f..6b17cf098c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3539,6 +3539,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  *      Creates a pathnode that represents performing INSERT/UPDATE/DELETE mods
  *
  * 'rel' is the parent relation associated with the result
+ * 'subpath' is a Path producing source data
  * 'operation' is the operation type
  * 'canSetTag' is true if we set the command tag/es_processed
  * 'nominalRelation' is the parent RT index for use of EXPLAIN
@@ -3546,8 +3547,6 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  * 'partColsUpdated' is true if any partitioning columns are being updated,
  *        either from the target relation or a descendent partitioned table.
  * 'resultRelations' is an integer list of actual RT indexes of target rel(s)
- * 'subpaths' is a list of Path(s) producing source data (one per rel)
- * 'subroots' is a list of PlannerInfo structs (one per rel)
  * 'updateColnosLists' is a list of UPDATE target column number lists
  *        (one sublist per rel); or NIL if not an UPDATE
  * 'withCheckOptionLists' is a list of WCO lists (one per rel)
@@ -3558,22 +3557,18 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  */
 ModifyTablePath *
 create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
+                        Path *subpath,
                         CmdType operation, bool canSetTag,
                         Index nominalRelation, Index rootRelation,
                         bool partColsUpdated,
-                        List *resultRelations, List *subpaths,
-                        List *subroots,
+                        List *resultRelations,
                         List *updateColnosLists,
                         List *withCheckOptionLists, List *returningLists,
                         List *rowMarks, OnConflictExpr *onconflict,
                         int epqParam)
 {
     ModifyTablePath *pathnode = makeNode(ModifyTablePath);
-    double        total_size;
-    ListCell   *lc;

-    Assert(list_length(resultRelations) == list_length(subpaths));
-    Assert(list_length(resultRelations) == list_length(subroots));
     Assert(operation == CMD_UPDATE ?
            list_length(resultRelations) == list_length(updateColnosLists) :
            updateColnosLists == NIL);
@@ -3594,7 +3589,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
     pathnode->path.pathkeys = NIL;

     /*
-     * Compute cost & rowcount as sum of subpath costs & rowcounts.
+     * Compute cost & rowcount as subpath cost & rowcount (if RETURNING)
      *
      * Currently, we don't charge anything extra for the actual table
      * modification work, nor for the WITH CHECK OPTIONS or RETURNING
@@ -3603,42 +3598,32 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
      * costs to change any higher-level planning choices.  But we might want
      * to make it look better sometime.
      */
-    pathnode->path.startup_cost = 0;
-    pathnode->path.total_cost = 0;
-    pathnode->path.rows = 0;
-    total_size = 0;
-    foreach(lc, subpaths)
+    pathnode->path.startup_cost = subpath->startup_cost;
+    pathnode->path.total_cost = subpath->total_cost;
+    if (returningLists != NIL)
     {
-        Path       *subpath = (Path *) lfirst(lc);
-
-        if (lc == list_head(subpaths))    /* first node? */
-            pathnode->path.startup_cost = subpath->startup_cost;
-        pathnode->path.total_cost += subpath->total_cost;
-        if (returningLists != NIL)
-        {
-            pathnode->path.rows += subpath->rows;
-            total_size += subpath->pathtarget->width * subpath->rows;
-        }
+        pathnode->path.rows = subpath->rows;
+        /*
+         * Set width to match the subpath output.  XXX this is totally wrong:
+         * we should return an average of the RETURNING tlist widths.  But
+         * it's what happened historically, and improving it is a task for
+         * another day.  (Again, it's mostly window dressing.)
+         */
+        pathnode->path.pathtarget->width = subpath->pathtarget->width;
+    }
+    else
+    {
+        pathnode->path.rows = 0;
+        pathnode->path.pathtarget->width = 0;
     }

-    /*
-     * Set width to the average width of the subpath outputs.  XXX this is
-     * totally wrong: we should return an average of the RETURNING tlist
-     * widths.  But it's what happened historically, and improving it is a task
-     * for another day.
-     */
-    if (pathnode->path.rows > 0)
-        total_size /= pathnode->path.rows;
-    pathnode->path.pathtarget->width = rint(total_size);
-
+    pathnode->subpath = subpath;
     pathnode->operation = operation;
     pathnode->canSetTag = canSetTag;
     pathnode->nominalRelation = nominalRelation;
     pathnode->rootRelation = rootRelation;
     pathnode->partColsUpdated = partColsUpdated;
     pathnode->resultRelations = resultRelations;
-    pathnode->subpaths = subpaths;
-    pathnode->subroots = subroots;
     pathnode->updateColnosLists = updateColnosLists;
     pathnode->withCheckOptionLists = withCheckOptionLists;
     pathnode->returningLists = returningLists;
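Because ModifyTablePath now carries a single subpath, ModifyTable has to decide per row which result relation to operate on. Going by the add_child_junk_attrs() comment, each leaf contributes a "resultrelindex" junk column holding a Const equal to its index in the list of result relations, so the routing step reduces to an array lookup. The sketch below models only that step under those assumptions; SourceRow, TargetRel and route_rows() are hypothetical and do not correspond to executor structs, and an integer row id stands in for the ctid junk column.

```c
#include <assert.h>

/* Hypothetical model of one row emitted by the single ModifyTable subplan. */
typedef struct SourceRow
{
    int         resultrelindex; /* junk Const: position in resultRelations */
    long        rowid;          /* stands in for the "ctid" junk column */
} SourceRow;

/* Hypothetical per-result-relation state, one per resultRelations entry. */
typedef struct TargetRel
{
    unsigned    rti;            /* RT index of the leaf result relation */
    int         nprocessed;     /* rows routed here so far */
    long        last_rowid;     /* row id of the last row routed here */
} TargetRel;

/*
 * Route each source row to the result relation chosen by its resultrelindex.
 * Returns the number of rows routed, or -1 at the first row whose index is
 * out of range (which would indicate a planner bug).
 */
static int
route_rows(const SourceRow *rows, int nrows, TargetRel *rels, int nrels)
{
    for (int i = 0; i < nrows; i++)
    {
        int         idx = rows[i].resultrelindex;

        if (idx < 0 || idx >= nrels)
            return -1;
        rels[idx].nprocessed++;
        rels[idx].last_rowid = rows[i].rowid;
    }
    return nrows;
}
```

Indexing by position rather than looking relations up by OID keeps the per-row cost constant, which fits the goal of adding only slightly to ModifyTable's per-row overhead.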
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6c39bf893f..d0fb3b6834 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1453,18 +1453,11 @@ relation_excluded_by_constraints(PlannerInfo *root,

             /*
              * When constraint_exclusion is set to 'partition' we only handle
-             * appendrel members.  Normally, they are RELOPT_OTHER_MEMBER_REL
-             * relations, but we also consider inherited target relations as
-             * appendrel members for the purposes of constraint exclusion
-             * (since, indeed, they were appendrel members earlier in
-             * inheritance_planner).
-             *
-             * In both cases, partition pruning was already applied, so there
-             * is no need to consider the rel's partition constraints here.
+             * appendrel members.  Partition pruning has already been applied,
+             * so there is no need to consider the rel's partition constraints
+             * here.
              */
-            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
-                (rel->relid == root->parse->resultRelation &&
-                 root->inhTargetKind != INHKIND_NONE))
+            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
                 break;            /* appendrel member, so process it */
             return false;

@@ -1477,9 +1470,7 @@ relation_excluded_by_constraints(PlannerInfo *root,
              * its partition constraints haven't been considered yet, so
              * include them in the processing here.
              */
-            if (rel->reloptkind == RELOPT_BASEREL &&
-                !(rel->relid == root->parse->resultRelation &&
-                  root->inhTargetKind != INHKIND_NONE))
+            if (rel->reloptkind == RELOPT_BASEREL)
                 include_partition = true;
             break;                /* always try to exclude */
     }
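With inheritance_planner() gone, inherited target relations are no longer special-cased here, so the decision in relation_excluded_by_constraints() depends on reloptkind alone. A minimal model of the switch after this patch follows. The enums are hypothetical, simplified stand-ins for the constraint_exclusion setting and RelOptKind, and the function only reports whether exclusion is attempted, not whether it succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the GUC values and RelOptKind. */
typedef enum { CE_OFF, CE_ON, CE_PARTITION } CESetting;
typedef enum { REL_BASEREL, REL_OTHER_MEMBER_REL } RelKind;

/*
 * Returns true if constraint exclusion should be attempted.  Sets
 * *include_partition when the rel's own partition constraint must also be
 * tested (the real code additionally checks that the rel is a partition).
 */
static bool
try_exclusion(CESetting setting, RelKind kind, bool *include_partition)
{
    *include_partition = false;
    switch (setting)
    {
        case CE_OFF:
            return false;
        case CE_PARTITION:
            /* appendrel members only; pruning already applied their bounds */
            return kind == REL_OTHER_MEMBER_REL;
        case CE_ON:
            /* a plain baserel may be a directly named partition */
            if (kind == REL_BASEREL)
                *include_partition = true;
            return true;
    }
    return false;
}
```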
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 345c877aeb..9d5a4d2adb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -108,6 +108,9 @@ setup_simple_rel_arrays(PlannerInfo *root)
         root->simple_rte_array[rti++] = rte;
     }

+    /* inherit_result_rel_array is not made here */
+    root->inherit_result_rel_array = NULL;
+
     /* append_rel_array is not needed if there are no AppendRelInfos */
     if (root->append_rel_list == NIL)
     {
@@ -183,6 +186,15 @@ expand_planner_arrays(PlannerInfo *root, int add_size)
             palloc0(sizeof(AppendRelInfo *) * new_size);
     }

+    if (root->inherit_result_rel_array)
+    {
+        root->inherit_result_rel_array = (InheritResultRelInfo **)
+            repalloc(root->inherit_result_rel_array,
+                     sizeof(InheritResultRelInfo *) * new_size);
+        MemSet(root->inherit_result_rel_array + root->simple_rel_array_size,
+               0, sizeof(InheritResultRelInfo *) * add_size);
+    }
+
     root->simple_rel_array_size = new_size;
 }

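expand_planner_arrays() grows inherit_result_rel_array only if it already exists, and it zeroes just the newly added tail, starting at the old simple_rel_array_size. That is why the size field is updated last. The same grow-and-zero pattern is shown below in plain C, with realloc() and memset() in place of repalloc() and MemSet(). grow_ptr_array() is a hypothetical helper, and like palloc0() and MemSet() usage in PostgreSQL, it assumes that all-zero bits represent a null pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: grow a lazily created pointer array from old_size to
 * old_size + add_size slots, zeroing only the new tail.  A NULL array is
 * left NULL, as inherit_result_rel_array is until something builds it.
 */
static void **
grow_ptr_array(void **arr, size_t old_size, size_t add_size)
{
    void      **grown;

    if (arr == NULL)
        return NULL;
    grown = realloc(arr, sizeof(void *) * (old_size + add_size));
    assert(grown != NULL);
    /* Zero from the OLD size; the caller bumps its size field afterwards. */
    memset(grown + old_size, 0, sizeof(void *) * add_size);
    return grown;
}
```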
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index f9175987f8..bb62708f49 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1661,8 +1661,9 @@ rewriteTargetListUD(Query *parsetree, RangeTblEntry *target_rte,
         /*
          * For UPDATE, we need to make the FDW fetch unchanged columns by
          * asking it to fetch a whole-row Var.  That's because the top-level
-         * targetlist only contains entries for changed columns.  (Actually,
-         * we only really need this for UPDATEs that are not pushed to the
+         * targetlist only contains entries for changed columns, but
+         * ExecUpdate will need to build the complete new tuple.  (Actually,
+         * we only really need this in UPDATEs that are not pushed to the
          * remote side, but it's hard to tell if that will be the case at the
          * point when this function is called.)
          *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index f0de2a25c9..03c22c80c3 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4572,16 +4572,12 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan)
      * We special-case Append and MergeAppend to pretend that the first child
      * plan is the OUTER referent; we have to interpret OUTER Vars in their
      * tlists according to one of the children, and the first one is the most
-     * natural choice.  Likewise special-case ModifyTable to pretend that the
-     * first child plan is the OUTER referent; this is to support RETURNING
-     * lists containing references to non-target relations.
+     * natural choice.
      */
     if (IsA(plan, Append))
         dpns->outer_plan = linitial(((Append *) plan)->appendplans);
     else if (IsA(plan, MergeAppend))
         dpns->outer_plan = linitial(((MergeAppend *) plan)->mergeplans);
-    else if (IsA(plan, ModifyTable))
-        dpns->outer_plan = linitial(((ModifyTable *) plan)->plans);
     else
         dpns->outer_plan = outerPlan(plan);

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7af6d48525..2b79a87fe4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -666,10 +666,7 @@ typedef struct ExecRowMark
  * Each LockRows and ModifyTable node keeps a list of the rowmarks it needs to
  * deal with.  In addition to a pointer to the related entry in es_rowmarks,
  * this struct carries the column number(s) of the resjunk columns associated
- * with the rowmark (see comments for PlanRowMark for more detail).  In the
- * case of ModifyTable, there has to be a separate ExecAuxRowMark list for
- * each child plan, because the resjunk columns could be at different physical
- * column positions in different subplans.
+ * with the rowmark (see comments for PlanRowMark for more detail).
  */
 typedef struct ExecAuxRowMark
 {
@@ -1071,9 +1068,8 @@ typedef struct PlanState
  * EvalPlanQualSlot), and/or found using the rowmark mechanism (non-locking
  * rowmarks by the EPQ machinery itself, locking ones by the caller).
  *
- * While the plan to be checked may be changed using EvalPlanQualSetPlan() -
- * e.g. so all source plans for a ModifyTable node can be processed - all such
- * plans need to share the same EState.
+ * While the plan to be checked may be changed using EvalPlanQualSetPlan(),
+ * all such plans need to share the same EState.
  */
 typedef struct EPQState
 {
@@ -1167,23 +1163,26 @@ typedef struct ModifyTableState
     CmdType        operation;        /* INSERT, UPDATE, or DELETE */
     bool        canSetTag;        /* do we set the command tag/es_processed? */
     bool        mt_done;        /* are we done? */
-    PlanState **mt_plans;        /* subplans (one per target rel) */
-    int            mt_nplans;        /* number of plans in the array */
-    int            mt_whichplan;    /* which one is being executed (0..n-1) */
-    TupleTableSlot **mt_scans;    /* input tuple corresponding to underlying
-                                 * plans */
-    ResultRelInfo *resultRelInfo;    /* per-subplan target relations */
+    int            mt_nrels;        /* number of entries in resultRelInfo[] */
+    ResultRelInfo *resultRelInfo;    /* info about target relation(s) */

     /*
      * Target relation mentioned in the original statement, used to fire
-     * statement-level triggers and as the root for tuple routing.
+     * statement-level triggers and as the root for tuple routing.  (This
+     * might point to one of the resultRelInfo[] entries, but it can also be a
+     * distinct struct.)
      */
     ResultRelInfo *rootResultRelInfo;

-    List      **mt_arowmarks;    /* per-subplan ExecAuxRowMark lists */
     EPQState    mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
     bool        fireBSTriggers; /* do we need to fire stmt triggers? */

+    /*
+     * For inherited UPDATE and DELETE, resno of the "resultrelindex" junk
+     * attribute in the subplan's targetlist; zero in other cases.
+     */
+    int            mt_resultIndexAttno;
+
     /*
      * Slot for storing tuples in the root partitioned table's rowtype during
      * an UPDATE of a partitioned table.
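With only one subplan, ModifyTable can no longer tell which target a row belongs to from which subplan produced it. According to the new mt_resultIndexAttno comment, inherited UPDATE/DELETE instead carry a "resultrelindex" junk column.

The sketch below shows one way such per-row dispatch could look. It is an illustration under stated assumptions, not the patch's executor code:

- Rows are flat arrays of longs.
- RelInfo, ModifyState and choose_result_rel() are hypothetical names.
- The junk value is treated as a 0-based index into resultRelInfo[]. The hunk does not confirm this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the executor structures. */
typedef struct RelInfo
{
    const char *relname;
} RelInfo;

typedef struct ModifyState
{
    int         nrels;              /* like mt_nrels */
    RelInfo    *rels;               /* like resultRelInfo[] */
    int         resultIndexAttno;   /* like mt_resultIndexAttno; 0 if unused */
} ModifyState;

/*
 * Pick the target relation for one subplan output row.  row[attno - 1]
 * holds attribute 'attno', since resnos are 1-based.  ASSUMPTION: the junk
 * column carries a 0-based index into rels[].  When resultIndexAttno is
 * zero there is a single target relation.
 */
static RelInfo *
choose_result_rel(const ModifyState *mt, const long *row)
{
    long        idx;

    if (mt->resultIndexAttno == 0)
        return &mt->rels[0];

    idx = row[mt->resultIndexAttno - 1];
    assert(idx >= 0 && idx < mt->nrels);
    return &mt->rels[idx];
}
```

The per-row cost is a single array lookup, which fits the cover letter's expectation that ModifyTable's per-row overhead grows only slightly.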
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e22df890ef..fadd413867 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -270,6 +270,7 @@ typedef enum NodeTag
     T_PlaceHolderVar,
     T_SpecialJoinInfo,
     T_AppendRelInfo,
+    T_InheritResultRelInfo,
     T_PlaceHolderInfo,
     T_MinMaxAggInfo,
     T_PlannerParamItem,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index bed9f4da09..da46151e46 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -77,18 +77,6 @@ typedef enum UpperRelationKind
     /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;

-/*
- * This enum identifies which type of relation is being planned through the
- * inheritance planner.  INHKIND_NONE indicates the inheritance planner
- * was not used.
- */
-typedef enum InheritanceKind
-{
-    INHKIND_NONE,
-    INHKIND_INHERITED,
-    INHKIND_PARTITIONED
-} InheritanceKind;
-
 /*----------
  * PlannerGlobal
  *        Global information for planning/optimization
@@ -212,6 +200,14 @@ struct PlannerInfo
      */
     struct AppendRelInfo **append_rel_array;

+    /*
+     * Same length as other "simple" rel arrays and holds pointers to
+     * InheritResultRelInfo for this subquery's result relations indexed by RT
+     * index, or NULL if the rel is not a result relation.  This array is not
+     * allocated unless the query is an inherited UPDATE/DELETE.
+     */
+    struct InheritResultRelInfo **inherit_result_rel_array;
+
     /*
      * all_baserels is a Relids set of all base relids (but not "other"
      * relids) in the query; that is, the Relids identifier of the final join
@@ -283,6 +279,8 @@ struct PlannerInfo
      */
     List       *append_rel_list;    /* list of AppendRelInfos */

+    List       *inherit_result_rels;    /* List of InheritResultRelInfo */
+
     List       *rowMarks;        /* list of PlanRowMarks */

     List       *placeholder_list;    /* list of PlaceHolderInfos */
@@ -326,6 +324,9 @@ struct PlannerInfo
      */
     List       *update_colnos;

+    /* Scratch space for inherit.c: add_inherit_junk_var() */
+    List       *inherit_junk_tlist;    /* List of TargetEntry */
+
     /* Fields filled during create_plan() for use in setrefs.c */
     AttrNumber *grouping_map;    /* for GroupingFunc fixup */
     List       *minmax_aggs;    /* List of MinMaxAggInfos */
@@ -341,9 +342,6 @@ struct PlannerInfo
     Index        qual_security_level;    /* minimum security_level for quals */
     /* Note: qual_security_level is zero if there are no securityQuals */

-    InheritanceKind inhTargetKind;    /* indicates if the target relation is an
-                                     * inheritance child or partition or a
-                                     * partitioned table */
     bool        hasJoinRTEs;    /* true if any RTEs are RTE_JOIN kind */
     bool        hasLateralRTEs; /* true if any RTEs are marked LATERAL */
     bool        hasHavingQual;    /* true if havingQual was non-null */
@@ -374,6 +372,9 @@ struct PlannerInfo

     /* Does this query modify any partition key columns? */
     bool        partColsUpdated;
+
+    /* Highest result relation index assigned in this subquery */
+    int            lastResultRelIndex;
 };


@@ -1833,20 +1834,19 @@ typedef struct LockRowsPath
  * ModifyTablePath represents performing INSERT/UPDATE/DELETE modifications
  *
  * We represent most things that will be in the ModifyTable plan node
- * literally, except we have child Path(s) not Plan(s).  But analysis of the
+ * literally, except we have a child Path not Plan.  But analysis of the
  * OnConflictExpr is deferred to createplan.c, as is collection of FDW data.
  */
 typedef struct ModifyTablePath
 {
     Path        path;
+    Path       *subpath;        /* Path producing source data */
     CmdType        operation;        /* INSERT, UPDATE, or DELETE */
     bool        canSetTag;        /* do we set the command tag/es_processed? */
     Index        nominalRelation;    /* Parent RT index for use of EXPLAIN */
     Index        rootRelation;    /* Root RT index, if target is partitioned */
-    bool        partColsUpdated;    /* some part key in hierarchy updated */
+    bool        partColsUpdated;    /* some part key in hierarchy updated? */
     List       *resultRelations;    /* integer list of RT indexes */
-    List       *subpaths;        /* Path(s) producing source data */
-    List       *subroots;        /* per-target-table PlannerInfos */
     List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
@@ -2286,6 +2286,13 @@ typedef struct AppendRelInfo
      */
     List       *translated_vars;    /* Expressions in the child's Vars */

+    /*
+     * The following contains expressions that the child relation is expected
+     * to output for each "fake" parent Var that add_inherit_junk_var() adds
+     * to the parent's reltarget; also see translate_fake_parent_var().
+     */
+    List       *translated_fake_vars;
+
     /*
      * This array simplifies translations in the reverse direction, from
      * child's column numbers to parent's.  The entry at [ccolno - 1] is the
@@ -2303,6 +2310,40 @@ typedef struct AppendRelInfo
     Oid            parent_reloid;    /* OID of parent relation */
 } AppendRelInfo;

+/*
+ * InheritResultRelInfo
+ *        Information about result relations of an inherited UPDATE/DELETE
+ *        operation
+ *
+ * If the main target relation is an inheritance parent, we build an
+ * InheritResultRelInfo for it and for every child result relation resulting
+ * from expanding it.  This is to store the information relevant to each
+ * result relation that must be added to the ModifyTable, such as its update
+ * targetlist, WITH CHECK OPTIONS, and RETURNING expression lists.  For the
+ * main result relation (root inheritance parent), that information is the
+ * same as what's in Query and PlannerInfo.  For child result relations, we
+ * make copies of those expressions with appropriate translation of any Vars.
+ * Also, update_colnos for a given child relation has been adjusted to show
+ * that relation's attribute numbers.
+ *
+ * While it's okay for code outside the core planner to look at
+ * update_colnos, processed_tlist is only kept around for internal planner use.
+ * For example, an FDW's PlanDirectModify() may look at update_colnos to check
+ * if the assigned expressions are pushable.
+ */
+typedef struct InheritResultRelInfo
+{
+    NodeTag        type;
+
+    Index        resultRelation;
+    List       *withCheckOptions;
+    List       *returningList;
+
+    /* Only valid for UPDATE. */
+    List       *processed_tlist;
+    List       *update_colnos;
+} InheritResultRelInfo;
+
 /*
  * For each distinct placeholder expression generated during planning, we
  * store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
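The InheritResultRelInfo comment says that each child's update_colnos is adjusted to that child's attribute numbers. These can differ from the parent's, for example when a table with a different column order or with dropped columns is attached as a child.

Below is a hedged sketch of that translation. It assumes the parent-to-child attribute mapping has been flattened into an int array. translate_update_colnos() and that array are hypothetical; the patch itself works from AppendRelInfo's translated Vars.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Translate a parent's update_colnos (1-based attribute numbers) into the
 * child's attribute numbers.  parent_to_child[p - 1] is the child attno for
 * parent attno p (a hypothetical flattened mapping).  Returns a malloc'd
 * array of ncols entries, parallel to update_colnos, or NULL on failure.
 */
static int *
translate_update_colnos(const int *update_colnos, int ncols,
                        const int *parent_to_child, int parent_natts)
{
    int        *child_colnos;

    assert(ncols > 0);          /* an UPDATE assigns at least one column */
    child_colnos = malloc(sizeof(int) * (size_t) ncols);
    if (child_colnos == NULL)
        return NULL;

    for (int i = 0; i < ncols; i++)
    {
        int         parent_attno = update_colnos[i];

        assert(parent_attno >= 1 && parent_attno <= parent_natts);
        child_colnos[i] = parent_to_child[parent_attno - 1];
        /* every column assigned in SET must exist in each child */
        assert(child_colnos[i] > 0);
    }
    return child_colnos;
}
```

The output keeps the order of the input, so it still lines up entry for entry with the (shared) update targetlist.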
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7d74bd92b8..f371390f7f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -196,7 +196,7 @@ typedef struct ProjectSet

 /* ----------------
  *     ModifyTable node -
- *        Apply rows produced by subplan(s) to result table(s),
+ *        Apply rows produced by outer plan to result table(s),
  *        by inserting, updating, or deleting.
  *
  * If the originally named target table is a partitioned table, both
@@ -206,7 +206,7 @@ typedef struct ProjectSet
  * EXPLAIN should claim is the INSERT/UPDATE/DELETE target.
  *
  * Note that rowMarks and epqParam are presumed to be valid for all the
- * subplan(s); they can't contain any info that varies across subplans.
+ * table(s); they can't contain any info that varies across tables.
  * ----------------
  */
 typedef struct ModifyTable
@@ -216,9 +216,8 @@ typedef struct ModifyTable
     bool        canSetTag;        /* do we set the command tag/es_processed? */
     Index        nominalRelation;    /* Parent RT index for use of EXPLAIN */
     Index        rootRelation;    /* Root RT index, if target is partitioned */
-    bool        partColsUpdated;    /* some part key in hierarchy updated */
+    bool        partColsUpdated;    /* some part key in hierarchy updated? */
     List       *resultRelations;    /* integer list of RT indexes */
-    List       *plans;            /* plan(s) producing source data */
     List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index d4ce037088..193fbf0e0e 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1386,10 +1386,14 @@ typedef struct InferenceElem
  * column for the item; so there may be missing or out-of-order resnos.
  * It is even legal to have duplicated resnos; consider
  *        UPDATE table SET arraycol[1] = ..., arraycol[2] = ..., ...
- * The two meanings come together in the executor, because the planner
- * transforms INSERT/UPDATE tlists into a normalized form with exactly
- * one entry for each column of the destination table.  Before that's
- * happened, however, it is risky to assume that resno == position.
+ * In an INSERT, the rewriter and planner will normalize the tlist by
+ * reordering it into physical column order and filling in default values
+ * for any columns not assigned values by the original query.  In an UPDATE,
+ * no such thing ever happens; the tlist elements are eventually renumbered
+ * to match their ordinal positions, but this has nothing to do with which
+ * table column will be updated.  (Look to the update column numbers list,
+ * which parallels the finished tlist, to find that out.)
+ *
  * Generally get_tle_by_resno() should be used rather than list_nth()
  * to fetch tlist entries by resno, and only in SELECT should you assume
  * that resno is a unique identifier.
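The new primnodes.h text says that in an UPDATE the finished tlist does not determine which column gets updated. That is determined by the parallel update column numbers list. The cover letter adds that unassigned columns are filled in from the old tuple.

Putting those together, building the new row can be sketched as below. Rows are plain int arrays and build_updated_row() is a hypothetical name. Real executor code works with TupleTableSlots and Datums and also tracks nulls.

```c
#include <assert.h>
#include <string.h>

/*
 * Build a complete new row for UPDATE.  new_vals[i] is the value computed
 * by the i-th non-junk tlist entry and update_colnos[i] is the 1-based
 * table column it is assigned to (the two arrays are parallel).  Columns
 * not assigned in SET keep their values from old_row.
 */
static void
build_updated_row(int *new_row, const int *old_row, int natts,
                  const int *new_vals, const int *update_colnos, int nvals)
{
    /* Start from the old tuple so unassigned columns are preserved. */
    memcpy(new_row, old_row, sizeof(int) * (size_t) natts);

    for (int i = 0; i < nvals; i++)
    {
        int         attno = update_colnos[i];

        assert(attno >= 1 && attno <= natts);
        new_row[attno - 1] = new_vals[i];
    }
}
```

Because column positions come from update_colnos rather than from tlist resnos, the same tlist can serve every child once its update_colnos have been translated.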
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index 4cbf8c26cc..a52333a364 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -22,6 +22,8 @@ extern AppendRelInfo *make_append_rel_info(Relation parentrel,
                                            Index parentRTindex, Index childRTindex);
 extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                                     int nappinfos, AppendRelInfo **appinfos);
+extern Node *adjust_target_appendrel_attrs(PlannerInfo *root, Node *node,
+                                    AppendRelInfo *appinfo);
 extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
                                                Relids child_relids,
                                                Relids top_parent_relids);
@@ -31,5 +33,6 @@ extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
                                              Relids child_relids, Relids top_parent_relids);
 extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
                                                Relids relids, int *nappinfos);
+extern Node *translate_fake_parent_var(Var *var, AppendRelInfo *appinfo);

 #endif                            /* APPENDINFO_H */
diff --git a/src/include/optimizer/inherit.h b/src/include/optimizer/inherit.h
index e9472f2f73..1c27d9123d 100644
--- a/src/include/optimizer/inherit.h
+++ b/src/include/optimizer/inherit.h
@@ -24,4 +24,10 @@ extern bool apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
                                   RelOptInfo *childrel, RangeTblEntry *childRTE,
                                   AppendRelInfo *appinfo);

+extern bool is_result_relation(PlannerInfo *root, Index relid);
+
+extern void get_result_update_info(PlannerInfo *root, Index result_relation,
+                                   List **processed_tlist,
+                                   List **update_colnos);
+
 #endif                            /* INHERIT_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9673a4a638..d539bc2783 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -260,11 +260,11 @@ extern LockRowsPath *create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
                                           Path *subpath, List *rowMarks, int epqParam);
 extern ModifyTablePath *create_modifytable_path(PlannerInfo *root,
                                                 RelOptInfo *rel,
+                                                Path *subpath,
                                                 CmdType operation, bool canSetTag,
                                                 Index nominalRelation, Index rootRelation,
                                                 bool partColsUpdated,
-                                                List *resultRelations, List *subpaths,
-                                                List *subroots,
+                                                List *resultRelations,
                                                 List *updateColnosLists,
                                                 List *withCheckOptionLists, List *returningLists,
                                                 List *rowMarks, OnConflictExpr *onconflict,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 94e43c3410..d4b112c5e4 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -545,27 +545,25 @@ create table some_tab_child () inherits (some_tab);
 insert into some_tab_child values(1,2);
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false;
-           QUERY PLAN
---------------------------------
+                   QUERY PLAN
+-------------------------------------------------
  Update on public.some_tab
-   Update on public.some_tab
    ->  Result
-         Output: (a + 1), ctid
+         Output: (some_tab.a + 1), some_tab.ctid
          One-Time Filter: false
-(5 rows)
+(4 rows)

 update some_tab set a = a + 1 where false;
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false returning b, a;
-           QUERY PLAN
---------------------------------
+                   QUERY PLAN
+-------------------------------------------------
  Update on public.some_tab
-   Output: b, a
-   Update on public.some_tab
+   Output: some_tab.b, some_tab.a
    ->  Result
-         Output: (a + 1), ctid
+         Output: (some_tab.a + 1), some_tab.ctid
          One-Time Filter: false
-(6 rows)
+(5 rows)

 update some_tab set a = a + 1 where false returning b, a;
  b | a
@@ -670,7 +668,7 @@ explain update parted_tab set a = 2 where false;
                        QUERY PLAN
 --------------------------------------------------------
  Update on parted_tab  (cost=0.00..0.00 rows=0 width=0)
-   ->  Result  (cost=0.00..0.00 rows=0 width=0)
+   ->  Result  (cost=0.00..0.00 rows=0 width=10)
          One-Time Filter: false
 (3 rows)

diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index ff157ceb1c..73c0f3e04b 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -212,7 +212,7 @@ explain (costs off, format json) insert into insertconflicttest values (0, 'Bilb
        "Plans": [                                                      +
          {                                                             +
            "Node Type": "Result",                                      +
-           "Parent Relationship": "Member",                            +
+           "Parent Relationship": "Outer",                             +
            "Parallel Aware": false                                     +
          }                                                             +
        ]                                                               +
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 0057f41caa..27f7525b3e 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1926,37 +1926,27 @@ WHERE EXISTS (
     FROM int4_tbl,
          LATERAL (SELECT int4_tbl.f1 FROM int8_tbl LIMIT 2) ss
     WHERE prt1_l.c IS NULL);
-                          QUERY PLAN
----------------------------------------------------------------
+                        QUERY PLAN
+----------------------------------------------------------
  Delete on prt1_l
    Delete on prt1_l_p1 prt1_l_1
    Delete on prt1_l_p3_p1 prt1_l_2
    Delete on prt1_l_p3_p2 prt1_l_3
    ->  Nested Loop Semi Join
-         ->  Seq Scan on prt1_l_p1 prt1_l_1
-               Filter: (c IS NULL)
-         ->  Nested Loop
-               ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss
-                     ->  Limit
-                           ->  Seq Scan on int8_tbl
-   ->  Nested Loop Semi Join
-         ->  Seq Scan on prt1_l_p3_p1 prt1_l_2
-               Filter: (c IS NULL)
-         ->  Nested Loop
-               ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss_1
-                     ->  Limit
-                           ->  Seq Scan on int8_tbl int8_tbl_1
-   ->  Nested Loop Semi Join
-         ->  Seq Scan on prt1_l_p3_p2 prt1_l_3
-               Filter: (c IS NULL)
-         ->  Nested Loop
-               ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss_2
-                     ->  Limit
-                           ->  Seq Scan on int8_tbl int8_tbl_2
-(28 rows)
+         ->  Append
+               ->  Seq Scan on prt1_l_p1 prt1_l_1
+                     Filter: (c IS NULL)
+               ->  Seq Scan on prt1_l_p3_p1 prt1_l_2
+                     Filter: (c IS NULL)
+               ->  Seq Scan on prt1_l_p3_p2 prt1_l_3
+                     Filter: (c IS NULL)
+         ->  Materialize
+               ->  Nested Loop
+                     ->  Seq Scan on int4_tbl
+                     ->  Subquery Scan on ss
+                           ->  Limit
+                                 ->  Seq Scan on int8_tbl
+(18 rows)

 --
 -- negative testcases
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..c4e827caec 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2463,74 +2463,43 @@ deallocate ab_q6;
 insert into ab values (1,2);
 explain (analyze, costs off, summary off, timing off)
 update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
-                                     QUERY PLAN
--------------------------------------------------------------------------------------
+                                        QUERY PLAN
+-------------------------------------------------------------------------------------------
  Update on ab_a1 (actual rows=0 loops=1)
    Update on ab_a1_b1 ab_a1_1
    Update on ab_a1_b2 ab_a1_2
    Update on ab_a1_b3 ab_a1_3
-   ->  Nested Loop (actual rows=0 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=0 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
    ->  Nested Loop (actual rows=1 loops=1)
          ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
+               ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
                            Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
                            Index Cond: (a = 1)
-   ->  Nested Loop (actual rows=0 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=0 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
                            Index Cond: (a = 1)
-(65 rows)
+         ->  Materialize (actual rows=1 loops=1)
+               ->  Append (actual rows=1 loops=1)
+                     ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
+                           Recheck Cond: (a = 1)
+                           Heap Blocks: exact=1
+                           ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                                 Index Cond: (a = 1)
+(34 rows)

 table ab;
  a | b
@@ -2551,29 +2520,12 @@ update ab_a1 set b = 3 from ab_a2 where ab_a2.b = (select 1);
    Update on ab_a1_b3 ab_a1_3
    InitPlan 1 (returns $0)
      ->  Result (actual rows=1 loops=1)
-   ->  Nested Loop (actual rows=1 loops=1)
-         ->  Seq Scan on ab_a1_b1 ab_a1_1 (actual rows=1 loops=1)
-         ->  Materialize (actual rows=1 loops=1)
-               ->  Append (actual rows=1 loops=1)
-                     ->  Seq Scan on ab_a2_b1 ab_a2_1 (actual rows=1 loops=1)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b2 ab_a2_2 (never executed)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b3 ab_a2_3 (never executed)
-                           Filter: (b = $0)
-   ->  Nested Loop (actual rows=1 loops=1)
-         ->  Seq Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
-         ->  Materialize (actual rows=1 loops=1)
-               ->  Append (actual rows=1 loops=1)
-                     ->  Seq Scan on ab_a2_b1 ab_a2_1 (actual rows=1 loops=1)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b2 ab_a2_2 (never executed)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b3 ab_a2_3 (never executed)
-                           Filter: (b = $0)
-   ->  Nested Loop (actual rows=1 loops=1)
-         ->  Seq Scan on ab_a1_b3 ab_a1_3 (actual rows=1 loops=1)
-         ->  Materialize (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=3 loops=1)
+         ->  Append (actual rows=3 loops=1)
+               ->  Seq Scan on ab_a1_b1 ab_a1_1 (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a1_b3 ab_a1_3 (actual rows=1 loops=1)
+         ->  Materialize (actual rows=1 loops=3)
                ->  Append (actual rows=1 loops=1)
                      ->  Seq Scan on ab_a2_b1 ab_a2_1 (actual rows=1 loops=1)
                            Filter: (b = $0)
@@ -2581,7 +2533,7 @@ update ab_a1 set b = 3 from ab_a2 where ab_a2.b = (select 1);
                            Filter: (b = $0)
                      ->  Seq Scan on ab_a2_b3 ab_a2_3 (never executed)
                            Filter: (b = $0)
-(36 rows)
+(19 rows)

 select tableoid::regclass, * from ab;
  tableoid | a | b
@@ -3420,28 +3372,30 @@ explain (costs off) select * from pp_lp where a = 1;
 (5 rows)

 explain (costs off) update pp_lp set value = 10 where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Update on pp_lp
    Update on pp_lp1 pp_lp_1
    Update on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 explain (costs off) delete from pp_lp where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Delete on pp_lp
    Delete on pp_lp1 pp_lp_1
    Delete on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 set constraint_exclusion = 'off'; -- this should not affect the result.
 explain (costs off) select * from pp_lp where a = 1;
@@ -3455,28 +3409,30 @@ explain (costs off) select * from pp_lp where a = 1;
 (5 rows)

 explain (costs off) update pp_lp set value = 10 where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Update on pp_lp
    Update on pp_lp1 pp_lp_1
    Update on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 explain (costs off) delete from pp_lp where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Delete on pp_lp
    Delete on pp_lp1 pp_lp_1
    Delete on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 drop table pp_lp;
 -- Ensure enable_partition_prune does not affect non-partitioned tables.
@@ -3500,28 +3456,31 @@ explain (costs off) select * from inh_lp where a = 1;
 (5 rows)

 explain (costs off) update inh_lp set value = 10 where a = 1;
-             QUERY PLAN
-------------------------------------
+                   QUERY PLAN
+------------------------------------------------
  Update on inh_lp
-   Update on inh_lp
-   Update on inh_lp1 inh_lp_1
-   ->  Seq Scan on inh_lp
-         Filter: (a = 1)
-   ->  Seq Scan on inh_lp1 inh_lp_1
-         Filter: (a = 1)
-(7 rows)
+   Update on inh_lp inh_lp_1
+   Update on inh_lp1 inh_lp_2
+   ->  Result
+         ->  Append
+               ->  Seq Scan on inh_lp inh_lp_1
+                     Filter: (a = 1)
+               ->  Seq Scan on inh_lp1 inh_lp_2
+                     Filter: (a = 1)
+(9 rows)

 explain (costs off) delete from inh_lp where a = 1;
-             QUERY PLAN
-------------------------------------
+                QUERY PLAN
+------------------------------------------
  Delete on inh_lp
-   Delete on inh_lp
-   Delete on inh_lp1 inh_lp_1
-   ->  Seq Scan on inh_lp
-         Filter: (a = 1)
-   ->  Seq Scan on inh_lp1 inh_lp_1
-         Filter: (a = 1)
-(7 rows)
+   Delete on inh_lp inh_lp_1
+   Delete on inh_lp1 inh_lp_2
+   ->  Append
+         ->  Seq Scan on inh_lp inh_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on inh_lp1 inh_lp_2
+               Filter: (a = 1)
+(8 rows)

 -- Ensure we don't exclude normal relations when we only expect to exclude
 -- inheritance children
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b02a682471 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1632,19 +1632,21 @@ EXPLAIN (COSTS OFF) EXECUTE p2(2);
 --
 SET SESSION AUTHORIZATION regress_rls_bob;
 EXPLAIN (COSTS OFF) UPDATE t1 SET b = b || b WHERE f_leak(b);
-                  QUERY PLAN
------------------------------------------------
+                        QUERY PLAN
+-----------------------------------------------------------
  Update on t1
-   Update on t1
-   Update on t2 t1_1
-   Update on t3 t1_2
-   ->  Seq Scan on t1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t2 t1_1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t3 t1_2
-         Filter: (((a % 2) = 0) AND f_leak(b))
-(10 rows)
+   Update on t1 t1_1
+   Update on t2 t1_2
+   Update on t3 t1_3
+   ->  Result
+         ->  Append
+               ->  Seq Scan on t1 t1_1
+                     Filter: (((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t2 t1_2
+                     Filter: (((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t3 t1_3
+                     Filter: (((a % 2) = 0) AND f_leak(b))
+(12 rows)

 UPDATE t1 SET b = b || b WHERE f_leak(b);
 NOTICE:  f_leak => bbb
@@ -1722,31 +1724,27 @@ NOTICE:  f_leak => cde
 NOTICE:  f_leak => yyyyyy
 EXPLAIN (COSTS OFF) UPDATE t1 SET b=t1.b FROM t2
 WHERE t1.a = 3 and t2.a = 3 AND f_leak(t1.b) AND f_leak(t2.b);
-                           QUERY PLAN
------------------------------------------------------------------
+                              QUERY PLAN
+-----------------------------------------------------------------------
  Update on t1
-   Update on t1
-   Update on t2 t1_1
-   Update on t3 t1_2
-   ->  Nested Loop
-         ->  Seq Scan on t1
-               Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Seq Scan on t2
-               Filter: ((a = 3) AND ((a % 2) = 1) AND f_leak(b))
-   ->  Nested Loop
-         ->  Seq Scan on t2 t1_1
-               Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Seq Scan on t2
-               Filter: ((a = 3) AND ((a % 2) = 1) AND f_leak(b))
+   Update on t1 t1_1
+   Update on t2 t1_2
+   Update on t3 t1_3
    ->  Nested Loop
-         ->  Seq Scan on t3 t1_2
-               Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
          ->  Seq Scan on t2
                Filter: ((a = 3) AND ((a % 2) = 1) AND f_leak(b))
-(19 rows)
+         ->  Append
+               ->  Seq Scan on t1 t1_1
+                     Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t2 t1_2
+                     Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t3 t1_3
+                     Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
+(14 rows)

 UPDATE t1 SET b=t1.b FROM t2
 WHERE t1.a = 3 and t2.a = 3 AND f_leak(t1.b) AND f_leak(t2.b);
+NOTICE:  f_leak => cde
 EXPLAIN (COSTS OFF) UPDATE t2 SET b=t2.b FROM t1
 WHERE t1.a = 3 and t2.a = 3 AND f_leak(t1.b) AND f_leak(t2.b);
                               QUERY PLAN
@@ -1795,46 +1793,30 @@ NOTICE:  f_leak => cde
 EXPLAIN (COSTS OFF) UPDATE t1 t1_1 SET b = t1_2.b FROM t1 t1_2
 WHERE t1_1.a = 4 AND t1_2.a = t1_1.a AND t1_2.b = t1_1.b
 AND f_leak(t1_1.b) AND f_leak(t1_2.b) RETURNING *, t1_1, t1_2;
-                              QUERY PLAN
------------------------------------------------------------------------
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
  Update on t1 t1_1
-   Update on t1 t1_1
-   Update on t2 t1_1_1
-   Update on t3 t1_1_2
+   Update on t1 t1_1_1
+   Update on t2 t1_1_2
+   Update on t3 t1_1_3
    ->  Nested Loop
          Join Filter: (t1_1.b = t1_2.b)
-         ->  Seq Scan on t1 t1_1
-               Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Append
-               ->  Seq Scan on t1 t1_2_1
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t2 t1_2_2
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t3 t1_2_3
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-   ->  Nested Loop
-         Join Filter: (t1_1_1.b = t1_2.b)
-         ->  Seq Scan on t2 t1_1_1
-               Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
          ->  Append
-               ->  Seq Scan on t1 t1_2_1
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t2 t1_2_2
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t3 t1_2_3
+               ->  Seq Scan on t1 t1_1_1
                      Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-   ->  Nested Loop
-         Join Filter: (t1_1_2.b = t1_2.b)
-         ->  Seq Scan on t3 t1_1_2
-               Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Append
-               ->  Seq Scan on t1 t1_2_1
+               ->  Seq Scan on t2 t1_1_2
                      Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t2 t1_2_2
+               ->  Seq Scan on t3 t1_1_3
                      Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t3 t1_2_3
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-(37 rows)
+         ->  Materialize
+               ->  Append
+                     ->  Seq Scan on t1 t1_2_1
+                           Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
+                     ->  Seq Scan on t2 t1_2_2
+                           Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
+                     ->  Seq Scan on t3 t1_2_3
+                           Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
+(21 rows)

 UPDATE t1 t1_1 SET b = t1_2.b FROM t1 t1_2
 WHERE t1_1.a = 4 AND t1_2.a = t1_1.a AND t1_2.b = t1_1.b
@@ -1842,8 +1824,6 @@ AND f_leak(t1_1.b) AND f_leak(t1_2.b) RETURNING *, t1_1, t1_2;
 NOTICE:  f_leak => daddad_updt
 NOTICE:  f_leak => daddad_updt
 NOTICE:  f_leak => defdef
-NOTICE:  f_leak => defdef
-NOTICE:  f_leak => daddad_updt
 NOTICE:  f_leak => defdef
  id  | a |      b      | id  | a |      b      |        t1_1         |        t1_2
 -----+---+-------------+-----+---+-------------+---------------------+---------------------
@@ -1880,19 +1860,20 @@ EXPLAIN (COSTS OFF) DELETE FROM only t1 WHERE f_leak(b);
 (3 rows)

 EXPLAIN (COSTS OFF) DELETE FROM t1 WHERE f_leak(b);
-                  QUERY PLAN
------------------------------------------------
+                     QUERY PLAN
+-----------------------------------------------------
  Delete on t1
-   Delete on t1
-   Delete on t2 t1_1
-   Delete on t3 t1_2
-   ->  Seq Scan on t1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t2 t1_1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t3 t1_2
-         Filter: (((a % 2) = 0) AND f_leak(b))
-(10 rows)
+   Delete on t1 t1_1
+   Delete on t2 t1_2
+   Delete on t3 t1_3
+   ->  Append
+         ->  Seq Scan on t1 t1_1
+               Filter: (((a % 2) = 0) AND f_leak(b))
+         ->  Seq Scan on t2 t1_2
+               Filter: (((a % 2) = 0) AND f_leak(b))
+         ->  Seq Scan on t3 t1_3
+               Filter: (((a % 2) = 0) AND f_leak(b))
+(11 rows)

 DELETE FROM only t1 WHERE f_leak(b) RETURNING tableoid::regclass, *, t1;
 NOTICE:  f_leak => bbbbbb_updt
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 770eab38b5..d759d3a896 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1607,26 +1607,21 @@ UPDATE rw_view1 SET a = a + 1000 FROM other_tbl_parent WHERE a = id;
                                QUERY PLAN
 -------------------------------------------------------------------------
  Update on base_tbl_parent
-   Update on base_tbl_parent
-   Update on base_tbl_child base_tbl_parent_1
-   ->  Hash Join
-         Hash Cond: (other_tbl_parent.id = base_tbl_parent.a)
-         ->  Append
-               ->  Seq Scan on other_tbl_parent other_tbl_parent_1
-               ->  Seq Scan on other_tbl_child other_tbl_parent_2
-         ->  Hash
-               ->  Seq Scan on base_tbl_parent
+   Update on base_tbl_parent base_tbl_parent_1
+   Update on base_tbl_child base_tbl_parent_2
    ->  Merge Join
-         Merge Cond: (base_tbl_parent_1.a = other_tbl_parent.id)
+         Merge Cond: (base_tbl_parent.a = other_tbl_parent.id)
          ->  Sort
-               Sort Key: base_tbl_parent_1.a
-               ->  Seq Scan on base_tbl_child base_tbl_parent_1
+               Sort Key: base_tbl_parent.a
+               ->  Append
+                     ->  Seq Scan on base_tbl_parent base_tbl_parent_1
+                     ->  Seq Scan on base_tbl_child base_tbl_parent_2
          ->  Sort
                Sort Key: other_tbl_parent.id
                ->  Append
                      ->  Seq Scan on other_tbl_parent other_tbl_parent_1
                      ->  Seq Scan on other_tbl_child other_tbl_parent_2
-(20 rows)
+(15 rows)

 UPDATE rw_view1 SET a = a + 1000 FROM other_tbl_parent WHERE a = id;
 SELECT * FROM ONLY base_tbl_parent ORDER BY a;
@@ -2332,36 +2327,39 @@ SELECT * FROM v1 WHERE a=8;

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
-                                       QUERY PLAN
------------------------------------------------------------------------------------------
+                                             QUERY PLAN
+-----------------------------------------------------------------------------------------------------
  Update on public.t1
-   Update on public.t1
-   Update on public.t11 t1_1
-   Update on public.t12 t1_2
-   Update on public.t111 t1_3
-   ->  Index Scan using t1_a_idx on public.t1
-         Output: 100, t1.ctid
-         Index Cond: ((t1.a > 5) AND (t1.a < 7))
-         Filter: ((t1.a <> 6) AND (SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
-         SubPlan 1
-           ->  Append
-                 ->  Seq Scan on public.t12 t12_1
-                       Filter: (t12_1.a = t1.a)
-                 ->  Seq Scan on public.t111 t12_2
-                       Filter: (t12_2.a = t1.a)
-   ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: 100, t1_1.ctid
-         Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
-         Filter: ((t1_1.a <> 6) AND (SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
-   ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: 100, t1_2.ctid
-         Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
-         Filter: ((t1_2.a <> 6) AND (SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
-   ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: 100, t1_3.ctid
-         Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
-         Filter: ((t1_3.a <> 6) AND (SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(27 rows)
+   Update on public.t1 t1_1
+   Update on public.t11 t1_2
+   Update on public.t12 t1_3
+   Update on public.t111 t1_4
+   ->  Result
+         Output: 100, t1.ctid, (0)
+         ->  Append
+               ->  Index Scan using t1_a_idx on public.t1 t1_1
+                     Output: t1_1.ctid, 0
+                     Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
+                     Filter: ((t1_1.a <> 6) AND (SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
+                     SubPlan 1
+                       ->  Append
+                             ->  Seq Scan on public.t12 t12_1
+                                   Filter: (t12_1.a = t1_1.a)
+                             ->  Seq Scan on public.t111 t12_2
+                                   Filter: (t12_2.a = t1_1.a)
+               ->  Index Scan using t11_a_idx on public.t11 t1_2
+                     Output: t1_2.ctid, 1
+                     Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
+                     Filter: ((t1_2.a <> 6) AND (SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
+               ->  Index Scan using t12_a_idx on public.t12 t1_3
+                     Output: t1_3.ctid, 2
+                     Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
+                     Filter: ((t1_3.a <> 6) AND (SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
+               ->  Index Scan using t111_a_idx on public.t111 t1_4
+                     Output: t1_4.ctid, 3
+                     Index Cond: ((t1_4.a > 5) AND (t1_4.a < 7))
+                     Filter: ((t1_4.a <> 6) AND (SubPlan 1) AND snoop(t1_4.a) AND leakproof(t1_4.a))
+(30 rows)

 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
 SELECT * FROM v1 WHERE a=100; -- Nothing should have been changed to 100
@@ -2376,36 +2374,39 @@ SELECT * FROM t1 WHERE a=100; -- Nothing should have been changed to 100

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
-                              QUERY PLAN
------------------------------------------------------------------------
+                                    QUERY PLAN
+-----------------------------------------------------------------------------------
  Update on public.t1
-   Update on public.t1
-   Update on public.t11 t1_1
-   Update on public.t12 t1_2
-   Update on public.t111 t1_3
-   ->  Index Scan using t1_a_idx on public.t1
-         Output: (t1.a + 1), t1.ctid
-         Index Cond: ((t1.a > 5) AND (t1.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
-         SubPlan 1
-           ->  Append
-                 ->  Seq Scan on public.t12 t12_1
-                       Filter: (t12_1.a = t1.a)
-                 ->  Seq Scan on public.t111 t12_2
-                       Filter: (t12_2.a = t1.a)
-   ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: (t1_1.a + 1), t1_1.ctid
-         Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
-   ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: (t1_2.a + 1), t1_2.ctid
-         Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
-   ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: (t1_3.a + 1), t1_3.ctid
-         Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(27 rows)
+   Update on public.t1 t1_1
+   Update on public.t11 t1_2
+   Update on public.t12 t1_3
+   Update on public.t111 t1_4
+   ->  Result
+         Output: (t1.a + 1), t1.ctid, (0)
+         ->  Append
+               ->  Index Scan using t1_a_idx on public.t1 t1_1
+                     Output: t1_1.a, t1_1.ctid, 0
+                     Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
+                     SubPlan 1
+                       ->  Append
+                             ->  Seq Scan on public.t12 t12_1
+                                   Filter: (t12_1.a = t1_1.a)
+                             ->  Seq Scan on public.t111 t12_2
+                                   Filter: (t12_2.a = t1_1.a)
+               ->  Index Scan using t11_a_idx on public.t11 t1_2
+                     Output: t1_2.a, t1_2.ctid, 1
+                     Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
+               ->  Index Scan using t12_a_idx on public.t12 t1_3
+                     Output: t1_3.a, t1_3.ctid, 2
+                     Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
+               ->  Index Scan using t111_a_idx on public.t111 t1_4
+                     Output: t1_4.a, t1_4.ctid, 3
+                     Index Cond: ((t1_4.a > 5) AND (t1_4.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_4.a) AND leakproof(t1_4.a))
+(30 rows)

 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
 NOTICE:  snooped value: 8
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index dece036069..dc34ac67b3 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -308,8 +308,8 @@ ALTER TABLE part_b_10_b_20 ATTACH PARTITION part_c_1_100 FOR VALUES FROM (1) TO

 -- The order of subplans should be in bound order
 EXPLAIN (costs off) UPDATE range_parted set c = c - 50 WHERE c > 97;
-                   QUERY PLAN
--------------------------------------------------
+                      QUERY PLAN
+-------------------------------------------------------
  Update on range_parted
    Update on part_a_1_a_10 range_parted_1
    Update on part_a_10_a_20 range_parted_2
@@ -318,21 +318,22 @@ EXPLAIN (costs off) UPDATE range_parted set c = c - 50 WHERE c > 97;
    Update on part_d_1_15 range_parted_5
    Update on part_d_15_20 range_parted_6
    Update on part_b_20_b_30 range_parted_7
-   ->  Seq Scan on part_a_1_a_10 range_parted_1
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_a_10_a_20 range_parted_2
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_b_1_b_10 range_parted_3
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_c_1_100 range_parted_4
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_d_1_15 range_parted_5
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_d_15_20 range_parted_6
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_b_20_b_30 range_parted_7
-         Filter: (c > '97'::numeric)
-(22 rows)
+   ->  Append
+         ->  Seq Scan on part_a_1_a_10 range_parted_1
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_a_10_a_20 range_parted_2
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_b_1_b_10 range_parted_3
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_c_1_100 range_parted_4
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_d_1_15 range_parted_5
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_d_15_20 range_parted_6
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_b_20_b_30 range_parted_7
+               Filter: (c > '97'::numeric)
+(23 rows)

 -- fail, row movement happens only within the partition subtree.
 UPDATE part_c_100_200 set c = c - 20, d = c WHERE c = 105;
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 9a6b716ddc..1cec874d62 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -2909,44 +2909,32 @@ DELETE FROM a USING wcte WHERE aa = q2;
                      QUERY PLAN
 ----------------------------------------------------
  Delete on public.a
-   Delete on public.a
-   Delete on public.b a_1
-   Delete on public.c a_2
-   Delete on public.d a_3
+   Delete on public.a a_1
+   Delete on public.b a_2
+   Delete on public.c a_3
+   Delete on public.d a_4
    CTE wcte
      ->  Insert on public.int8_tbl
            Output: int8_tbl.q2
            ->  Result
                  Output: '42'::bigint, '47'::bigint
-   ->  Nested Loop
-         Output: a.ctid, wcte.*
-         Join Filter: (a.aa = wcte.q2)
-         ->  Seq Scan on public.a
-               Output: a.ctid, a.aa
-         ->  CTE Scan on wcte
+   ->  Hash Join
+         Output: a.ctid, wcte.*, (0)
+         Hash Cond: (a.aa = wcte.q2)
+         ->  Append
+               ->  Seq Scan on public.a a_1
+                     Output: a_1.ctid, a_1.aa, 0
+               ->  Seq Scan on public.b a_2
+                     Output: a_2.ctid, a_2.aa, 1
+               ->  Seq Scan on public.c a_3
+                     Output: a_3.ctid, a_3.aa, 2
+               ->  Seq Scan on public.d a_4
+                     Output: a_4.ctid, a_4.aa, 3
+         ->  Hash
                Output: wcte.*, wcte.q2
-   ->  Nested Loop
-         Output: a_1.ctid, wcte.*
-         Join Filter: (a_1.aa = wcte.q2)
-         ->  Seq Scan on public.b a_1
-               Output: a_1.ctid, a_1.aa
-         ->  CTE Scan on wcte
-               Output: wcte.*, wcte.q2
-   ->  Nested Loop
-         Output: a_2.ctid, wcte.*
-         Join Filter: (a_2.aa = wcte.q2)
-         ->  Seq Scan on public.c a_2
-               Output: a_2.ctid, a_2.aa
-         ->  CTE Scan on wcte
-               Output: wcte.*, wcte.q2
-   ->  Nested Loop
-         Output: a_3.ctid, wcte.*
-         Join Filter: (a_3.aa = wcte.q2)
-         ->  Seq Scan on public.d a_3
-               Output: a_3.ctid, a_3.aa
-         ->  CTE Scan on wcte
-               Output: wcte.*, wcte.q2
-(38 rows)
+               ->  CTE Scan on wcte
+                     Output: wcte.*, wcte.q2
+(26 rows)

 -- error cases
 -- data-modifying WITH tries to use its own output
CREATE EXTENSION postgres_fdw;
DO $d$
    BEGIN
        EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (dbname '$$||current_database()||$$',
                     port '$$||current_setting('port')||$$'
            )$$;
    END;
$d$;
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

drop table if exists utrtest, loct2, loct3, loct4;

create table utrtest (a int, b text) partition by list (a);

create table locp (a int check (a in (1)), b text);
alter table utrtest attach partition locp for values in (1);
insert into utrtest values (1, 'one');

create table loct2 (a int check (a in (2)), b text);
create foreign table remp2 (a int check (a in (2)), b text)
  server loopback options (table_name 'loct2');
alter table utrtest attach partition remp2 for values in (2);
insert into utrtest values (2, 'two');

create table loct3 (a int check (a in (3)), b text);
create foreign table remp3 (a int check (a in (3)), b text)
  server loopback options (table_name 'loct3');
alter table utrtest attach partition remp3 for values in (3);
insert into utrtest values (3, 'three');

create table loct4 (a int check (a in (4)), b text);
create foreign table remp4 (a int check (a in (4)), b text)
  server loopback options (table_name 'loct4');
alter table utrtest attach partition remp4 for values in (4);
insert into utrtest values (4, 'four');

explain (verbose, costs off)
update utrtest set a = 1
  from generate_series(1,4) s(x) where a = s.x returning *;

Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
I wrote:
> ... which is what forced you to remove or lobotomize several regression
> test cases.  Now admittedly, that just moves the state of play for
> cross-partition updates into postgres_fdw partitions from "works
> sometimes" to "works never".  But I don't like the idea that we'll
> be taking away actual functionality.
> I have a blue-sky idea for fixing that properly, which is to disable FDW
> direct updates when there is a possibility of a cross-partition update,
> instead doing it the old way with a remote cursor reading the source rows
> for later UPDATEs.

After further poking at this, I realize that there is an independent reason
why a direct FDW update is unsuitable in a partitioned UPDATE: it fails to
cope with cases where a row needs to be moved *out* of a remote table.
(If you were smart and put a CHECK constraint equivalent to the partition
constraint on the remote table, you'll get a CHECK failure.  If you did
not do that, you just silently get the wrong behavior, with the row
updated where it is and thus no longer accessible via the partitioned
table.)  Again, backing off trying to use a direct update seems like
the right route to a fix.
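To make the failure mode above concrete, here is a toy C model of what happens to one row whose partition key changes. It is a sketch under stated assumptions, not postgres_fdw or executor code. Each hypothetical list partition accepts exactly one key value (like remp2 accepting a = 2 in the test case). `direct_update` means the whole UPDATE was pushed to the remote server, so the local executor never sees the row and cannot route it elsewhere. All names here (`apply_update`, `UpdateOutcome`, and so on) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible results of applying one row's UPDATE in this toy model. */
typedef enum
{
    OUTCOME_UPDATED_IN_PLACE, /* new key still fits the row's current partition */
    OUTCOME_MOVED,            /* executor deletes the row and re-inserts it elsewhere */
    OUTCOME_CHECK_FAILURE,    /* remote CHECK constraint rejects the pushed-down UPDATE */
    OUTCOME_SILENTLY_WRONG    /* row updated in place, no longer reachable via the parent */
} UpdateOutcome;

/* Hypothetical list partition that accepts exactly one key value. */
static bool partition_accepts(int bound, int key)
{
    return key == bound;
}

/*
 * Decide what happens when a row in the partition with list value 'bound'
 * gets the new partition key 'new_key'.  'direct_update' models a fully
 * pushed-down remote UPDATE; 'remote_has_check' models a CHECK constraint
 * on the remote table that mirrors the local partition constraint.
 */
static UpdateOutcome apply_update(int bound, int new_key,
                                  bool direct_update, bool remote_has_check)
{
    if (partition_accepts(bound, new_key))
        return OUTCOME_UPDATED_IN_PLACE;
    if (!direct_update)
        return OUTCOME_MOVED;   /* row comes back locally and can be re-routed */
    return remote_has_check ? OUTCOME_CHECK_FAILURE : OUTCOME_SILENTLY_WRONG;
}
```

For example, `apply_update(2, 1, true, false)` models `update utrtest set a = 1` hitting a row in remp2 with no remote CHECK constraint. The result is `OUTCOME_SILENTLY_WRONG`. The same call with `direct_update` false yields `OUTCOME_MOVED`, which is the behavior backing off from direct updates would restore.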

So the short answer here is that postgres_fdw is about 75% broken for
cross-partition updates anyway, so making it 100% broken isn't giving
up as much as I thought.  Accordingly, I'm not going to consider that
issue to be a blocker for this patch.  Still, if anybody wants to
work on un-breaking it, that'd be great.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Sun, Mar 28, 2021 at 3:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > ... which is what forced you to remove or lobotomize several regression
> > test cases.  Now admittedly, that just moves the state of play for
> > cross-partition updates into postgres_fdw partitions from "works
> > sometimes" to "works never".  But I don't like the idea that we'll
> > be taking away actual functionality.
> > I have a blue-sky idea for fixing that properly, which is to disable FDW
> > direct updates when there is a possibility of a cross-partition update,
> > instead doing it the old way with a remote cursor reading the source rows
> > for later UPDATEs.
>
> After further poking at this, I realize that there is an independent reason
> why a direct FDW update is unsuitable in a partitioned UPDATE: it fails to
> cope with cases where a row needs to be moved *out* of a remote table.
> (If you were smart and put a CHECK constraint equivalent to the partition
> constraint on the remote table, you'll get a CHECK failure.  If you did
> not do that, you just silently get the wrong behavior, with the row
> updated where it is and thus no longer accessible via the partitioned
> table.)  Again, backing off trying to use a direct update seems like
> the right route to a fix.

Agreed.

> So the short answer here is that postgres_fdw is about 75% broken for
> cross-partition updates anyway, so making it 100% broken isn't giving
> up as much as I thought.  Accordingly, I'm not going to consider that
> issue to be a blocker for this patch.  Still, if anybody wants to
> work on un-breaking it, that'd be great.

Okay, I will give that a try once we're done with the main patch.

BTW, I had forgotten to update the description in postgres-fdw.sgml of
the current limitation, which is as follows:

===
Note also that postgres_fdw supports row movement invoked by UPDATE
statements executed on partitioned tables, but it currently does not
handle the case where a remote partition chosen to insert a moved row
into is also an UPDATE target partition that will be updated later.
===

I think we will need to take out the "...that will be updated later"
part at the end of the sentence.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Sun, Mar 28, 2021 at 1:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > Attached updated version of the patch.  I have forgotten to mention in
> > my recent posts on this thread one thing about 0001 that I had
> > mentioned upthread back in June.  That it currently fails a test in
> > postgres_fdw's suite due to a bug of cross-partition updates that I
> > decided at the time to pursue in another thread:
> > https://www.postgresql.org/message-id/CA%2BHiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA%40mail.gmail.com
>
> Yeah, I ran into that too.  I think we need not try to fix it in HEAD;
> we aren't likely to commit 0001 and 0002 separately.  We need some fix
> for the back branches, but that would better be discussed in the other
> thread.  (Note that the version of 0001 I attach below shows the actual
> output of the postgres_fdw test, including a failure from said bug.)

Okay, makes sense.

> I wanted to give a data dump of where I am.  I've reviewed and
> nontrivially modified 0001 and the executor parts of 0002, and
> I'm fairly happy with the state of that much of the code now.

Thanks a lot for that work.  I have looked at the changes and I agree
that updateColnosLists + ExecBuildUpdateProjection() looks much better
than updateTargetLists in the original patch.  Looking at
ExecBuildUpdateProjection(), I take back my comment upthread regarding
the performance characteristics of your approach, that the prepared
statements would suffer from having to build the update-new-tuple
projection(s) from scratch on every execution.
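Here is a minimal sketch of the idea behind updateColnosLists. It assumes a toy row of ints instead of PostgreSQL's TupleTableSlot machinery. The subplan emits the SET-clause values first, in the order given by a per-relation list of target column numbers, and the projection copies every other column from the old row. toy_build_update_tuple() and TOY_MAX_COLS are hypothetical names; this is not the actual ExecBuildUpdateProjection() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_COLS 16   /* hypothetical upper bound for this sketch */

/*
 * Build the new tuple for one result relation.
 *   oldvals:       the old row, natts columns (attnums 1..natts)
 *   planvals:      subplan output; the first ncolnos entries are new values
 *   update_colnos: 1-based target attnums, parallel to planvals
 * Columns not named in update_colnos are copied from the old row.
 */
static void
toy_build_update_tuple(const int *oldvals, int natts,
                       const int *planvals,
                       const int *update_colnos, int ncolnos,
                       int *newvals)
{
    bool assigned[TOY_MAX_COLS] = {false};

    assert(natts <= TOY_MAX_COLS);

    /* Start from the old row ... */
    memcpy(newvals, oldvals, sizeof(int) * (size_t) natts);

    /* ... then overwrite only the columns the SET clause assigned. */
    for (int i = 0; i < ncolnos; i++)
    {
        int attno = update_colnos[i];

        assert(attno >= 1 && attno <= natts);
        assert(!assigned[attno - 1]);   /* each column assigned at most once */
        assigned[attno - 1] = true;
        newvals[attno - 1] = planvals[i];
    }
}
```

Each result relation carries its own column-number list. A child whose attribute numbers differ from the parent's can therefore consume the same subplan output; only its list changes. Any junk columns (ctid and the like) after the first ncolnos entries are ignored here.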

> (Note that 0002 below contains some cosmetic fixes, such as comments,
> that logically belong in 0001, but I didn't bother to tidy that up
> since I'm not seeing these as separate commits anyway.)
>
> The planner, however, still needs a lot of work.  There's a serious
> functional problem, in that UPDATEs across partition trees having
> more than one foreign table fail with
>
> ERROR:  junk column "wholerow" of child relation 5 conflicts with parent junk column with same name
>
> (cf. multiupdate.sql test case attached).

Oops, thanks for noticing that.

>  I think we could get around
> that by requiring "wholerow" junk attrs to have vartype RECORDOID instead
> of the particular table's rowtype, which might also remove the need for
> some of the vartype translation hacking in 0002.  But I haven't tried yet.

Sounds like that might work.
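The following minimal sketch shows why a RECORDOID vartype would sidestep the "wholerow" conflict. It assumes junk columns requested by child relations are merged into one registry keyed by name. A repeated name is accepted only if its type matches. If each foreign child asks for "wholerow" with its own rowtype OID, the second request clashes. If every request uses the generic record type, the requests collapse into one entry. The Toy* names and the child rowtype OIDs used in the test are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_RECORDOID 2249   /* OID of the generic "record" type in pg_type */
#define TOY_MAX_JUNK  8

typedef struct ToyJunkCol
{
    const char  *name;
    unsigned int vartype;
} ToyJunkCol;

typedef struct ToyJunkRegistry
{
    ToyJunkCol cols[TOY_MAX_JUNK];
    int        ncols;
} ToyJunkRegistry;

/*
 * Register a junk column requested for some child relation.
 * Returns its 1-based position, or -1 if the name is already taken with a
 * different type (the analogue of "junk column ... conflicts with parent
 * junk column with same name").
 */
static int
toy_add_junk_col(ToyJunkRegistry *reg, const char *name, unsigned int vartype)
{
    for (int i = 0; i < reg->ncols; i++)
    {
        if (strcmp(reg->cols[i].name, name) == 0)
            return (reg->cols[i].vartype == vartype) ? i + 1 : -1;
    }
    assert(reg->ncols < TOY_MAX_JUNK);
    reg->cols[reg->ncols].name = name;
    reg->cols[reg->ncols].vartype = vartype;
    return ++reg->ncols;
}
```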

> More abstractly, I really dislike the "fake variable" design, primarily
> the aspect that you made the fake variables look like real columns of
> the parent table with attnums just beyond the last real one.  I think
> this is just a recipe for obscuring bugs, since it means you have to
> lobotomize a lot of bad-attnum error checks.  The alternative I'm
> considering is to invent a separate RTE that holds all the junk columns.
> Haven't tried that yet either.

Hmm, I did expect to hear a strong critique of that piece of code.  I
look forward to reviewing your alternative implementation.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Sun, Mar 28, 2021 at 1:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wanted to give a data dump of where I am.  I've reviewed and
>> nontrivially modified 0001 and the executor parts of 0002, and
>> I'm fairly happy with the state of that much of the code now.

> Thanks a lot for that work.  I have looked at the changes and I agree
> that updateColnosLists + ExecBuildUpdateProjection() looks much better
> than updateTargetLists in the original patch.  Looking at
> ExecBuildUpdateProjection(), I take back my comment upthread regarding
> the performance characteristics of your approach, that the prepared
> statements would suffer from having to build the update-new-tuple
> projection(s) from scratch on every execution.

Yeah, I don't see any reason why the custom projection-build code
would be any slower than the regular path.  Related to this, though,
I was wondering whether we could get a useful win by having
nodeModifyTable.c be lazier about doing the per-target-table
initialization steps.  I think we have to open and lock all the
tables at start for semantic reasons, so maybe that swamps everything
else.  But we could avoid purely-internal setup steps, such as
building the slots and projection expressions, until the first time
a particular target is actually updated into.  This'd help if we've
failed to prune a lot of partitions that the update/delete won't
actually affect.
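A minimal sketch of the lazy-initialization idea follows. It assumes a hypothetical ToyResultRel struct standing in for per-target executor state. Opening and locking would still happen eagerly for all targets, as noted above. Only the internal setup, represented here by a single int, is deferred until the first row is routed to a target. Targets that never receive a row never pay for it.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyResultRel
{
    unsigned int relid;      /* table OID; opened and locked eagerly */
    bool         setup_done; /* have slots/projections been built yet? */
    int          projection; /* stand-in for per-target executor state */
} ToyResultRel;

static int toy_setup_calls = 0;   /* counts the expensive setup work */

/* Expensive, purely internal setup, deferred until first use. */
static void
toy_setup_target(ToyResultRel *rel)
{
    assert(!rel->setup_done);
    toy_setup_calls++;
    rel->projection = (int) rel->relid * 10;   /* placeholder "build" step */
    rel->setup_done = true;
}

/* Called for each row routed to this target. */
static int
toy_update_row(ToyResultRel *rel)
{
    if (!rel->setup_done)
        toy_setup_target(rel);
    return rel->projection;
}
```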

>> More abstractly, I really dislike the "fake variable" design, primarily
>> the aspect that you made the fake variables look like real columns of
>> the parent table with attnums just beyond the last real one.  I think
>> this is just a recipe for obscuring bugs, since it means you have to
>> lobotomize a lot of bad-attnum error checks.  The alternative I'm
>> considering is to invent a separate RTE that holds all the junk columns.
>> Haven't tried that yet either.

> Hmm, I did expect to hear a strong critique of that piece of code.  I
> look forward to reviewing your alternative implementation.

I got one version working over the weekend, but I didn't like the amount
of churn it forced in postgres_fdw (and, presumably, other FDWs).  Gimme
a day or so to try something else.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Mon, Mar 29, 2021 at 11:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Sun, Mar 28, 2021 at 1:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I wanted to give a data dump of where I am.  I've reviewed and
> >> nontrivially modified 0001 and the executor parts of 0002, and
> >> I'm fairly happy with the state of that much of the code now.
>
> > Thanks a lot for that work.  I have looked at the changes and I agree
> > that updateColnosLists + ExecBuildUpdateProjection() looks much better
> > than updateTargetLists in the original patch.  Looking at
> > ExecBuildUpdateProjection(), I take back my comment upthread regarding
> > the performance characteristics of your approach, that the prepared
> > statements would suffer from having to build the update-new-tuple
> > projection(s) from scratch on every execution.
>
> Yeah, I don't see any reason why the custom projection-build code
> would be any slower than the regular path.  Related to this, though,
> I was wondering whether we could get a useful win by having
> nodeModifyTable.c be lazier about doing the per-target-table
> initialization steps.
>
>  I think we have to open and lock all the
> tables at start for semantic reasons, so maybe that swamps everything
> else.  But we could avoid purely-internal setup steps, such as
> building the slots and projection expressions, until the first time
> a particular target is actually updated into.  This'd help if we've
> failed to prune a lot of partitions that the update/delete won't
> actually affect.

Oh, that is exactly what I have proposed in:

https://commitfest.postgresql.org/32/2621/

> >> More abstractly, I really dislike the "fake variable" design, primarily
> >> the aspect that you made the fake variables look like real columns of
> >> the parent table with attnums just beyond the last real one.  I think
> >> this is just a recipe for obscuring bugs, since it means you have to
> >> lobotomize a lot of bad-attnum error checks.  The alternative I'm
> >> considering is to invent a separate RTE that holds all the junk columns.
> >> Haven't tried that yet either.
>
> > Hmm, I did expect to hear a strong critique of that piece of code.  I
> > look forward to reviewing your alternative implementation.
>
> I got one version working over the weekend, but I didn't like the amount
> of churn it forced in postgres_fdw (and, presumably, other FDWs).  Gimme
> a day or so to try something else.

Sure, thanks again.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Here's a v13 patchset that I feel pretty good about.

My original thought for replacing the "fake variable" design was to
add another RTE holding the extra variables, and then have setrefs.c
translate the placeholder variables to the real thing at the last
possible moment.  I soon realized that instead of an actual RTE,
it'd be better to invent a special varno value akin to INDEX_VAR
(I called it ROWID_VAR, though I'm not wedded to that name).  Info
about the associated variables is kept in a list of RowIdentityVarInfo
structs, which are more suitable than a regular RTE would be.

I got that and the translate-in-setrefs approach more or less working,
but it was fairly messy, because the need to know about these special
variables spilled into FDWs and a lot of other places; for example
indxpath.c needed a special check for them when deciding if an
index-only scan is possible.  What turns out to be a lot cleaner is
to handle the translation in adjust_appendrel_attrs_mutator(), so that
we have converted to real variables by the time we reach any
relation-scan-level logic.
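The placeholder scheme can be pictured roughly as follows. A Var whose varno is a reserved value (ROWID_VAR in the patch) uses its varattno as an index into the list of row-identity entries. The appendrel translation step replaces it with a real Var of the child being processed: attno -1 (ctid) or 0 (whole-row). The Toy* types and the TOY_ROWID_VAR value are assumptions of this simplified sketch, not the patch's code. So is the choice to emit a NULL for a child that never requested a given column, and the omission of ordinary parent-to-child column mapping.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ROWID_VAR (-4)   /* hypothetical reserved varno; real RT indexes are >= 1 */
#define TOY_MAX_RELS  32

typedef struct ToyVar
{
    int  varno;     /* range-table index, or TOY_ROWID_VAR */
    int  varattno;  /* column number; for TOY_ROWID_VAR, 1-based index into the info list */
    bool isnull;    /* true if the result stands for a NULL constant instead */
} ToyVar;

typedef struct ToyRowIdInfo
{
    const char *name;                  /* e.g. "ctid" or "wholerow" */
    int         real_attno;            /* -1 for ctid, 0 for a whole-row reference */
    bool        needed_by[TOY_MAX_RELS]; /* which child RT indexes requested it */
} ToyRowIdInfo;

/* Translate one Var while converting an expression from parent to child. */
static ToyVar
toy_translate_var(ToyVar var, int child_relid,
                  const ToyRowIdInfo *infos, int ninfos)
{
    /* Ordinary Vars would be mapped to child attnos here; returned as-is in this sketch. */
    if (var.varno != TOY_ROWID_VAR)
        return var;

    assert(var.varattno >= 1 && var.varattno <= ninfos);
    assert(child_relid >= 1 && child_relid < TOY_MAX_RELS);

    const ToyRowIdInfo *info = &infos[var.varattno - 1];
    ToyVar result = {child_relid, info->real_attno, false};

    if (!info->needed_by[child_relid])
        result.isnull = true;   /* this child has no use for the column */
    return result;
}
```

Because the substitution happens during appendrel translation, scan-level code only ever sees ordinary Vars of a real relation.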

I did end up having to break the API for FDW AddForeignUpdateTargets
functions: they need to do things differently when adding junk columns,
and they need different parameters.  This seems all to the good though,
because the old API has been a backwards-compatibility hack for some
time (e.g., in not passing the "root" pointer).

Some other random notes:

* I was unimpressed with the idea of distinguishing different target
relations by embedding integer constants in the plan.  In the first
place, the implementation was extremely fragile --- there was
absolutely NOTHING tying the counter you used to the subplans' eventual
indexes in the ModifyTable lists.  Plus I don't have a lot of faith
that setrefs.c will reliably do what you want in terms of bubbling the
things up.  Maybe that could be made more robust, but the other problem
is that the EXPLAIN output is just about unreadable; nobody will
understand what "(0)" means.  So I went back to the idea of emitting
tableoid, and installed a hashtable plus a one-entry lookup cache
to make the run-time mapping as fast as I could.  I'm not necessarily
saying that this is how it has to be indefinitely, but I think we
need more work on planner and EXPLAIN infrastructure before we can
get the idea of directly providing a list index to work nicely.
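The run-time mapping described above is a hash table from tableoid to result relation, fronted by a one-entry cache. The following sketch illustrates it using a hypothetical open-addressing table with linear probing and the MurmurHash3 32-bit finalizer as an integer mixer. The Toy* names are illustrative; this is not the code in nodeModifyTable.c. The sketch assumes the OIDs are distinct and nonzero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t ToyOid;

typedef struct ToyRelMap
{
    ToyOid *keys;       /* 0 marks an empty slot (InvalidOid is 0) */
    int    *vals;       /* index into the ModifyTable result relation list */
    int     size;       /* power of two, at least twice the entry count */
    ToyOid  last_oid;   /* one-entry cache: consecutive rows often share a table */
    int     last_index;
} ToyRelMap;

static uint32_t
toy_hash_oid(ToyOid h)
{
    h ^= h >> 16;       /* MurmurHash3 fmix32 finalizer */
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

static void
toy_map_init(ToyRelMap *map, const ToyOid *oids, int n)
{
    map->size = 1;
    while (map->size < 2 * n)
        map->size <<= 1;
    map->keys = calloc((size_t) map->size, sizeof(ToyOid));
    map->vals = calloc((size_t) map->size, sizeof(int));
    assert(map->keys != NULL && map->vals != NULL);
    map->last_oid = 0;
    map->last_index = -1;

    uint32_t mask = (uint32_t) map->size - 1;
    for (int i = 0; i < n; i++)
    {
        uint32_t h = toy_hash_oid(oids[i]) & mask;

        while (map->keys[h] != 0)
            h = (h + 1) & mask;     /* linear probing */
        map->keys[h] = oids[i];
        map->vals[h] = i;
    }
}

/* Return the result relation index for a row's tableoid, or -1 if unknown. */
static int
toy_map_lookup(ToyRelMap *map, ToyOid oid)
{
    if (oid == map->last_oid)
        return map->last_index;

    uint32_t mask = (uint32_t) map->size - 1;
    uint32_t h = toy_hash_oid(oid) & mask;

    while (map->keys[h] != 0)
    {
        if (map->keys[h] == oid)
        {
            map->last_oid = oid;
            map->last_index = map->vals[h];
            return map->last_index;
        }
        h = (h + 1) & mask;
    }
    return -1;
}

static void
toy_map_free(ToyRelMap *map)
{
    free(map->keys);
    free(map->vals);
}
```

The cache pays off because an Append over partitions tends to return long runs of rows from the same child. In that case most lookups never touch the hash table.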

* I didn't agree with your decision to remove the now-failing test
cases from postgres_fdw.sql.  I think it's better to leave them there,
especially in the cases where we were checking the plan as well as
the execution.  Hopefully we'll be able to un-break those soon.

* I updated a lot of hereby-obsoleted comments, which makes the patch
a bit longer than v12; but actually the code is a good bit smaller.
There's a noticeable net code savings in src/backend/optimizer/,
which there was not before.

I've not made any attempt to do performance testing on this,
but I think that's about the only thing standing between us
and committing this thing.

            regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6faf499f9a..cff23b0211 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1867,6 +1867,7 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
  * 'foreignrel' is the RelOptInfo for the target relation or the join relation
  *        containing all base relations in the query
  * 'targetlist' is the tlist of the underlying foreign-scan plan node
+ *        (note that this only contains new-value expressions and junk attrs)
  * 'targetAttrs' is the target columns of the UPDATE
  * 'remote_conds' is the qual clauses that must be evaluated remotely
  * '*params_list' is an output list of exprs that will become remote Params
@@ -1888,8 +1889,8 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
     deparse_expr_cxt context;
     int            nestlevel;
     bool        first;
-    ListCell   *lc;
     RangeTblEntry *rte = planner_rt_fetch(rtindex, root);
+    ListCell   *lc, *lc2;

     /* Set up context struct for recursion */
     context.root = root;
@@ -1908,14 +1909,13 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
     nestlevel = set_transmission_modes();

     first = true;
-    foreach(lc, targetAttrs)
+    forboth(lc, targetlist, lc2, targetAttrs)
     {
-        int            attnum = lfirst_int(lc);
-        TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+        TargetEntry *tle = lfirst_node(TargetEntry, lc);
+        int attnum = lfirst_int(lc2);

-        if (!tle)
-            elog(ERROR, "attribute number %d not found in UPDATE targetlist",
-                 attnum);
+        /* update's new-value expressions shouldn't be resjunk */
+        Assert(!tle->resjunk);

         if (!first)
             appendStringInfoString(buf, ", ");
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0649b6b81c..b46e7e623f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -5503,13 +5503,13 @@ UPDATE ft2 AS target SET (c2, c7) = (
         FROM ft2 AS src
         WHERE target.c1 = src.c1
 ) WHERE c1 > 1100;
-                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
+                                                      QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------
  Update on public.ft2 target
    Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c7 = $3 WHERE ctid = $1
    ->  Foreign Scan on public.ft2 target
-         Output: target.c1, $1, NULL::integer, target.c3, target.c4, target.c5, target.c6, $2, target.c8, (SubPlan 1 (returns $1,$2)), target.ctid
-         Remote SQL: SELECT "C 1", c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE
+         Output: $1, $2, (SubPlan 1 (returns $1,$2)), target.ctid, target.*
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE
          SubPlan 1 (returns $1,$2)
            ->  Foreign Scan on public.ft2 src
                  Output: (src.c2 * 10), src.c7
@@ -5539,9 +5539,9 @@ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
    ->  Foreign Scan on public.ft2
-         Output: c1, c2, NULL::integer, 'bar'::text, c4, c5, c6, c7, c8, ctid
+         Output: 'bar'::text, ctid, ft2.*
          Filter: (postgres_fdw_abs(ft2.c1) > 2000)
-         Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+         Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" FOR UPDATE
 (7 rows)

 UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
@@ -5570,11 +5570,11 @@ UPDATE ft2 SET c3 = 'baz'
    Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
    Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
    ->  Nested Loop
-         Output: ft2.c1, ft2.c2, NULL::integer, 'baz'::text, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+         Output: 'baz'::text, ft2.ctid, ft2.*, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
          Join Filter: (ft2.c2 === ft4.c1)
          ->  Foreign Scan on public.ft2
-               Output: ft2.c1, ft2.c2, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.ctid
-               Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
+               Output: ft2.ctid, ft2.*, ft2.c2
+               Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
          ->  Foreign Scan
                Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
                Relations: (public.ft4) INNER JOIN (public.ft5)
@@ -6266,7 +6266,7 @@ UPDATE rw_view SET b = b + 5;
  Update on public.foreign_tbl
    Remote SQL: UPDATE public.base_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl
-         Output: foreign_tbl.a, (foreign_tbl.b + 5), foreign_tbl.ctid
+         Output: (foreign_tbl.b + 5), foreign_tbl.ctid, foreign_tbl.*
          Remote SQL: SELECT a, b, ctid FROM public.base_tbl WHERE ((a < b)) FOR UPDATE
 (5 rows)

@@ -6280,7 +6280,7 @@ UPDATE rw_view SET b = b + 15;
  Update on public.foreign_tbl
    Remote SQL: UPDATE public.base_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl
-         Output: foreign_tbl.a, (foreign_tbl.b + 15), foreign_tbl.ctid
+         Output: (foreign_tbl.b + 15), foreign_tbl.ctid, foreign_tbl.*
          Remote SQL: SELECT a, b, ctid FROM public.base_tbl WHERE ((a < b)) FOR UPDATE
 (5 rows)

@@ -6354,7 +6354,7 @@ UPDATE rw_view SET b = b + 5;
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: parent_tbl_1.a, (parent_tbl_1.b + 5), parent_tbl_1.ctid
+         Output: (parent_tbl_1.b + 5), parent_tbl_1.ctid, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -6369,7 +6369,7 @@ UPDATE rw_view SET b = b + 15;
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: parent_tbl_1.a, (parent_tbl_1.b + 15), parent_tbl_1.ctid
+         Output: (parent_tbl_1.b + 15), parent_tbl_1.ctid, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -6686,7 +6686,7 @@ UPDATE rem1 set f1 = 10;          -- all columns should be transmitted
  Update on public.rem1
    Remote SQL: UPDATE public.loc1 SET f1 = $2, f2 = $3 WHERE ctid = $1
    ->  Foreign Scan on public.rem1
-         Output: 10, f2, ctid, rem1.*
+         Output: 10, ctid, rem1.*
          Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
 (5 rows)

@@ -6919,7 +6919,7 @@ UPDATE rem1 set f2 = '';          -- can't be pushed down
  Update on public.rem1
    Remote SQL: UPDATE public.loc1 SET f1 = $2, f2 = $3 WHERE ctid = $1
    ->  Foreign Scan on public.rem1
-         Output: f1, ''::text, ctid, rem1.*
+         Output: ''::text, ctid, rem1.*
          Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
 (5 rows)

@@ -6943,7 +6943,7 @@ UPDATE rem1 set f2 = '';          -- can't be pushed down
  Update on public.rem1
    Remote SQL: UPDATE public.loc1 SET f2 = $2 WHERE ctid = $1 RETURNING f1, f2
    ->  Foreign Scan on public.rem1
-         Output: f1, ''::text, ctid, rem1.*
+         Output: ''::text, ctid, rem1.*
          Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
 (5 rows)

@@ -7253,18 +7253,18 @@ select * from bar where f1 in (select f1 from foo) for share;
 -- Check UPDATE with inherited target and an inherited source table
 explain (verbose, costs off)
 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
-                                           QUERY PLAN
--------------------------------------------------------------------------------------------------
+                                      QUERY PLAN
+---------------------------------------------------------------------------------------
  Update on public.bar
    Update on public.bar
    Foreign Update on public.bar2 bar_1
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
-         Output: bar.f1, (bar.f2 + 100), bar.ctid, foo.ctid, foo.*, foo.tableoid
+         Output: (bar.f2 + 100), bar.ctid, foo.ctid, foo.*, foo.tableoid
          Inner Unique: true
          Hash Cond: (bar.f1 = foo.f1)
          ->  Seq Scan on public.bar
-               Output: bar.f1, bar.f2, bar.ctid
+               Output: bar.f2, bar.ctid, bar.f1
          ->  Hash
                Output: foo.ctid, foo.f1, foo.*, foo.tableoid
                ->  HashAggregate
@@ -7277,11 +7277,11 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                                  Output: foo_2.ctid, foo_2.f1, foo_2.*, foo_2.tableoid
                                  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
    ->  Hash Join
-         Output: bar_1.f1, (bar_1.f2 + 100), bar_1.f3, bar_1.ctid, foo.ctid, foo.*, foo.tableoid
+         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, foo.ctid, foo.*, foo.tableoid
          Inner Unique: true
          Hash Cond: (bar_1.f1 = foo.f1)
          ->  Foreign Scan on public.bar2 bar_1
-               Output: bar_1.f1, bar_1.f2, bar_1.f3, bar_1.ctid
+               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
                Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Hash
                Output: foo.ctid, foo.f1, foo.*, foo.tableoid
@@ -7321,7 +7321,7 @@ where bar.f1 = ss.f1;
    Foreign Update on public.bar2 bar_1
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
-         Output: bar.f1, (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
+         Output: (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
          Hash Cond: (foo.f1 = bar.f1)
          ->  Append
                ->  Seq Scan on public.foo
@@ -7335,17 +7335,17 @@ where bar.f1 = ss.f1;
                      Output: ROW((foo_3.f1 + 3)), (foo_3.f1 + 3)
                      Remote SQL: SELECT f1 FROM public.loct1
          ->  Hash
-               Output: bar.f1, bar.f2, bar.ctid
+               Output: bar.f2, bar.ctid, bar.f1
                ->  Seq Scan on public.bar
-                     Output: bar.f1, bar.f2, bar.ctid
+                     Output: bar.f2, bar.ctid, bar.f1
    ->  Merge Join
-         Output: bar_1.f1, (bar_1.f2 + 100), bar_1.f3, bar_1.ctid, (ROW(foo.f1))
+         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, (ROW(foo.f1))
          Merge Cond: (bar_1.f1 = foo.f1)
          ->  Sort
-               Output: bar_1.f1, bar_1.f2, bar_1.f3, bar_1.ctid
+               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
                Sort Key: bar_1.f1
                ->  Foreign Scan on public.bar2 bar_1
-                     Output: bar_1.f1, bar_1.f2, bar_1.f3, bar_1.ctid
+                     Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
                      Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Sort
                Output: (ROW(foo.f1)), foo.f1
@@ -7519,7 +7519,7 @@ update bar set f2 = f2 + 100 returning *;
    Update on public.bar
    Foreign Update on public.bar2 bar_1
    ->  Seq Scan on public.bar
-         Output: bar.f1, (bar.f2 + 100), bar.ctid
+         Output: (bar.f2 + 100), bar.ctid
    ->  Foreign Update on public.bar2 bar_1
          Remote SQL: UPDATE public.loct2 SET f2 = (f2 + 100) RETURNING f1, f2
 (8 rows)
@@ -7551,9 +7551,9 @@ update bar set f2 = f2 + 100;
    Foreign Update on public.bar2 bar_1
      Remote SQL: UPDATE public.loct2 SET f1 = $2, f2 = $3, f3 = $4 WHERE ctid = $1 RETURNING f1, f2, f3
    ->  Seq Scan on public.bar
-         Output: bar.f1, (bar.f2 + 100), bar.ctid
+         Output: (bar.f2 + 100), bar.ctid
    ->  Foreign Scan on public.bar2 bar_1
-         Output: bar_1.f1, (bar_1.f2 + 100), bar_1.f3, bar_1.ctid, bar_1.*
+         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*
          Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
 (9 rows)

@@ -7622,10 +7622,10 @@ update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a re
    Update on public.parent
    Foreign Update on public.remt1 parent_1
    ->  Nested Loop
-         Output: parent.a, (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b
+         Output: (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b
          Join Filter: (parent.a = remt2.a)
          ->  Seq Scan on public.parent
-               Output: parent.a, parent.b, parent.ctid
+               Output: parent.b, parent.ctid, parent.a
          ->  Foreign Scan on public.remt2
                Output: remt2.b, remt2.*, remt2.a
                Remote SQL: SELECT a, b FROM public.loct2
@@ -7880,7 +7880,7 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *;
    ->  Foreign Update on public.remp utrtest_1
          Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
    ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.b, utrtest_2.ctid
+         Output: 1, utrtest_2.ctid
          Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
 (9 rows)

@@ -7896,13 +7896,13 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 2 returning *;
-                   QUERY PLAN
-------------------------------------------------
+               QUERY PLAN
+-----------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b
    Update on public.locp utrtest_1
    ->  Seq Scan on public.locp utrtest_1
-         Output: 1, utrtest_1.b, utrtest_1.ctid
+         Output: 1, utrtest_1.ctid
          Filter: (utrtest_1.a = 2)
 (6 rows)

@@ -7932,7 +7932,7 @@ update utrtest set a = 1 returning *;
    ->  Foreign Update on public.remp utrtest_1
          Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
    ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.b, utrtest_2.ctid
+         Output: 1, utrtest_2.ctid
 (8 rows)

 update utrtest set a = 1 returning *;
@@ -7956,20 +7956,20 @@ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
    Update on public.locp utrtest_2
    ->  Hash Join
-         Output: 1, utrtest_1.b, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 1, utrtest_1.ctid, utrtest_1.*, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_1.a = "*VALUES*".column1)
          ->  Foreign Scan on public.remp utrtest_1
-               Output: utrtest_1.b, utrtest_1.ctid, utrtest_1.a
+               Output: utrtest_1.ctid, utrtest_1.*, utrtest_1.a
                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
                      Output: "*VALUES*".*, "*VALUES*".column1
    ->  Hash Join
-         Output: 1, utrtest_2.b, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 1, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_2.a = "*VALUES*".column1)
          ->  Seq Scan on public.locp utrtest_2
-               Output: utrtest_2.b, utrtest_2.ctid, utrtest_2.a
+               Output: utrtest_2.ctid, utrtest_2.a
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
@@ -7977,12 +7977,7 @@ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
 (24 rows)

 update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
- a |  b  | x
----+-----+---
- 1 | foo | 1
- 1 | qux | 2
-(2 rows)
-
+ERROR:  invalid attribute number 5
 -- Change the definition of utrtest so that the foreign partition get updated
 -- after the local partition
 delete from utrtest;
@@ -8005,7 +8000,7 @@ update utrtest set a = 3 returning *;
    Update on public.locp utrtest_1
    Foreign Update on public.remp utrtest_2
    ->  Seq Scan on public.locp utrtest_1
-         Output: 3, utrtest_1.b, utrtest_1.ctid
+         Output: 3, utrtest_1.ctid
    ->  Foreign Update on public.remp utrtest_2
          Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
 (8 rows)
@@ -8023,19 +8018,19 @@ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
    Foreign Update on public.remp utrtest_2
      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
    ->  Hash Join
-         Output: 3, utrtest_1.b, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 3, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_1.a = "*VALUES*".column1)
          ->  Seq Scan on public.locp utrtest_1
-               Output: utrtest_1.b, utrtest_1.ctid, utrtest_1.a
+               Output: utrtest_1.ctid, utrtest_1.a
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
                      Output: "*VALUES*".*, "*VALUES*".column1
    ->  Hash Join
-         Output: 3, utrtest_2.b, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
+         Output: 3, utrtest_2.ctid, utrtest_2.*, "*VALUES*".*, "*VALUES*".column1
          Hash Cond: (utrtest_2.a = "*VALUES*".column1)
          ->  Foreign Scan on public.remp utrtest_2
-               Output: utrtest_2.b, utrtest_2.ctid, utrtest_2.a
+               Output: utrtest_2.ctid, utrtest_2.*, utrtest_2.a
                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 35b48575c5..6ba6786c8b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2322,32 +2322,26 @@ postgresPlanDirectModify(PlannerInfo *root,
      */
     if (operation == CMD_UPDATE)
     {
-        int            col;
+        ListCell *lc, *lc2;

         /*
-         * We transmit only columns that were explicitly targets of the
-         * UPDATE, so as to avoid unnecessary data transmission.
+         * The expressions of concern are the first N columns of the subplan
+         * targetlist, where N is the length of root->update_colnos.
          */
-        col = -1;
-        while ((col = bms_next_member(rte->updatedCols, col)) >= 0)
+        targetAttrs = root->update_colnos;
+        forboth(lc, subplan->targetlist, lc2, targetAttrs)
         {
-            /* bit numbers are offset by FirstLowInvalidHeapAttributeNumber */
-            AttrNumber    attno = col + FirstLowInvalidHeapAttributeNumber;
-            TargetEntry *tle;
+            TargetEntry *tle = lfirst_node(TargetEntry, lc);
+            AttrNumber attno = lfirst_int(lc2);
+
+            /* update's new-value expressions shouldn't be resjunk */
+            Assert(!tle->resjunk);

             if (attno <= InvalidAttrNumber) /* shouldn't happen */
                 elog(ERROR, "system-column update is not supported");

-            tle = get_tle_by_resno(subplan->targetlist, attno);
-
-            if (!tle)
-                elog(ERROR, "attribute number %d not found in subplan targetlist",
-                     attno);
-
             if (!is_foreign_expr(root, foreignrel, (Expr *) tle->expr))
                 return false;
-
-            targetAttrs = lappend_int(targetAttrs, attno);
         }
     }

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 04bc052ee8..6989957d50 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -703,10 +703,14 @@ ExecForeignUpdate(EState *estate,
      <literal>slot</literal> contains the new data for the tuple; it will match the
      row-type definition of the foreign table.
      <literal>planSlot</literal> contains the tuple that was generated by the
-     <structname>ModifyTable</structname> plan node's subplan; it differs from
-     <literal>slot</literal> in possibly containing additional <quote>junk</quote>
-     columns.  In particular, any junk columns that were requested by
-     <function>AddForeignUpdateTargets</function> will be available from this slot.
+     <structname>ModifyTable</structname> plan node's subplan.  Unlike
+     <literal>slot</literal>, this tuple contains only the new values for
+     columns changed by the query, so do not rely on attribute numbers of the
+     foreign table to index into <literal>planSlot</literal>.
+     Also, <literal>planSlot</literal> typically contains
+     additional <quote>junk</quote> columns.  In particular, any junk columns
+     that were requested by <function>AddForeignUpdateTargets</function> will
+     be available from this slot.
     </para>

     <para>
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7383d5994e..a53070f602 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2724,20 +2724,22 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
         /*
          * In READ COMMITTED isolation level it's possible that target tuple
          * was changed due to concurrent update.  In that case we have a raw
-         * subplan output tuple in epqslot_candidate, and need to run it
-         * through the junk filter to produce an insertable tuple.
+         * subplan output tuple in epqslot_candidate, and need to form a new
+         * insertable tuple using ExecGetUpdateNewTuple to replace the one
+         * we received in newslot.  Neither we nor our callers have any
+         * further interest in the passed-in tuple, so it's okay to overwrite
+         * newslot with the newer data.
          *
-         * Caution: more than likely, the passed-in slot is the same as the
-         * junkfilter's output slot, so we are clobbering the original value
-         * of slottuple by doing the filtering.  This is OK since neither we
-         * nor our caller have any more interest in the prior contents of that
-         * slot.
+         * (Typically, newslot was also generated by ExecGetUpdateNewTuple, so
+         * that epqslot_clean will be that same slot and the copy step below
+         * is not needed.)
          */
         if (epqslot_candidate != NULL)
         {
             TupleTableSlot *epqslot_clean;

-            epqslot_clean = ExecFilterJunk(relinfo->ri_junkFilter, epqslot_candidate);
+            epqslot_clean = ExecGetUpdateNewTuple(relinfo, epqslot_candidate,
+                                                  oldslot);

             if (newslot != epqslot_clean)
                 ExecCopySlot(newslot, epqslot_clean);
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2e463f5499..a3937f3e66 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -477,6 +477,204 @@ ExecBuildProjectionInfo(List *targetList,
     return projInfo;
 }

+/*
+ *        ExecBuildUpdateProjection
+ *
+ * Build a ProjectionInfo node for constructing a new tuple during UPDATE.
+ * The projection will be executed in the given econtext and the result will
+ * be stored into the given tuple slot.  (Caller must have ensured that tuple
+ * slot has a descriptor matching the target rel!)
+ *
+ * subTargetList is the tlist of the subplan node feeding ModifyTable.
+ * We use this mainly to cross-check that the expressions being assigned
+ * are of the correct types.  The values from this tlist are assumed to be
+ * available from the "outer" tuple slot.  They are assigned to target columns
+ * listed in the corresponding targetColnos elements.  (Only non-resjunk tlist
+ * entries are assigned.)  Columns not listed in targetColnos are filled from
+ * the UPDATE's old tuple, which is assumed to be available in the "scan"
+ * tuple slot.
+ *
+ * relDesc must describe the relation we intend to update.
+ *
+ * This is basically a specialized variant of ExecBuildProjectionInfo.
+ * However, it also performs sanity checks equivalent to ExecCheckPlanOutput.
+ * Since we never make a normal tlist equivalent to the whole
+ * tuple-to-be-assigned, there is no convenient way to apply
+ * ExecCheckPlanOutput, so we must do our safety checks here.
+ */
+ProjectionInfo *
+ExecBuildUpdateProjection(List *subTargetList,
+                          List *targetColnos,
+                          TupleDesc relDesc,
+                          ExprContext *econtext,
+                          TupleTableSlot *slot,
+                          PlanState *parent)
+{
+    ProjectionInfo *projInfo = makeNode(ProjectionInfo);
+    ExprState  *state;
+    int            nAssignableCols;
+    bool        sawJunk;
+    Bitmapset  *assignedCols;
+    LastAttnumInfo deform = {0, 0, 0};
+    ExprEvalStep scratch = {0};
+    int            outerattnum;
+    ListCell   *lc, *lc2;
+
+    projInfo->pi_exprContext = econtext;
+    /* We embed ExprState into ProjectionInfo instead of doing extra palloc */
+    projInfo->pi_state.tag = T_ExprState;
+    state = &projInfo->pi_state;
+    state->expr = NULL;            /* not used */
+    state->parent = parent;
+    state->ext_params = NULL;
+
+    state->resultslot = slot;
+
+    /*
+     * Examine the subplan tlist to see how many non-junk columns there are,
+     * and to verify that the non-junk columns come before the junk ones.
+     */
+    nAssignableCols = 0;
+    sawJunk = false;
+    foreach(lc, subTargetList)
+    {
+        TargetEntry *tle = lfirst_node(TargetEntry, lc);
+
+        if (tle->resjunk)
+            sawJunk = true;
+        else
+        {
+            if (sawJunk)
+                elog(ERROR, "subplan target list is out of order");
+            nAssignableCols++;
+        }
+    }
+
+    /* We should have one targetColnos entry per non-junk column */
+    if (nAssignableCols != list_length(targetColnos))
+        elog(ERROR, "targetColnos does not match subplan target list");
+
+    /*
+     * Build a bitmapset of the columns in targetColnos.  (We could just
+     * use list_member_int() tests, but that risks O(N^2) behavior with
+     * many columns.)
+     */
+    assignedCols = NULL;
+    foreach(lc, targetColnos)
+    {
+        AttrNumber    targetattnum = lfirst_int(lc);
+
+        assignedCols = bms_add_member(assignedCols, targetattnum);
+    }
+
+    /*
+     * We want to insert EEOP_*_FETCHSOME steps to ensure the outer and scan
+     * tuples are sufficiently deconstructed.  Outer tuple is easy, but for
+     * scan tuple we must find out the last old column we need.
+     */
+    deform.last_outer = nAssignableCols;
+
+    for (int attnum = relDesc->natts; attnum > 0; attnum--)
+    {
+        Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
+        if (attr->attisdropped)
+            continue;
+        if (bms_is_member(attnum, assignedCols))
+            continue;
+        deform.last_scan = attnum;
+        break;
+    }
+
+    ExecPushExprSlots(state, &deform);
+
+    /*
+     * Now generate code to fetch data from the outer tuple, incidentally
+     * validating that it'll be of the right type.  The checks above ensure
+     * that the forboth() will iterate over exactly the non-junk columns.
+     */
+    outerattnum = 0;
+    forboth(lc, subTargetList, lc2, targetColnos)
+    {
+        TargetEntry *tle = lfirst_node(TargetEntry, lc);
+        AttrNumber    targetattnum = lfirst_int(lc2);
+        Form_pg_attribute attr;
+
+        Assert(!tle->resjunk);
+
+        /*
+         * Apply sanity checks comparable to ExecCheckPlanOutput().
+         */
+        if (targetattnum <= 0 || targetattnum > relDesc->natts)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("table row type and query-specified row type do not match"),
+                     errdetail("Query has too many columns.")));
+        attr = TupleDescAttr(relDesc, targetattnum - 1);
+
+        if (attr->attisdropped)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("table row type and query-specified row type do not match"),
+                     errdetail("Query provides a value for a dropped column at ordinal position %d.",
+                               targetattnum)));
+        if (exprType((Node *) tle->expr) != attr->atttypid)
+            ereport(ERROR,
+                    (errcode(ERRCODE_DATATYPE_MISMATCH),
+                     errmsg("table row type and query-specified row type do not match"),
+                     errdetail("Table has type %s at ordinal position %d, but query expects %s.",
+                               format_type_be(attr->atttypid),
+                               targetattnum,
+                               format_type_be(exprType((Node *) tle->expr)))));
+
+        /*
+         * OK, build an outer-tuple reference.
+         */
+        scratch.opcode = EEOP_ASSIGN_OUTER_VAR;
+        scratch.d.assign_var.attnum = outerattnum++;
+        scratch.d.assign_var.resultnum = targetattnum - 1;
+        ExprEvalPushStep(state, &scratch);
+    }
+
+    /*
+     * Now generate code to copy over any old columns that were not assigned
+     * to, and to ensure that dropped columns are set to NULL.
+     */
+    for (int attnum = 1; attnum <= relDesc->natts; attnum++)
+    {
+        Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
+
+        if (attr->attisdropped)
+        {
+            /* Put a null into the ExprState's resvalue/resnull ... */
+            scratch.opcode = EEOP_CONST;
+            scratch.resvalue = &state->resvalue;
+            scratch.resnull = &state->resnull;
+            scratch.d.constval.value = (Datum) 0;
+            scratch.d.constval.isnull = true;
+            ExprEvalPushStep(state, &scratch);
+            /* ... then assign it to the result slot */
+            scratch.opcode = EEOP_ASSIGN_TMP;
+            scratch.d.assign_tmp.resultnum = attnum - 1;
+            ExprEvalPushStep(state, &scratch);
+        }
+        else if (!bms_is_member(attnum, assignedCols))
+        {
+            /* Certainly the right type, so needn't check */
+            scratch.opcode = EEOP_ASSIGN_SCAN_VAR;
+            scratch.d.assign_var.attnum = attnum - 1;
+            scratch.d.assign_var.resultnum = attnum - 1;
+            ExprEvalPushStep(state, &scratch);
+        }
+    }
+
+    scratch.opcode = EEOP_DONE;
+    ExprEvalPushStep(state, &scratch);
+
+    ExecReadyExpr(state);
+
+    return projInfo;
+}
+
 /*
  * ExecPrepareExpr --- initialize for expression execution outside a normal
  * Plan tree context.
diff --git a/src/backend/executor/execJunk.c b/src/backend/executor/execJunk.c
index 970e1c325e..2e0bcbbede 100644
--- a/src/backend/executor/execJunk.c
+++ b/src/backend/executor/execJunk.c
@@ -59,43 +59,16 @@
 JunkFilter *
 ExecInitJunkFilter(List *targetList, TupleTableSlot *slot)
 {
+    JunkFilter *junkfilter;
     TupleDesc    cleanTupType;
+    int            cleanLength;
+    AttrNumber *cleanMap;

     /*
      * Compute the tuple descriptor for the cleaned tuple.
      */
     cleanTupType = ExecCleanTypeFromTL(targetList);

-    /*
-     * The rest is the same as ExecInitJunkFilterInsertion, ie, we want to map
-     * every non-junk targetlist column into the output tuple.
-     */
-    return ExecInitJunkFilterInsertion(targetList, cleanTupType, slot);
-}
-
-/*
- * ExecInitJunkFilterInsertion
- *
- * Initialize a JunkFilter for insertions into a table.
- *
- * Here, we are given the target "clean" tuple descriptor rather than
- * inferring it from the targetlist.  Although the target descriptor can
- * contain deleted columns, that is not of concern here, since the targetlist
- * should contain corresponding NULL constants (cf. ExecCheckPlanOutput).
- * It is assumed that the caller has checked that the table's columns match up
- * with the non-junk columns of the targetlist.
- */
-JunkFilter *
-ExecInitJunkFilterInsertion(List *targetList,
-                            TupleDesc cleanTupType,
-                            TupleTableSlot *slot)
-{
-    JunkFilter *junkfilter;
-    int            cleanLength;
-    AttrNumber *cleanMap;
-    ListCell   *t;
-    AttrNumber    cleanResno;
-
     /*
      * Use the given slot, or make a new slot if we weren't given one.
      */
@@ -117,6 +90,9 @@ ExecInitJunkFilterInsertion(List *targetList,
     cleanLength = cleanTupType->natts;
     if (cleanLength > 0)
     {
+        AttrNumber    cleanResno;
+        ListCell   *t;
+
         cleanMap = (AttrNumber *) palloc(cleanLength * sizeof(AttrNumber));
         cleanResno = 0;
         foreach(t, targetList)
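For readers unfamiliar with JunkFilter, the cleanMap built in ExecInitJunkFilter records, for each non-junk target-list entry in order, the resno of the input column to copy. A minimal standalone C sketch of that idea follows; Tle, build_clean_map and filter_junk are hypothetical names, and plain int arrays stand in for tuple slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical simplified target-list entry: its resno and junk flag. */
typedef struct Tle { int resno; bool resjunk; } Tle;

/*
 * Build a map in the spirit of jf_cleanMap: entry i holds the 1-based resno
 * of the i'th non-junk tlist entry.  The map length is returned through
 * *nclean.  The caller frees the result.
 */
static int *
build_clean_map(const Tle *tlist, int ntle, int *nclean)
{
    int        *map = malloc(sizeof(int) * (ntle > 0 ? ntle : 1));
    int         n = 0;

    if (map == NULL)
        abort();
    for (int i = 0; i < ntle; i++)
    {
        if (!tlist[i].resjunk)
            map[n++] = tlist[i].resno;
    }
    *nclean = n;
    return map;
}

/* Copy the non-junk values of a subplan row into a "clean" row. */
static void
filter_junk(const int *map, int nclean, const int *row, int *clean)
{
    for (int i = 0; i < nclean; i++)
    {
        assert(map[i] >= 1);
        clean[i] = row[map[i] - 1];
    }
}
```

With this patch, UPDATE no longer relies on such a filter, since the new tuple is built by ExecBuildUpdateProjection, but the same mapping idea still applies where junk columns must be stripped.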
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8de78ada63..ea1530e032 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1217,11 +1217,14 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
         resultRelInfo->ri_FdwRoutine = NULL;

     /* The following fields are set later if needed */
+    resultRelInfo->ri_RowIdAttNo = 0;
+    resultRelInfo->ri_projectNew = NULL;
+    resultRelInfo->ri_newTupleSlot = NULL;
+    resultRelInfo->ri_oldTupleSlot = NULL;
     resultRelInfo->ri_FdwState = NULL;
     resultRelInfo->ri_usesFdwDirectModify = false;
     resultRelInfo->ri_ConstraintExprs = NULL;
     resultRelInfo->ri_GeneratedExprs = NULL;
-    resultRelInfo->ri_junkFilter = NULL;
     resultRelInfo->ri_projectReturning = NULL;
     resultRelInfo->ri_onConflictArbiterIndexes = NIL;
     resultRelInfo->ri_onConflict = NULL;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2993ba43e3..b9064bfe66 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -81,7 +81,7 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
                                                ResultRelInfo **partRelInfo);

 /*
- * Verify that the tuples to be produced by INSERT or UPDATE match the
+ * Verify that the tuples to be produced by INSERT match the
  * target relation's rowtype
  *
  * We do this to guard against stale plans.  If plan invalidation is
@@ -91,6 +91,9 @@ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
  *
  * The plan output is represented by its targetlist, because that makes
  * handling the dropped-column case easier.
+ *
+ * We used to use this for UPDATE as well, but now the equivalent checks
+ * are done in ExecBuildUpdateProjection.
  */
 static void
 ExecCheckPlanOutput(Relation resultRel, List *targetList)
@@ -104,8 +107,7 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
         TargetEntry *tle = (TargetEntry *) lfirst(lc);
         Form_pg_attribute attr;

-        if (tle->resjunk)
-            continue;            /* ignore junk tlist items */
+        Assert(!tle->resjunk);    /* caller removed junk items already */

         if (attno >= resultDesc->natts)
             ereport(ERROR,
@@ -367,6 +369,55 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
     MemoryContextSwitchTo(oldContext);
 }

+/*
+ * ExecGetInsertNewTuple
+ *        This prepares a "new" tuple ready to be inserted into given result
+ *        relation by removing any junk columns of the plan's output tuple.
+ *
+ * Note: currently, this is really dead code, because INSERT cases don't
+ * receive any junk columns so there's never a projection to be done.
+ */
+static TupleTableSlot *
+ExecGetInsertNewTuple(ResultRelInfo *relinfo,
+                      TupleTableSlot *planSlot)
+{
+    ProjectionInfo *newProj = relinfo->ri_projectNew;
+    ExprContext   *econtext;
+
+    if (newProj == NULL)
+        return planSlot;
+
+    econtext = newProj->pi_exprContext;
+    econtext->ecxt_outertuple = planSlot;
+    return ExecProject(newProj);
+}
+
+/*
+ * ExecGetUpdateNewTuple
+ *        This prepares a "new" tuple by combining an UPDATE subplan's output
+ *        tuple (which contains values of changed columns) with unchanged
+ *        columns taken from the old tuple.  The subplan tuple might also
+ *        contain junk columns, which are ignored.
+ */
+TupleTableSlot *
+ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
+                      TupleTableSlot *planSlot,
+                      TupleTableSlot *oldSlot)
+{
+    ProjectionInfo *newProj = relinfo->ri_projectNew;
+    ExprContext   *econtext;
+
+    Assert(newProj != NULL);
+    Assert(planSlot != NULL && !TTS_EMPTY(planSlot));
+    Assert(oldSlot != NULL && !TTS_EMPTY(oldSlot));
+
+    econtext = newProj->pi_exprContext;
+    econtext->ecxt_outertuple = planSlot;
+    econtext->ecxt_scantuple = oldSlot;
+    return ExecProject(newProj);
+}
+
+
 /* ----------------------------------------------------------------
  *        ExecInsert
  *
@@ -374,6 +425,10 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
  *        (or partition thereof) and insert appropriate tuples into the index
  *        relations.
  *
+ *        slot contains the new tuple value to be stored.
+ *        planSlot is the output of the ModifyTable's subplan; we use it
+ *        to access "junk" columns that are not going to be stored.
+ *
  *        Returns RETURNING result if any, otherwise NULL.
  *
  *        This may change the currently active tuple conversion map in
@@ -1194,7 +1249,9 @@ static bool
 ExecCrossPartitionUpdate(ModifyTableState *mtstate,
                          ResultRelInfo *resultRelInfo,
                          ItemPointer tupleid, HeapTuple oldtuple,
-                         TupleTableSlot *slot, TupleTableSlot *planSlot,
+                         TupleTableSlot *slot,
+                         TupleTableSlot *oldSlot,
+                         TupleTableSlot *planSlot,
                          EPQState *epqstate, bool canSetTag,
                          TupleTableSlot **retry_slot,
                          TupleTableSlot **inserted_tuple)
@@ -1269,7 +1326,15 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
             return true;
         else
         {
-            *retry_slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+            /* Fetch the most recent version of old tuple. */
+            ExecClearTuple(oldSlot);
+            if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
+                                               tupleid,
+                                               SnapshotAny,
+                                               oldSlot))
+                elog(ERROR, "failed to fetch tuple being updated");
+            *retry_slot = ExecGetUpdateNewTuple(resultRelInfo, epqslot,
+                                                oldSlot);
             return false;
         }
     }
@@ -1319,6 +1384,11 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
  *        foreign table triggers; it is NULL when the foreign table has
  *        no relevant triggers.
  *
+ *        slot contains the new tuple value to be stored, while oldSlot
+ *        contains the old tuple being replaced.  planSlot is the output
+ *        of the ModifyTable's subplan; we use it to access values from
+ *        other input tables (for RETURNING), row-ID junk columns, etc.
+ *
  *        Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
  */
@@ -1328,6 +1398,7 @@ ExecUpdate(ModifyTableState *mtstate,
            ItemPointer tupleid,
            HeapTuple oldtuple,
            TupleTableSlot *slot,
+           TupleTableSlot *oldSlot,
            TupleTableSlot *planSlot,
            EPQState *epqstate,
            EState *estate,
@@ -1465,8 +1536,8 @@ lreplace:;
              * the tuple we're trying to move has been concurrently updated.
              */
             retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
-                                              oldtuple, slot, planSlot,
-                                              epqstate, canSetTag,
+                                              oldtuple, slot, oldSlot,
+                                              planSlot, epqstate, canSetTag,
                                               &retry_slot, &inserted_tuple);
             if (retry)
             {
@@ -1578,7 +1649,15 @@ lreplace:;
                                 /* Tuple not passing quals anymore, exiting... */
                                 return NULL;

-                            slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+                            /* Fetch the most recent version of old tuple. */
+                            ExecClearTuple(oldSlot);
+                            if (!table_tuple_fetch_row_version(resultRelationDesc,
+                                                               tupleid,
+                                                               SnapshotAny,
+                                                               oldSlot))
+                                elog(ERROR, "failed to fetch tuple being updated");
+                            slot = ExecGetUpdateNewTuple(resultRelInfo,
+                                                         epqslot, oldSlot);
                             goto lreplace;

                         case TM_Deleted:
@@ -1874,7 +1953,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     /* Execute UPDATE with projection */
     *returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
                             resultRelInfo->ri_onConflict->oc_ProjSlot,
-                            planSlot,
+                            existing, planSlot,
                             &mtstate->mt_epqstate, mtstate->ps.state,
                             canSetTag);

@@ -2051,7 +2130,6 @@ ExecModifyTable(PlanState *pstate)
     CmdType        operation = node->operation;
     ResultRelInfo *resultRelInfo;
     PlanState  *subplanstate;
-    JunkFilter *junkfilter;
     TupleTableSlot *slot;
     TupleTableSlot *planSlot;
     ItemPointer tupleid;
@@ -2097,7 +2175,6 @@ ExecModifyTable(PlanState *pstate)
     /* Preload local variables */
     resultRelInfo = node->resultRelInfo + node->mt_whichplan;
     subplanstate = node->mt_plans[node->mt_whichplan];
-    junkfilter = resultRelInfo->ri_junkFilter;

     /*
      * Fetch rows from subplan(s), and execute the required table modification
@@ -2131,7 +2208,6 @@ ExecModifyTable(PlanState *pstate)
             {
                 resultRelInfo++;
                 subplanstate = node->mt_plans[node->mt_whichplan];
-                junkfilter = resultRelInfo->ri_junkFilter;
                 EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
                                     node->mt_arowmarks[node->mt_whichplan]);
                 continue;
@@ -2173,87 +2249,123 @@ ExecModifyTable(PlanState *pstate)

         tupleid = NULL;
         oldtuple = NULL;
-        if (junkfilter != NULL)
+
+        /*
+         * For UPDATE/DELETE, fetch the row identity info for the tuple to be
+         * updated/deleted.  For a heap relation, that's a TID; otherwise we
+         * may have a wholerow junk attr that carries the old tuple in toto.
+         * Keep this in step with the part of ExecInitModifyTable that sets
+         * up ri_RowIdAttNo.
+         */
+        if (operation == CMD_UPDATE || operation == CMD_DELETE)
         {
-            /*
-             * extract the 'ctid' or 'wholerow' junk attribute.
-             */
-            if (operation == CMD_UPDATE || operation == CMD_DELETE)
+            char        relkind;
+            Datum        datum;
+            bool        isNull;
+
+            relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+            if (relkind == RELKIND_RELATION ||
+                relkind == RELKIND_MATVIEW ||
+                relkind == RELKIND_PARTITIONED_TABLE)
             {
-                char        relkind;
-                Datum        datum;
-                bool        isNull;
-
-                relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-                if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
-                {
-                    datum = ExecGetJunkAttribute(slot,
-                                                 junkfilter->jf_junkAttNo,
-                                                 &isNull);
-                    /* shouldn't ever get a null result... */
-                    if (isNull)
-                        elog(ERROR, "ctid is NULL");
-
-                    tupleid = (ItemPointer) DatumGetPointer(datum);
-                    tuple_ctid = *tupleid;    /* be sure we don't free ctid!! */
-                    tupleid = &tuple_ctid;
-                }
-
-                /*
-                 * Use the wholerow attribute, when available, to reconstruct
-                 * the old relation tuple.
-                 *
-                 * Foreign table updates have a wholerow attribute when the
-                 * relation has a row-level trigger.  Note that the wholerow
-                 * attribute does not carry system columns.  Foreign table
-                 * triggers miss seeing those, except that we know enough here
-                 * to set t_tableOid.  Quite separately from this, the FDW may
-                 * fetch its own junk attrs to identify the row.
-                 *
-                 * Other relevant relkinds, currently limited to views, always
-                 * have a wholerow attribute.
-                 */
-                else if (AttributeNumberIsValid(junkfilter->jf_junkAttNo))
-                {
-                    datum = ExecGetJunkAttribute(slot,
-                                                 junkfilter->jf_junkAttNo,
-                                                 &isNull);
-                    /* shouldn't ever get a null result... */
-                    if (isNull)
-                        elog(ERROR, "wholerow is NULL");
-
-                    oldtupdata.t_data = DatumGetHeapTupleHeader(datum);
-                    oldtupdata.t_len =
-                        HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
-                    ItemPointerSetInvalid(&(oldtupdata.t_self));
-                    /* Historically, view triggers see invalid t_tableOid. */
-                    oldtupdata.t_tableOid =
-                        (relkind == RELKIND_VIEW) ? InvalidOid :
-                        RelationGetRelid(resultRelInfo->ri_RelationDesc);
-
-                    oldtuple = &oldtupdata;
-                }
-                else
-                    Assert(relkind == RELKIND_FOREIGN_TABLE);
+                /* ri_RowIdAttNo refers to a ctid attribute */
+                Assert(AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo));
+                datum = ExecGetJunkAttribute(slot,
+                                             resultRelInfo->ri_RowIdAttNo,
+                                             &isNull);
+                /* shouldn't ever get a null result... */
+                if (isNull)
+                    elog(ERROR, "ctid is NULL");
+
+                tupleid = (ItemPointer) DatumGetPointer(datum);
+                tuple_ctid = *tupleid;    /* be sure we don't free ctid!! */
+                tupleid = &tuple_ctid;
             }

             /*
-             * apply the junkfilter if needed.
+             * Use the wholerow attribute, when available, to reconstruct the
+             * old relation tuple.  The old tuple serves one or both of two
+             * purposes: 1) it serves as the OLD tuple for row triggers, 2) it
+             * provides values for any unchanged columns for the NEW tuple of
+             * an UPDATE, because the subplan does not produce all the columns
+             * of the target table.
+             *
+             * Note that the wholerow attribute does not carry system columns,
+             * so foreign table triggers miss seeing those, except that we
+             * know enough here to set t_tableOid.  Quite separately from
+             * this, the FDW may fetch its own junk attrs to identify the row.
+             *
+             * Other relevant relkinds, currently limited to views, always
+             * have a wholerow attribute.
              */
-            if (operation != CMD_DELETE)
-                slot = ExecFilterJunk(junkfilter, slot);
+            else if (AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+            {
+                datum = ExecGetJunkAttribute(slot,
+                                             resultRelInfo->ri_RowIdAttNo,
+                                             &isNull);
+                /* shouldn't ever get a null result... */
+                if (isNull)
+                    elog(ERROR, "wholerow is NULL");
+
+                oldtupdata.t_data = DatumGetHeapTupleHeader(datum);
+                oldtupdata.t_len =
+                    HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
+                ItemPointerSetInvalid(&(oldtupdata.t_self));
+                /* Historically, view triggers see invalid t_tableOid. */
+                oldtupdata.t_tableOid =
+                    (relkind == RELKIND_VIEW) ? InvalidOid :
+                    RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+                oldtuple = &oldtupdata;
+            }
+            else
+            {
+                /* Only foreign tables are allowed to omit a row-ID attr */
+                Assert(relkind == RELKIND_FOREIGN_TABLE);
+            }
         }

         switch (operation)
         {
             case CMD_INSERT:
+                slot = ExecGetInsertNewTuple(resultRelInfo, planSlot);
                 slot = ExecInsert(node, resultRelInfo, slot, planSlot,
                                   estate, node->canSetTag);
                 break;
             case CMD_UPDATE:
-                slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
-                                  planSlot, &node->mt_epqstate, estate,
-                                  node->canSetTag);
+                {
+                    TupleTableSlot *oldSlot = resultRelInfo->ri_oldTupleSlot;
+
+                    /*
+                     * Make the new tuple by combining plan's output tuple
+                     * with the old tuple being updated.
+                     */
+                    ExecClearTuple(oldSlot);
+                    if (oldtuple != NULL)
+                    {
+                        /* Use the wholerow junk attr (foreign table or view) as the old tuple. */
+                        ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
+                    }
+                    else
+                    {
+                        /* Fetch the most recent version of old tuple. */
+                        Relation    relation = resultRelInfo->ri_RelationDesc;
+
+                        Assert(tupleid != NULL);
+                        if (!table_tuple_fetch_row_version(relation, tupleid,
+                                                           SnapshotAny,
+                                                           oldSlot))
+                            elog(ERROR, "failed to fetch tuple being updated");
+                    }
+                    slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
+                                                 oldSlot);
+
+                    /* Now apply the update. */
+                    slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple,
+                                      slot, oldSlot, planSlot,
+                                      &node->mt_epqstate, estate,
+                                      node->canSetTag);
+                }
                 break;
             case CMD_DELETE:
                 slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
@@ -2679,117 +2791,143 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                         mtstate->mt_arowmarks[0]);

     /*
-     * Initialize the junk filter(s) if needed.  INSERT queries need a filter
-     * if there are any junk attrs in the tlist.  UPDATE and DELETE always
-     * need a filter, since there's always at least one junk attribute present
-     * --- no need to look first.  Typically, this will be a 'ctid' or
-     * 'wholerow' attribute, but in the case of a foreign data wrapper it
-     * might be a set of junk attributes sufficient to identify the remote
-     * row.
+     * Initialize projection(s) to create tuples suitable for result rel(s).
+     * INSERT queries may need a projection to filter out junk attrs in the
+     * tlist.  UPDATE always needs a projection, because (1) there's always
+     * some junk attrs, and (2) we may need to merge values of not-updated
+     * columns from the old tuple into the final tuple.  In UPDATE, the tuple
+     * arriving from the subplan contains only new values for the changed
+     * columns, plus row identity info in the junk attrs.
      *
-     * If there are multiple result relations, each one needs its own junk
-     * filter.  Note multiple rels are only possible for UPDATE/DELETE, so we
-     * can't be fooled by some needing a filter and some not.
+     * If there are multiple result relations, each one needs its own
+     * projection.  Note multiple rels are only possible for UPDATE/DELETE, so
+     * we can't be fooled by some needing a projection and some not.
      *
      * This section of code is also a convenient place to verify that the
      * output of an INSERT or UPDATE matches the target table(s).
      */
+    for (i = 0; i < nplans; i++)
     {
-        bool        junk_filter_needed = false;
+        resultRelInfo = &mtstate->resultRelInfo[i];
+        subplan = mtstate->mt_plans[i]->plan;

-        switch (operation)
+        /*
+         * Prepare to generate tuples suitable for the target relation.
+         */
+        if (operation == CMD_INSERT)
         {
-            case CMD_INSERT:
-                foreach(l, subplan->targetlist)
-                {
-                    TargetEntry *tle = (TargetEntry *) lfirst(l);
+            List       *insertTargetList = NIL;
+            bool        need_projection = false;
+            foreach(l, subplan->targetlist)
+            {
+                TargetEntry *tle = (TargetEntry *) lfirst(l);

-                    if (tle->resjunk)
-                    {
-                        junk_filter_needed = true;
-                        break;
-                    }
-                }
-                break;
-            case CMD_UPDATE:
-            case CMD_DELETE:
-                junk_filter_needed = true;
-                break;
-            default:
-                elog(ERROR, "unknown operation");
-                break;
-        }
+                if (!tle->resjunk)
+                    insertTargetList = lappend(insertTargetList, tle);
+                else
+                    need_projection = true;
+            }
+            if (need_projection)
+            {
+                TupleDesc    relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
+
+                resultRelInfo->ri_newTupleSlot =
+                    table_slot_create(resultRelInfo->ri_RelationDesc,
+                                      &mtstate->ps.state->es_tupleTable);
+
+                /* need an expression context to do the projection */
+                if (mtstate->ps.ps_ExprContext == NULL)
+                    ExecAssignExprContext(estate, &mtstate->ps);
+
+                resultRelInfo->ri_projectNew =
+                    ExecBuildProjectionInfo(insertTargetList,
+                                            mtstate->ps.ps_ExprContext,
+                                            resultRelInfo->ri_newTupleSlot,
+                                            &mtstate->ps,
+                                            relDesc);
+            }

-        if (junk_filter_needed)
+            /*
+             * The junk-free list must produce a tuple suitable for the result
+             * relation.
+             */
+            ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+                                insertTargetList);
+        }
+        else if (operation == CMD_UPDATE)
         {
-            resultRelInfo = mtstate->resultRelInfo;
-            for (i = 0; i < nplans; i++)
-            {
-                JunkFilter *j;
-                TupleTableSlot *junkresslot;
+            List       *updateColnos;
+            TupleDesc    relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
+
+            updateColnos = (List *) list_nth(node->updateColnosLists, i);

-                subplan = mtstate->mt_plans[i]->plan;
+            /*
+             * For UPDATE, we use the old tuple to fill up missing values in
+             * the tuple produced by the plan to get the new tuple.
+             */
+            resultRelInfo->ri_oldTupleSlot =
+                table_slot_create(resultRelInfo->ri_RelationDesc,
+                                  &mtstate->ps.state->es_tupleTable);
+            resultRelInfo->ri_newTupleSlot =
+                table_slot_create(resultRelInfo->ri_RelationDesc,
+                                  &mtstate->ps.state->es_tupleTable);
+
+            /* need an expression context to do the projection */
+            if (mtstate->ps.ps_ExprContext == NULL)
+                ExecAssignExprContext(estate, &mtstate->ps);
+
+            resultRelInfo->ri_projectNew =
+                ExecBuildUpdateProjection(subplan->targetlist,
+                                          updateColnos,
+                                          relDesc,
+                                          mtstate->ps.ps_ExprContext,
+                                          resultRelInfo->ri_newTupleSlot,
+                                          &mtstate->ps);
+        }

-                junkresslot =
-                    ExecInitExtraTupleSlot(estate, NULL,
-                                           table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+        /*
+         * For UPDATE/DELETE, find the appropriate junk attr now, either a
+         * 'ctid' or 'wholerow' attribute depending on relkind.  For foreign
+         * tables, the FDW might have created additional junk attr(s), but
+         * those are no concern of ours.
+         */
+        if (operation == CMD_UPDATE || operation == CMD_DELETE)
+        {
+            char        relkind;

+            relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+            if (relkind == RELKIND_RELATION ||
+                relkind == RELKIND_MATVIEW ||
+                relkind == RELKIND_PARTITIONED_TABLE)
+            {
+                resultRelInfo->ri_RowIdAttNo =
+                    ExecFindJunkAttributeInTlist(subplan->targetlist, "ctid");
+                if (!AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+                    elog(ERROR, "could not find junk ctid column");
+            }
+            else if (relkind == RELKIND_FOREIGN_TABLE)
+            {
                 /*
-                 * For an INSERT or UPDATE, the result tuple must always match
-                 * the target table's descriptor.  For a DELETE, it won't
-                 * (indeed, there's probably no non-junk output columns).
+                 * When there is a row-level trigger, there should be a
+                 * wholerow attribute.  We also require it to be present in
+                 * UPDATE, so we can get the values of unchanged columns.
                  */
-                if (operation == CMD_INSERT || operation == CMD_UPDATE)
-                {
-                    ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-                                        subplan->targetlist);
-                    j = ExecInitJunkFilterInsertion(subplan->targetlist,
-                                                    RelationGetDescr(resultRelInfo->ri_RelationDesc),
-                                                    junkresslot);
-                }
-                else
-                    j = ExecInitJunkFilter(subplan->targetlist,
-                                           junkresslot);
-
-                if (operation == CMD_UPDATE || operation == CMD_DELETE)
-                {
-                    /* For UPDATE/DELETE, find the appropriate junk attr now */
-                    char        relkind;
-
-                    relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-                    if (relkind == RELKIND_RELATION ||
-                        relkind == RELKIND_MATVIEW ||
-                        relkind == RELKIND_PARTITIONED_TABLE)
-                    {
-                        j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
-                        if (!AttributeNumberIsValid(j->jf_junkAttNo))
-                            elog(ERROR, "could not find junk ctid column");
-                    }
-                    else if (relkind == RELKIND_FOREIGN_TABLE)
-                    {
-                        /*
-                         * When there is a row-level trigger, there should be
-                         * a wholerow attribute.
-                         */
-                        j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-                    }
-                    else
-                    {
-                        j->jf_junkAttNo = ExecFindJunkAttribute(j, "wholerow");
-                        if (!AttributeNumberIsValid(j->jf_junkAttNo))
-                            elog(ERROR, "could not find junk wholerow column");
-                    }
-                }
-
-                resultRelInfo->ri_junkFilter = j;
-                resultRelInfo++;
+                resultRelInfo->ri_RowIdAttNo =
+                    ExecFindJunkAttributeInTlist(subplan->targetlist,
+                                                 "wholerow");
+                if (mtstate->operation == CMD_UPDATE &&
+                    !AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+                    elog(ERROR, "could not find junk wholerow column");
+            }
+            else
+            {
+                /* Other valid target relkinds must provide wholerow */
+                resultRelInfo->ri_RowIdAttNo =
+                    ExecFindJunkAttributeInTlist(subplan->targetlist,
+                                                 "wholerow");
+                if (!AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
+                    elog(ERROR, "could not find junk wholerow column");
             }
-        }
-        else
-        {
-            if (operation == CMD_INSERT)
-                ExecCheckPlanOutput(mtstate->resultRelInfo->ri_RelationDesc,
-                                    subplan->targetlist);
         }
     }

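For UPDATE, the projection built by ExecBuildUpdateProjection() into ri_projectNew (presumably what ExecGetUpdateNewTuple() evaluates for each row) amounts to a merge. The subplan supplies new values only for the SET columns, in targetlist order. updateColnos says which table attribute each value belongs to, and every other column is copied from the old tuple. The toy model below is a sketch under simplifying assumptions: Datums are plain `int`s, there are no NULLs, junk columns have already been stripped from the plan output, and `build_new_tuple` and `SKETCH_MAX_ATTS` are hypothetical names. The real code works on TupleTableSlots through a compiled projection.

```c
#include <stdbool.h>

/* Hypothetical limit for this sketch; a real relation uses TupleDesc->natts. */
#define SKETCH_MAX_ATTS 16

/*
 * Build the new tuple for an UPDATE (toy model, not executor code).
 *
 * old_vals:  all natts column values of the old row (attnos 1..natts)
 * plan_vals: new values from the subplan, one per SET column, tlist order
 * colnos:    1-based target attribute number for each plan_vals entry
 * new_vals:  output array receiving natts values
 *
 * Returns false if natts is too large or a column number is out of range.
 */
static bool
build_new_tuple(const int *old_vals, int natts,
                const int *plan_vals, const int *colnos, int ncolnos,
                int *new_vals)
{
    bool        assigned[SKETCH_MAX_ATTS] = {false};

    if (natts > SKETCH_MAX_ATTS)
        return false;

    /* Place each assigned value at its target attribute position. */
    for (int i = 0; i < ncolnos; i++)
    {
        int         attno = colnos[i];

        if (attno < 1 || attno > natts)
            return false;
        new_vals[attno - 1] = plan_vals[i];
        assigned[attno - 1] = true;
    }

    /* Fill every column the SET clause did not mention from the old row. */
    for (int att = 0; att < natts; att++)
    {
        if (!assigned[att])
            new_vals[att] = old_vals[att];
    }
    return true;
}
```

Only colnos depends on the result relation here, which is what would let one plan output shape serve child tables whose attribute numbers differ, as described at the top of the thread.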
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 38b56231b7..1ec586729b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -207,6 +207,7 @@ _copyModifyTable(const ModifyTable *from)
     COPY_SCALAR_FIELD(partColsUpdated);
     COPY_NODE_FIELD(resultRelations);
     COPY_NODE_FIELD(plans);
+    COPY_NODE_FIELD(updateColnosLists);
     COPY_NODE_FIELD(withCheckOptionLists);
     COPY_NODE_FIELD(returningLists);
     COPY_NODE_FIELD(fdwPrivLists);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9f7918c7e9..99fb38c05a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -408,6 +408,7 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
     WRITE_BOOL_FIELD(partColsUpdated);
     WRITE_NODE_FIELD(resultRelations);
     WRITE_NODE_FIELD(plans);
+    WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
     WRITE_NODE_FIELD(fdwPrivLists);
@@ -2143,6 +2144,7 @@ _outModifyTablePath(StringInfo str, const ModifyTablePath *node)
     WRITE_NODE_FIELD(resultRelations);
     WRITE_NODE_FIELD(subpaths);
     WRITE_NODE_FIELD(subroots);
+    WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
     WRITE_NODE_FIELD(rowMarks);
@@ -2268,12 +2270,12 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_NODE_FIELD(distinct_pathkeys);
     WRITE_NODE_FIELD(sort_pathkeys);
     WRITE_NODE_FIELD(processed_tlist);
+    WRITE_NODE_FIELD(update_colnos);
     WRITE_NODE_FIELD(minmax_aggs);
     WRITE_FLOAT_FIELD(total_table_pages, "%.0f");
     WRITE_FLOAT_FIELD(tuple_fraction, "%.4f");
     WRITE_FLOAT_FIELD(limit_tuples, "%.0f");
     WRITE_UINT_FIELD(qual_security_level);
-    WRITE_ENUM_FIELD(inhTargetKind, InheritanceKind);
     WRITE_BOOL_FIELD(hasJoinRTEs);
     WRITE_BOOL_FIELD(hasLateralRTEs);
     WRITE_BOOL_FIELD(hasHavingQual);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 377185f7c6..0b6331d3da 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1683,6 +1683,7 @@ _readModifyTable(void)
     READ_BOOL_FIELD(partColsUpdated);
     READ_NODE_FIELD(resultRelations);
     READ_NODE_FIELD(plans);
+    READ_NODE_FIELD(updateColnosLists);
     READ_NODE_FIELD(withCheckOptionLists);
     READ_NODE_FIELD(returningLists);
     READ_NODE_FIELD(fdwPrivLists);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 906cab7053..4bb482879f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -302,6 +302,7 @@ static ModifyTable *make_modifytable(PlannerInfo *root,
                                      Index nominalRelation, Index rootRelation,
                                      bool partColsUpdated,
                                      List *resultRelations, List *subplans, List *subroots,
+                                     List *updateColnosLists,
                                      List *withCheckOptionLists, List *returningLists,
                                      List *rowMarks, OnConflictExpr *onconflict, int epqParam);
 static GatherMerge *create_gather_merge_plan(PlannerInfo *root,
@@ -2642,7 +2643,8 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
     ModifyTable *plan;
     List       *subplans = NIL;
     ListCell   *subpaths,
-               *subroots;
+               *subroots,
+               *lc;

     /* Build the plan for each input path */
     forboth(subpaths, best_path->subpaths,
@@ -2665,9 +2667,6 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
          */
         subplan = create_plan_recurse(subroot, subpath, CP_EXACT_TLIST);

-        /* Transfer resname/resjunk labeling, too, to keep executor happy */
-        apply_tlist_labeling(subplan->targetlist, subroot->processed_tlist);
-
         subplans = lappend(subplans, subplan);
     }

@@ -2680,6 +2679,7 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
                             best_path->resultRelations,
                             subplans,
                             best_path->subroots,
+                            best_path->updateColnosLists,
                             best_path->withCheckOptionLists,
                             best_path->returningLists,
                             best_path->rowMarks,
@@ -2688,6 +2688,41 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)

     copy_generic_path_info(&plan->plan, &best_path->path);

+    forboth(lc, subplans,
+            subroots, best_path->subroots)
+    {
+        Plan       *subplan = (Plan *) lfirst(lc);
+        PlannerInfo *subroot = (PlannerInfo *) lfirst(subroots);
+
+        /*
+         * Fix up the resnos of the query's TLEs to make them match their
+         * ordinal position in the list, which they may not in the case of an
+         * UPDATE.  It's safe to revise that targetlist now, because nothing
+         * after this point needs those resnos to match the target relation's
+         * attribute numbers.
+         * XXX - we do this simply because apply_tlist_labeling() asserts that
+         * resnos in processed_tlist and resnos in subplan targetlist are
+         * exactly the same, but maybe we can just remove the assert?
+         */
+        if (plan->operation == CMD_UPDATE)
+        {
+            ListCell   *l;
+            AttrNumber    resno = 1;
+
+            foreach(l, subroot->processed_tlist)
+            {
+                TargetEntry *tle = lfirst(l);
+
+                tle = flatCopyTargetEntry(tle);
+                tle->resno = resno++;
+                lfirst(l) = tle;
+            }
+        }
+
+        /* Transfer resname/resjunk labeling, too, to keep executor happy */
+        apply_tlist_labeling(subplan->targetlist, subroot->processed_tlist);
+    }
+
     return plan;
 }

@@ -6880,6 +6915,7 @@ make_modifytable(PlannerInfo *root,
                  Index nominalRelation, Index rootRelation,
                  bool partColsUpdated,
                  List *resultRelations, List *subplans, List *subroots,
+                 List *updateColnosLists,
                  List *withCheckOptionLists, List *returningLists,
                  List *rowMarks, OnConflictExpr *onconflict, int epqParam)
 {
@@ -6892,6 +6928,9 @@ make_modifytable(PlannerInfo *root,

     Assert(list_length(resultRelations) == list_length(subplans));
     Assert(list_length(resultRelations) == list_length(subroots));
+    Assert(operation == CMD_UPDATE ?
+           list_length(resultRelations) == list_length(updateColnosLists) :
+           updateColnosLists == NIL);
     Assert(withCheckOptionLists == NIL ||
            list_length(resultRelations) == list_length(withCheckOptionLists));
     Assert(returningLists == NIL ||
@@ -6936,6 +6975,7 @@ make_modifytable(PlannerInfo *root,
         node->exclRelRTI = onconflict->exclRelIndex;
         node->exclRelTlist = onconflict->exclRelTlist;
     }
+    node->updateColnosLists = updateColnosLists;
     node->withCheckOptionLists = withCheckOptionLists;
     node->returningLists = returningLists;
     node->rowMarks = rowMarks;
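The UPDATE branch of the fix-up loop in create_modifytable_plan() just renumbers processed_tlist entries 1..N in list order. The original target attribute numbers are not lost, because preprocess_targetlist() has already recorded them in update_colnos. A minimal sketch, assuming a hypothetical `SketchTle` struct that carries only the two TargetEntry fields involved, and copying entries into a new array in the spirit of flatCopyTargetEntry() so the input entries stay untouched:

```c
#include <stdbool.h>

/* Hypothetical stand-in for TargetEntry: only the fields used here. */
typedef struct SketchTle
{
    int         resno;      /* attno (non-junk) or output position */
    bool        resjunk;    /* true for junk entries such as "ctid" */
} SketchTle;

/*
 * Copy tlist[0..n-1] into out[], setting each copy's resno to its 1-based
 * ordinal position.  The input array is not modified.
 */
static void
renumber_tlist(const SketchTle *tlist, int n, SketchTle *out)
{
    int         resno = 1;

    for (int i = 0; i < n; i++)
    {
        out[i] = tlist[i];      /* shallow copy of the entry */
        out[i].resno = resno++;
    }
}
```

For an illustrative tlist with non-junk resnos 2 and 5 followed by one junk entry, the copies come out numbered 1, 2, 3, matching the subplan's output column positions.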
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f529d107d2..ccb9166a8e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -620,6 +620,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
     root->processed_tlist = NIL;
+    root->update_colnos = NIL;
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
     root->qual_security_level = 0;
@@ -1222,6 +1223,7 @@ inheritance_planner(PlannerInfo *root)
     List       *subpaths = NIL;
     List       *subroots = NIL;
     List       *resultRelations = NIL;
+    List       *updateColnosLists = NIL;
     List       *withCheckOptionLists = NIL;
     List       *returningLists = NIL;
     List       *rowMarks;
@@ -1687,6 +1689,11 @@ inheritance_planner(PlannerInfo *root)
         /* Build list of target-relation RT indexes */
         resultRelations = lappend_int(resultRelations, appinfo->child_relid);

+        /* Accumulate lists of UPDATE target columns */
+        if (parse->commandType == CMD_UPDATE)
+            updateColnosLists = lappend(updateColnosLists,
+                                        subroot->update_colnos);
+
         /* Build lists of per-relation WCO and RETURNING targetlists */
         if (parse->withCheckOptions)
             withCheckOptionLists = lappend(withCheckOptionLists,
@@ -1732,6 +1739,9 @@ inheritance_planner(PlannerInfo *root)
         subpaths = list_make1(dummy_path);
         subroots = list_make1(root);
         resultRelations = list_make1_int(parse->resultRelation);
+        if (parse->commandType == CMD_UPDATE)
+            updateColnosLists = lappend(updateColnosLists,
+                                        root->update_colnos);
         if (parse->withCheckOptions)
             withCheckOptionLists = list_make1(parse->withCheckOptions);
         if (parse->returningList)
@@ -1788,6 +1798,7 @@ inheritance_planner(PlannerInfo *root)
                                      resultRelations,
                                      subpaths,
                                      subroots,
+                                     updateColnosLists,
                                      withCheckOptionLists,
                                      returningLists,
                                      rowMarks,
@@ -2313,6 +2324,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         if (parse->commandType != CMD_SELECT && !inheritance_update)
         {
             Index        rootRelation;
+            List       *updateColnosLists;
             List       *withCheckOptionLists;
             List       *returningLists;
             List       *rowMarks;
@@ -2327,6 +2339,12 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
             else
                 rootRelation = 0;

+            /* Set up the UPDATE target columns list-of-lists, if needed. */
+            if (parse->commandType == CMD_UPDATE)
+                updateColnosLists = list_make1(root->update_colnos);
+            else
+                updateColnosLists = NIL;
+
             /*
              * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if
              * needed.
@@ -2361,6 +2379,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
                                         list_make1_int(parse->resultRelation),
                                         list_make1(path),
                                         list_make1(root),
+                                        updateColnosLists,
                                         withCheckOptionLists,
                                         returningLists,
                                         rowMarks,
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index d961592e01..e18553ac7c 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -925,6 +925,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
     subroot->processed_tlist = NIL;
+    subroot->update_colnos = NIL;
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
     subroot->qual_security_level = 0;
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 23f9f861f4..488e8cfd4d 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -3,13 +3,19 @@
  * preptlist.c
  *      Routines to preprocess the parse tree target list
  *
- * For INSERT and UPDATE queries, the targetlist must contain an entry for
- * each attribute of the target relation in the correct order.  For UPDATE and
- * DELETE queries, it must also contain junk tlist entries needed to allow the
- * executor to identify the rows to be updated or deleted.  For all query
- * types, we may need to add junk tlist entries for Vars used in the RETURNING
- * list and row ID information needed for SELECT FOR UPDATE locking and/or
- * EvalPlanQual checking.
+ * For an INSERT, the targetlist must contain an entry for each attribute of
+ * the target relation in the correct order.
+ *
+ * For an UPDATE, the targetlist just contains the expressions for the new
+ * column values.
+ *
+ * For UPDATE and DELETE queries, the targetlist must also contain "junk"
+ * tlist entries needed to allow the executor to identify the rows to be
+ * updated or deleted; for example, the ctid of a heap row.
+ *
+ * For all query types, there can be additional junk tlist entries, such as
+ * sort keys, Vars needed for a RETURNING list, and row ID information needed
+ * for SELECT FOR UPDATE locking and/or EvalPlanQual checking.
  *
  * The query rewrite phase also does preprocessing of the targetlist (see
  * rewriteTargetListIU).  The division of labor between here and there is
@@ -52,6 +58,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "utils/rel.h"

+static List *make_update_colnos(List *tlist);
 static List *expand_targetlist(List *tlist, int command_type,
                                Index result_relation, Relation rel);

@@ -63,7 +70,8 @@ static List *expand_targetlist(List *tlist, int command_type,
  *      Returns the new targetlist.
  *
  * As a side effect, if there's an ON CONFLICT UPDATE clause, its targetlist
- * is also preprocessed (and updated in-place).
+ * is also preprocessed (and updated in-place).  Also, if this is an UPDATE,
+ * we return a list of target column numbers in root->update_colnos.
  */
 List *
 preprocess_targetlist(PlannerInfo *root)
@@ -108,14 +116,19 @@ preprocess_targetlist(PlannerInfo *root)
         rewriteTargetListUD(parse, target_rte, target_relation);

     /*
-     * for heap_form_tuple to work, the targetlist must match the exact order
-     * of the attributes. We also need to fill in any missing attributes. -ay
-     * 10/94
+     * In an INSERT, the executor expects the targetlist to match the exact
+     * order of the target table's attributes, including entries for
+     * attributes not mentioned in the source query.
+     *
+     * In an UPDATE, we don't rearrange the tlist order, but we need to make a
+     * separate list of the target attribute numbers, in tlist order.
      */
     tlist = parse->targetList;
-    if (command_type == CMD_INSERT || command_type == CMD_UPDATE)
+    if (command_type == CMD_INSERT)
         tlist = expand_targetlist(tlist, command_type,
                                   result_relation, target_relation);
+    else if (command_type == CMD_UPDATE)
+        root->update_colnos = make_update_colnos(tlist);

     /*
      * Add necessary junk columns for rowmarked rels.  These values are needed
@@ -239,6 +252,29 @@ preprocess_targetlist(PlannerInfo *root)
     return tlist;
 }

+/*
+ * make_update_colnos
+ *         Extract a list of the target-table column numbers that
+ *         an UPDATE's targetlist wants to assign to.
+ *
+ * We just need to capture the resnos of the non-junk tlist entries.
+ */
+static List *
+make_update_colnos(List *tlist)
+{
+    List       *update_colnos = NIL;
+    ListCell   *lc;
+
+    foreach(lc, tlist)
+    {
+        TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+        if (!tle->resjunk)
+            update_colnos = lappend_int(update_colnos, tle->resno);
+    }
+    return update_colnos;
+}
+

 /*****************************************************************************
  *
@@ -251,6 +287,10 @@ preprocess_targetlist(PlannerInfo *root)
  *      Given a target list as generated by the parser and a result relation,
  *      add targetlist entries for any missing attributes, and ensure the
  *      non-junk attributes appear in proper field order.
+ *
+ * command_type is a bit of an archaism now: it's CMD_INSERT when we're
+ * processing an INSERT, all right, but the only other use of this function
+ * is for ON CONFLICT UPDATE tlists, for which command_type is CMD_UPDATE.
  */
 static List *
 expand_targetlist(List *tlist, int command_type,
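make_update_colnos() only has to record the resno of each non-junk entry, in tlist order, while those resnos are still target attribute numbers. A sketch over plain arrays, assuming a hypothetical `SketchTle` struct in place of TargetEntry and a caller-provided output array in place of the integer List built with lappend_int():

```c
#include <stdbool.h>

/* Hypothetical stand-in for TargetEntry: only the fields used here. */
typedef struct SketchTle
{
    int         resno;      /* target attribute number for non-junk TLEs */
    bool        resjunk;    /* true for junk entries such as "ctid" */
} SketchTle;

/*
 * Store the resnos of the non-junk entries of tlist[0..n-1] into colnos[],
 * preserving tlist order.  colnos must have room for n entries.
 * Returns the number of column numbers stored.
 */
static int
make_update_colnos_sketch(const SketchTle *tlist, int n, int *colnos)
{
    int         ncolnos = 0;

    for (int i = 0; i < n; i++)
    {
        if (!tlist[i].resjunk)
            colnos[ncolnos++] = tlist[i].resno;
    }
    return ncolnos;
}
```

Because the list follows tlist order, position i in update_colnos lines up with the i-th non-junk output column of the subplan after create_modifytable_plan() renumbers the entries.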
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 69b83071cf..a97929c13f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3548,6 +3548,8 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  * 'resultRelations' is an integer list of actual RT indexes of target rel(s)
  * 'subpaths' is a list of Path(s) producing source data (one per rel)
  * 'subroots' is a list of PlannerInfo structs (one per rel)
+ * 'updateColnosLists' is a list of UPDATE target column number lists
+ *        (one sublist per rel); or NIL if not an UPDATE
  * 'withCheckOptionLists' is a list of WCO lists (one per rel)
  * 'returningLists' is a list of RETURNING tlists (one per rel)
  * 'rowMarks' is a list of PlanRowMarks (non-locking only)
@@ -3561,6 +3563,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
                         bool partColsUpdated,
                         List *resultRelations, List *subpaths,
                         List *subroots,
+                        List *updateColnosLists,
                         List *withCheckOptionLists, List *returningLists,
                         List *rowMarks, OnConflictExpr *onconflict,
                         int epqParam)
@@ -3571,6 +3574,9 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,

     Assert(list_length(resultRelations) == list_length(subpaths));
     Assert(list_length(resultRelations) == list_length(subroots));
+    Assert(operation == CMD_UPDATE ?
+           list_length(resultRelations) == list_length(updateColnosLists) :
+           updateColnosLists == NIL);
     Assert(withCheckOptionLists == NIL ||
            list_length(resultRelations) == list_length(withCheckOptionLists));
     Assert(returningLists == NIL ||
@@ -3633,6 +3639,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
     pathnode->resultRelations = resultRelations;
     pathnode->subpaths = subpaths;
     pathnode->subroots = subroots;
+    pathnode->updateColnosLists = updateColnosLists;
     pathnode->withCheckOptionLists = withCheckOptionLists;
     pathnode->returningLists = returningLists;
     pathnode->rowMarks = rowMarks;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..f9175987f8 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1659,17 +1659,21 @@ rewriteTargetListUD(Query *parsetree, RangeTblEntry *target_rte,
                                                 target_relation);

         /*
-         * If we have a row-level trigger corresponding to the operation, emit
-         * a whole-row Var so that executor will have the "old" row to pass to
-         * the trigger.  Alas, this misses system columns.
+         * For UPDATE, we need to make the FDW fetch unchanged columns by
+         * asking it to fetch a whole-row Var.  That's because the top-level
+         * targetlist only contains entries for changed columns.  (Actually,
+         * we only really need this for UPDATEs that are not pushed to the
+         * remote side, but it's hard to tell if that will be the case at the
+         * point when this function is called.)
+         *
+         * We will also need the whole row if there are any row triggers, so
+         * that the executor will have the "old" row to pass to the trigger.
+         * Alas, this misses system columns.
          */
-        if (target_relation->trigdesc &&
-            ((parsetree->commandType == CMD_UPDATE &&
-              (target_relation->trigdesc->trig_update_after_row ||
-               target_relation->trigdesc->trig_update_before_row)) ||
-             (parsetree->commandType == CMD_DELETE &&
-              (target_relation->trigdesc->trig_delete_after_row ||
-               target_relation->trigdesc->trig_delete_before_row))))
+        if (parsetree->commandType == CMD_UPDATE ||
+            (target_relation->trigdesc &&
+             (target_relation->trigdesc->trig_delete_after_row ||
+              target_relation->trigdesc->trig_delete_before_row)))
         {
             var = makeWholeRowVar(target_rte,
                                   parsetree->resultRelation,
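For a foreign-table target, the revised test in rewriteTargetListUD() reduces to the following. It always requests a whole-row Var for UPDATE, so the FDW supplies the unchanged columns. For DELETE, it does so only when a BEFORE or AFTER row-level DELETE trigger exists. Modelled as a boolean function, with a hypothetical `SketchTrigFlags` struct standing in for the two TriggerDesc flags the test reads. A NULL pointer plays the role of a relation with no trigdesc:

```c
#include <stdbool.h>
#include <stddef.h>

/* rewriteTargetListUD() is only reached for UPDATE and DELETE. */
typedef enum SketchCmd
{
    SKETCH_CMD_UPDATE,
    SKETCH_CMD_DELETE
} SketchCmd;

/* Hypothetical stand-in for the TriggerDesc fields consulted. */
typedef struct SketchTrigFlags
{
    bool        trig_delete_before_row;
    bool        trig_delete_after_row;
} SketchTrigFlags;

/* Does the rewriter need to add a whole-row Var for this target? */
static bool
needs_wholerow_var(SketchCmd cmd, const SketchTrigFlags *trigdesc)
{
    /* UPDATE: the top-level tlist lacks unchanged columns, so always. */
    if (cmd == SKETCH_CMD_UPDATE)
        return true;

    /* DELETE: only if a row-level DELETE trigger needs the old row. */
    return trigdesc != NULL &&
        (trigdesc->trig_delete_after_row || trigdesc->trig_delete_before_row);
}
```

The UPDATE-trigger checks from the old condition disappear because UPDATE now takes the unconditional branch.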
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..c8c09f1cb5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -156,9 +156,6 @@ extern void ResetTupleHashTable(TupleHashTable hashtable);
  */
 extern JunkFilter *ExecInitJunkFilter(List *targetList,
                                       TupleTableSlot *slot);
-extern JunkFilter *ExecInitJunkFilterInsertion(List *targetList,
-                                               TupleDesc cleanTupType,
-                                               TupleTableSlot *slot);
 extern JunkFilter *ExecInitJunkFilterConversion(List *targetList,
                                                 TupleDesc cleanTupType,
                                                 TupleTableSlot *slot);
@@ -270,6 +267,12 @@ extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
                                                TupleTableSlot *slot,
                                                PlanState *parent,
                                                TupleDesc inputDesc);
+extern ProjectionInfo *ExecBuildUpdateProjection(List *subTargetList,
+                                                 List *targetColnos,
+                                                 TupleDesc relDesc,
+                                                 ExprContext *econtext,
+                                                 TupleTableSlot *slot,
+                                                 PlanState *parent);
 extern ExprState *ExecPrepareExpr(Expr *node, EState *estate);
 extern ExprState *ExecPrepareQual(List *qual, EState *estate);
 extern ExprState *ExecPrepareCheck(List *qual, EState *estate);
@@ -622,4 +625,9 @@ extern void CheckCmdReplicaIdentity(Relation rel, CmdType cmd);
 extern void CheckSubscriptionRelkind(char relkind, const char *nspname,
                                      const char *relname);

+/* needed by trigger.c */
+extern TupleTableSlot *ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
+                                             TupleTableSlot *planSlot,
+                                             TupleTableSlot *oldSlot);
+
 #endif                            /* EXECUTOR_H  */
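With the junk filter gone, the subplan of an inherited UPDATE emits only the SET-clause values plus row-identity junk (compare the new "Output: (a + 1), ctid" lines in inherit.out). ExecGetUpdateNewTuple() therefore has to rebuild the full new row from the old one. The sketch below shows only that merge step. It uses hypothetical simplified types (SketchTuple, with a fixed column limit and values modeled as plain longs) in place of TupleTableSlot. It also uses a plain loop rather than the compiled projection that ExecBuildUpdateProjection() presumably builds. It is not the patch's implementation; it only shows which values come from where.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a tuple: values plus null flags. */
#define SKETCH_MAX_ATTS 16

typedef struct SketchTuple
{
    int  natts;
    long values[SKETCH_MAX_ATTS];
    bool isnull[SKETCH_MAX_ATTS];
} SketchTuple;

/*
 * Build the new row for an UPDATE: start from the old row, then overwrite
 * only the columns the SET clause assigned.  plan_values[i] is the value the
 * plan computed for the i-th assignment; target_colnos[i] is the 1-based
 * attribute number it goes to in *this* result relation.
 */
static void
sketch_update_new_tuple(const SketchTuple *old_tuple,
                        const long *plan_values,
                        const bool *plan_isnull,
                        const int *target_colnos,
                        int ncolnos,
                        SketchTuple *new_tuple)
{
    /* Unassigned columns keep their old values. */
    *new_tuple = *old_tuple;

    for (int i = 0; i < ncolnos; i++)
    {
        int attno = target_colnos[i];

        assert(attno >= 1 && attno <= old_tuple->natts);
        new_tuple->values[attno - 1] = plan_values[i];
        new_tuple->isnull[attno - 1] = plan_isnull[i];
    }
}
```

Any column not listed in target_colnos keeps its old value. That is why the plan no longer needs to carry unchanged columns such as "b" through to ModifyTable.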
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e31ad6204e..7af6d48525 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -356,10 +356,6 @@ typedef struct ProjectionInfo
  *                        attribute numbers of the "original" tuple and the
  *                        attribute numbers of the "clean" tuple.
  *      resultSlot:        tuple slot used to hold cleaned tuple.
- *      junkAttNo:        not used by junkfilter code.  Can be used by caller
- *                        to remember the attno of a specific junk attribute
- *                        (nodeModifyTable.c keeps the "ctid" or "wholerow"
- *                        attno here).
  * ----------------
  */
 typedef struct JunkFilter
@@ -369,7 +365,6 @@ typedef struct JunkFilter
     TupleDesc    jf_cleanTupType;
     AttrNumber *jf_cleanMap;
     TupleTableSlot *jf_resultSlot;
-    AttrNumber    jf_junkAttNo;
 } JunkFilter;

 /*
@@ -423,6 +418,19 @@ typedef struct ResultRelInfo
     /* array of key/attr info for indices */
     IndexInfo **ri_IndexRelationInfo;

+    /*
+     * For UPDATE/DELETE result relations, the attribute number of the row
+     * identity junk attribute in the source plan's output tuples
+     */
+    AttrNumber        ri_RowIdAttNo;
+
+    /* Projection to generate new tuple in an INSERT/UPDATE */
+    ProjectionInfo *ri_projectNew;
+    /* Slot to hold that tuple */
+    TupleTableSlot *ri_newTupleSlot;
+    /* Slot to hold the old tuple being updated */
+    TupleTableSlot *ri_oldTupleSlot;
+
     /* triggers to be fired, if any */
     TriggerDesc *ri_TrigDesc;

@@ -470,9 +478,6 @@ typedef struct ResultRelInfo
     /* number of stored generated columns we need to compute */
     int            ri_NumGeneratedNeeded;

-    /* for removing junk attributes from tuples */
-    JunkFilter *ri_junkFilter;
-
     /* list of RETURNING expressions */
     List       *ri_returningList;

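With a single subplan for all result relations, each output row also carries a tableoid junk column. ModifyTable must map that OID back to a ResultRelInfo before it can use ri_RowIdAttNo to locate the row identity. The sketch below illustrates that routing step. It assumes hypothetical stand-in types (SketchResultRel, SketchOid) and models ctid as a plain long. This excerpt doesn't show the patch's actual lookup strategy, so the linear scan is simply the shortest choice.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int SketchOid;   /* hypothetical stand-in for Oid */

/* Hypothetical per-result-relation state, loosely mirroring ResultRelInfo. */
typedef struct SketchResultRel
{
    SketchOid relid;        /* OID of this child table */
    int       rowid_attno;  /* 1-based position of the row id in plan output */
} SketchResultRel;

/*
 * Pick the result relation a plan output row belongs to, using the value
 * of its "tableoid" junk column.  Returns NULL if no result relation has
 * that OID, which a caller would treat as an internal error.
 */
static SketchResultRel *
sketch_lookup_result_rel(SketchResultRel *rels, int nrels, SketchOid tableoid)
{
    for (int i = 0; i < nrels; i++)
    {
        if (rels[i].relid == tableoid)
            return &rels[i];
    }
    return NULL;
}

/* Fetch the row-identity value (e.g. ctid) from a plan output row. */
static long
sketch_get_row_id(const SketchResultRel *rel, const long *plan_row,
                  int plan_natts)
{
    assert(rel->rowid_attno >= 1 && rel->rowid_attno <= plan_natts);
    return plan_row[rel->rowid_attno - 1];
}
```

Consider a plan row laid out like "Output: (bar.f2 + 100), bar.tableoid, bar.ctid, ...". Here tableoid is read from column 2 and rowid_attno would be 3. With many partitions, a hash table keyed by OID would avoid a per-row linear scan.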
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c13642e35e..bed9f4da09 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -309,15 +309,23 @@ struct PlannerInfo

     /*
      * The fully-processed targetlist is kept here.  It differs from
-     * parse->targetList in that (for INSERT and UPDATE) it's been reordered
-     * to match the target table, and defaults have been filled in.  Also,
-     * additional resjunk targets may be present.  preprocess_targetlist()
-     * does most of this work, but note that more resjunk targets can get
-     * added during appendrel expansion.  (Hence, upper_targets mustn't get
-     * set up till after that.)
+     * parse->targetList in that (for INSERT) it's been reordered to match the
+     * target table, and defaults have been filled in.  Also, additional
+     * resjunk targets may be present.  preprocess_targetlist() does most of
+     * that work, but note that more resjunk targets can get added during
+     * appendrel expansion.  (Hence, upper_targets mustn't get set up till
+     * after that.)
      */
     List       *processed_tlist;

+    /*
+     * For UPDATE, processed_tlist remains in the order the user wrote the
+     * assignments.  This list contains the target table's attribute numbers
+     * to which the first N entries of processed_tlist are to be assigned.
+     * (Any additional entries in processed_tlist must be resjunk.)
+     */
+    List       *update_colnos;
+
     /* Fields filled during create_plan() for use in setrefs.c */
     AttrNumber *grouping_map;    /* for GroupingFunc fixup */
     List       *minmax_aggs;    /* List of MinMaxAggInfos */
@@ -1839,6 +1847,7 @@ typedef struct ModifyTablePath
     List       *resultRelations;    /* integer list of RT indexes */
     List       *subpaths;        /* Path(s) producing source data */
     List       *subroots;        /* per-target-table PlannerInfos */
+    List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
     List       *rowMarks;        /* PlanRowMarks (non-locking only) */
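update_colnos exists because processed_tlist stays in SET-clause order, while child tables may number the same columns differently. This happens, for example, after dropped columns, or when a child was attached with a different column order. That is also why ModifyTablePath now carries per-target-table updateColnosLists. The sketch below is one way to picture deriving a child's list from the parent's. It assumes a hypothetical parent-to-child attribute-number map. The patch presumably derives this from the appendrel translation data, which this excerpt doesn't show.

```c
#include <assert.h>

/*
 * Translate a parent's update_colnos into a child's attribute numbers.
 * parent_to_child[k] holds the child's 1-based attno for parent attno k+1
 * (hypothetical map; the same column may sit at a different position in
 * an inheritance child).
 */
static void
sketch_translate_colnos(const int *parent_colnos, int ncolnos,
                        const int *parent_to_child, int parent_natts,
                        int *child_colnos)
{
    for (int i = 0; i < ncolnos; i++)
    {
        int parent_attno = parent_colnos[i];

        assert(parent_attno >= 1 && parent_attno <= parent_natts);
        child_colnos[i] = parent_to_child[parent_attno - 1];
    }
}
```

Only the attribute numbers change per child; the assigned expressions themselves can be shared. This is what makes a single plan reusable across result relations.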
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6e62104d0b..7d74bd92b8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -219,6 +219,7 @@ typedef struct ModifyTable
     bool        partColsUpdated;    /* some part key in hierarchy updated */
     List       *resultRelations;    /* integer list of RT indexes */
     List       *plans;            /* plan(s) producing source data */
+    List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
     List       *fdwPrivLists;    /* per-target-table FDW private data lists */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 54f4b782fc..9673a4a638 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -265,6 +265,7 @@ extern ModifyTablePath *create_modifytable_path(PlannerInfo *root,
                                                 bool partColsUpdated,
                                                 List *resultRelations, List *subpaths,
                                                 List *subroots,
+                                                List *updateColnosLists,
                                                 List *withCheckOptionLists, List *returningLists,
                                                 List *rowMarks, OnConflictExpr *onconflict,
                                                 int epqParam);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 2b68aef654..94e43c3410 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -545,25 +545,25 @@ create table some_tab_child () inherits (some_tab);
 insert into some_tab_child values(1,2);
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false;
-            QUERY PLAN
-----------------------------------
+           QUERY PLAN
+--------------------------------
  Update on public.some_tab
    Update on public.some_tab
    ->  Result
-         Output: (a + 1), b, ctid
+         Output: (a + 1), ctid
          One-Time Filter: false
 (5 rows)

 update some_tab set a = a + 1 where false;
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false returning b, a;
-            QUERY PLAN
-----------------------------------
+           QUERY PLAN
+--------------------------------
  Update on public.some_tab
    Output: b, a
    Update on public.some_tab
    ->  Result
-         Output: (a + 1), b, ctid
+         Output: (a + 1), ctid
          One-Time Filter: false
 (6 rows)

diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 24905332b1..770eab38b5 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1283,12 +1283,12 @@ SELECT * FROM rw_view1;
 (4 rows)

 EXPLAIN (verbose, costs off) UPDATE rw_view1 SET b = b + 1 RETURNING *;
-                         QUERY PLAN
--------------------------------------------------------------
+                   QUERY PLAN
+-------------------------------------------------
  Update on public.base_tbl
    Output: base_tbl.a, base_tbl.b
    ->  Seq Scan on public.base_tbl
-         Output: base_tbl.a, (base_tbl.b + 1), base_tbl.ctid
+         Output: (base_tbl.b + 1), base_tbl.ctid
 (4 rows)

 UPDATE rw_view1 SET b = b + 1 RETURNING *;
@@ -2340,7 +2340,7 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
    Update on public.t12 t1_2
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
-         Output: 100, t1.b, t1.c, t1.ctid
+         Output: 100, t1.ctid
          Index Cond: ((t1.a > 5) AND (t1.a < 7))
          Filter: ((t1.a <> 6) AND (SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2350,15 +2350,15 @@ UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
                  ->  Seq Scan on public.t111 t12_2
                        Filter: (t12_2.a = t1.a)
    ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: 100, t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Output: 100, t1_1.ctid
          Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
          Filter: ((t1_1.a <> 6) AND (SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: 100, t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Output: 100, t1_2.ctid
          Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
          Filter: ((t1_2.a <> 6) AND (SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: 100, t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Output: 100, t1_3.ctid
          Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
          Filter: ((t1_3.a <> 6) AND (SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
 (27 rows)
@@ -2376,15 +2376,15 @@ SELECT * FROM t1 WHERE a=100; -- Nothing should have been changed to 100

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
-                               QUERY PLAN
--------------------------------------------------------------------------
+                              QUERY PLAN
+-----------------------------------------------------------------------
  Update on public.t1
    Update on public.t1
    Update on public.t11 t1_1
    Update on public.t12 t1_2
    Update on public.t111 t1_3
    ->  Index Scan using t1_a_idx on public.t1
-         Output: (t1.a + 1), t1.b, t1.c, t1.ctid
+         Output: (t1.a + 1), t1.ctid
          Index Cond: ((t1.a > 5) AND (t1.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
          SubPlan 1
@@ -2394,15 +2394,15 @@ UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
                  ->  Seq Scan on public.t111 t12_2
                        Filter: (t12_2.a = t1.a)
    ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: (t1_1.a + 1), t1_1.b, t1_1.c, t1_1.d, t1_1.ctid
+         Output: (t1_1.a + 1), t1_1.ctid
          Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
    ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: (t1_2.a + 1), t1_2.b, t1_2.c, t1_2.e, t1_2.ctid
+         Output: (t1_2.a + 1), t1_2.ctid
          Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
    ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: (t1_3.a + 1), t1_3.b, t1_3.c, t1_3.d, t1_3.e, t1_3.ctid
+         Output: (t1_3.a + 1), t1_3.ctid
          Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
          Filter: ((SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
 (27 rows)
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index bf939d79f6..dece036069 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -172,14 +172,14 @@ EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE update_test t
   SET (a, b) = (SELECT b, a FROM update_test s WHERE s.a = t.a)
   WHERE CURRENT_USER = SESSION_USER;
-                            QUERY PLAN
-------------------------------------------------------------------
+                         QUERY PLAN
+-------------------------------------------------------------
  Update on public.update_test t
    ->  Result
-         Output: $1, $2, t.c, (SubPlan 1 (returns $1,$2)), t.ctid
+         Output: $1, $2, (SubPlan 1 (returns $1,$2)), t.ctid
          One-Time Filter: (CURRENT_USER = SESSION_USER)
          ->  Seq Scan on public.update_test t
-               Output: t.c, t.a, t.ctid
+               Output: t.a, t.ctid
          SubPlan 1 (returns $1,$2)
            ->  Seq Scan on public.update_test s
                  Output: s.b, s.a
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index cff23b0211..9f29d7f2b8 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1275,7 +1275,7 @@ deparseLockingClause(deparse_expr_cxt *context)
          * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
          * before 8.3.
          */
-        if (relid == root->parse->resultRelation &&
+        if (bms_is_member(relid, root->all_result_relids) &&
             (root->parse->commandType == CMD_UPDATE ||
              root->parse->commandType == CMD_DELETE))
         {
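The deparse.c change follows from the same restructuring. With a single plan, the foreign scan of a child such as bar2 is no longer planned under a subroot whose parse->resultRelation is that child. An equality test would presumably miss it, and the FOR UPDATE clause would be lost. Testing membership in root->all_result_relids covers every result relation. The toy illustration below uses a fixed-size hypothetical SketchRelids type, not PostgreSQL's dynamically sized Bitmapset from nodes/bitmapset.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy fixed-size set of range-table indexes 0..127 (hypothetical). */
typedef struct SketchRelids
{
    uint64_t words[2];
} SketchRelids;

static void
sketch_relids_add(SketchRelids *set, int relid)
{
    assert(relid >= 0 && relid < 128);
    set->words[relid / 64] |= UINT64_C(1) << (relid % 64);
}

/* Out-of-range indexes are simply reported as non-members. */
static bool
sketch_relids_is_member(const SketchRelids *set, int relid)
{
    if (relid < 0 || relid >= 128)
        return false;
    return (set->words[relid / 64] >> (relid % 64)) & 1;
}
```

A set records every result relation at once. A single resultRelation index can name only one, which is why the membership test is the natural replacement.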
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b46e7e623f..8d8a7b88df 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -6348,13 +6348,13 @@ SELECT * FROM foreign_tbl;

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE rw_view SET b = b + 5;
-                                       QUERY PLAN
-----------------------------------------------------------------------------------------
+                                           QUERY PLAN
+------------------------------------------------------------------------------------------------
  Update on public.parent_tbl
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: (parent_tbl_1.b + 5), parent_tbl_1.ctid, parent_tbl_1.*
+         Output: (parent_tbl_1.b + 5), parent_tbl_1.tableoid, parent_tbl_1.ctid, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -6363,13 +6363,13 @@ ERROR:  new row violates check option for view "rw_view"
 DETAIL:  Failing row contains (20, 20).
 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE rw_view SET b = b + 15;
-                                       QUERY PLAN
-----------------------------------------------------------------------------------------
+                                           QUERY PLAN
+-------------------------------------------------------------------------------------------------
  Update on public.parent_tbl
    Foreign Update on public.foreign_tbl parent_tbl_1
      Remote SQL: UPDATE public.child_tbl SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Foreign Scan on public.foreign_tbl parent_tbl_1
-         Output: (parent_tbl_1.b + 15), parent_tbl_1.ctid, parent_tbl_1.*
+         Output: (parent_tbl_1.b + 15), parent_tbl_1.tableoid, parent_tbl_1.ctid, parent_tbl_1.*
          Remote SQL: SELECT a, b, ctid FROM public.child_tbl WHERE ((a < b)) FOR UPDATE
 (6 rows)

@@ -7253,36 +7253,22 @@ select * from bar where f1 in (select f1 from foo) for share;
 -- Check UPDATE with inherited target and an inherited source table
 explain (verbose, costs off)
 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
-                                      QUERY PLAN
----------------------------------------------------------------------------------------
+                                              QUERY PLAN
+-------------------------------------------------------------------------------------------------------
  Update on public.bar
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
-         Output: (bar.f2 + 100), bar.ctid, foo.ctid, foo.*, foo.tableoid
+         Output: (bar.f2 + 100), foo.ctid, bar.tableoid, bar.ctid, (NULL::record), foo.*, foo.tableoid
          Inner Unique: true
          Hash Cond: (bar.f1 = foo.f1)
-         ->  Seq Scan on public.bar
-               Output: bar.f2, bar.ctid, bar.f1
-         ->  Hash
-               Output: foo.ctid, foo.f1, foo.*, foo.tableoid
-               ->  HashAggregate
-                     Output: foo.ctid, foo.f1, foo.*, foo.tableoid
-                     Group Key: foo.f1
-                     ->  Append
-                           ->  Seq Scan on public.foo foo_1
-                                 Output: foo_1.ctid, foo_1.f1, foo_1.*, foo_1.tableoid
-                           ->  Foreign Scan on public.foo2 foo_2
-                                 Output: foo_2.ctid, foo_2.f1, foo_2.*, foo_2.tableoid
-                                 Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
-   ->  Hash Join
-         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, foo.ctid, foo.*, foo.tableoid
-         Inner Unique: true
-         Hash Cond: (bar_1.f1 = foo.f1)
-         ->  Foreign Scan on public.bar2 bar_1
-               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
-               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+         ->  Append
+               ->  Seq Scan on public.bar bar_1
+                     Output: bar_1.f2, bar_1.f1, bar_1.tableoid, bar_1.ctid, NULL::record
+               ->  Foreign Scan on public.bar2 bar_2
+                     Output: bar_2.f2, bar_2.f1, bar_2.tableoid, bar_2.ctid, bar_2.*
+                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Hash
                Output: foo.ctid, foo.f1, foo.*, foo.tableoid
                ->  HashAggregate
@@ -7294,7 +7280,7 @@ update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
                            ->  Foreign Scan on public.foo2 foo_2
                                  Output: foo_2.ctid, foo_2.f1, foo_2.*, foo_2.tableoid
                                  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct1
-(39 rows)
+(25 rows)

 update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
 select tableoid::regclass, * from bar order by 1,2;
@@ -7314,39 +7300,24 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
-                                      QUERY PLAN
---------------------------------------------------------------------------------------
+                                           QUERY PLAN
+------------------------------------------------------------------------------------------------
  Update on public.bar
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
-   ->  Hash Join
-         Output: (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
-         Hash Cond: (foo.f1 = bar.f1)
-         ->  Append
-               ->  Seq Scan on public.foo
-                     Output: ROW(foo.f1), foo.f1
-               ->  Foreign Scan on public.foo2 foo_1
-                     Output: ROW(foo_1.f1), foo_1.f1
-                     Remote SQL: SELECT f1 FROM public.loct1
-               ->  Seq Scan on public.foo foo_2
-                     Output: ROW((foo_2.f1 + 3)), (foo_2.f1 + 3)
-               ->  Foreign Scan on public.foo2 foo_3
-                     Output: ROW((foo_3.f1 + 3)), (foo_3.f1 + 3)
-                     Remote SQL: SELECT f1 FROM public.loct1
-         ->  Hash
-               Output: bar.f2, bar.ctid, bar.f1
-               ->  Seq Scan on public.bar
-                     Output: bar.f2, bar.ctid, bar.f1
    ->  Merge Join
-         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*, (ROW(foo.f1))
-         Merge Cond: (bar_1.f1 = foo.f1)
+         Output: (bar.f2 + 100), (ROW(foo.f1)), bar.tableoid, bar.ctid, (NULL::record)
+         Merge Cond: (bar.f1 = foo.f1)
          ->  Sort
-               Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
-               Sort Key: bar_1.f1
-               ->  Foreign Scan on public.bar2 bar_1
-                     Output: bar_1.f2, bar_1.ctid, bar_1.*, bar_1.f1
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+               Output: bar.f2, bar.f1, bar.tableoid, bar.ctid, (NULL::record)
+               Sort Key: bar.f1
+               ->  Append
+                     ->  Seq Scan on public.bar bar_1
+                           Output: bar_1.f2, bar_1.f1, bar_1.tableoid, bar_1.ctid, NULL::record
+                     ->  Foreign Scan on public.bar2 bar_2
+                           Output: bar_2.f2, bar_2.f1, bar_2.tableoid, bar_2.ctid, bar_2.*
+                           Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
          ->  Sort
                Output: (ROW(foo.f1)), foo.f1
                Sort Key: foo.f1
@@ -7361,7 +7332,7 @@ where bar.f1 = ss.f1;
                      ->  Foreign Scan on public.foo2 foo_3
                            Output: ROW((foo_3.f1 + 3)), (foo_3.f1 + 3)
                            Remote SQL: SELECT f1 FROM public.loct1
-(45 rows)
+(30 rows)

 update bar set f2 = f2 + 100
 from
@@ -7487,18 +7458,19 @@ ERROR:  WHERE CURRENT OF is not supported for this table type
 rollback;
 explain (verbose, costs off)
 delete from foo where f1 < 5 returning *;
-                                   QUERY PLAN
---------------------------------------------------------------------------------
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
  Delete on public.foo
-   Output: foo.f1, foo.f2
-   Delete on public.foo
-   Foreign Delete on public.foo2 foo_1
-   ->  Index Scan using i_foo_f1 on public.foo
-         Output: foo.ctid
-         Index Cond: (foo.f1 < 5)
-   ->  Foreign Delete on public.foo2 foo_1
-         Remote SQL: DELETE FROM public.loct1 WHERE ((f1 < 5)) RETURNING f1, f2
-(9 rows)
+   Output: foo_1.f1, foo_1.f2
+   Delete on public.foo foo_1
+   Foreign Delete on public.foo2 foo_2
+   ->  Append
+         ->  Index Scan using i_foo_f1 on public.foo foo_1
+               Output: foo_1.tableoid, foo_1.ctid
+               Index Cond: (foo_1.f1 < 5)
+         ->  Foreign Delete on public.foo2 foo_2
+               Remote SQL: DELETE FROM public.loct1 WHERE ((f1 < 5)) RETURNING f1, f2
+(10 rows)

 delete from foo where f1 < 5 returning *;
  f1 | f2
@@ -7512,17 +7484,20 @@ delete from foo where f1 < 5 returning *;

 explain (verbose, costs off)
 update bar set f2 = f2 + 100 returning *;
-                                  QUERY PLAN
-------------------------------------------------------------------------------
+                                        QUERY PLAN
+------------------------------------------------------------------------------------------
  Update on public.bar
-   Output: bar.f1, bar.f2
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
-   ->  Seq Scan on public.bar
-         Output: (bar.f2 + 100), bar.ctid
-   ->  Foreign Update on public.bar2 bar_1
-         Remote SQL: UPDATE public.loct2 SET f2 = (f2 + 100) RETURNING f1, f2
-(8 rows)
+   Output: bar_1.f1, bar_1.f2
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
+   ->  Result
+         Output: (bar.f2 + 100), bar.tableoid, bar.ctid, (NULL::record)
+         ->  Append
+               ->  Seq Scan on public.bar bar_1
+                     Output: bar_1.f2, bar_1.tableoid, bar_1.ctid, NULL::record
+               ->  Foreign Update on public.bar2 bar_2
+                     Remote SQL: UPDATE public.loct2 SET f2 = (f2 + 100) RETURNING f1, f2
+(11 rows)

 update bar set f2 = f2 + 100 returning *;
  f1 | f2
@@ -7547,15 +7522,18 @@ update bar set f2 = f2 + 100;
                                                QUERY PLAN
 --------------------------------------------------------------------------------------------------------
  Update on public.bar
-   Update on public.bar
-   Foreign Update on public.bar2 bar_1
+   Update on public.bar bar_1
+   Foreign Update on public.bar2 bar_2
      Remote SQL: UPDATE public.loct2 SET f1 = $2, f2 = $3, f3 = $4 WHERE ctid = $1 RETURNING f1, f2, f3
-   ->  Seq Scan on public.bar
-         Output: (bar.f2 + 100), bar.ctid
-   ->  Foreign Scan on public.bar2 bar_1
-         Output: (bar_1.f2 + 100), bar_1.ctid, bar_1.*
-         Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
-(9 rows)
+   ->  Result
+         Output: (bar.f2 + 100), bar.tableoid, bar.ctid, (NULL::record)
+         ->  Append
+               ->  Seq Scan on public.bar bar_1
+                     Output: bar_1.f2, bar_1.tableoid, bar_1.ctid, NULL::record
+               ->  Foreign Scan on public.bar2 bar_2
+                     Output: bar_2.f2, bar_2.tableoid, bar_2.ctid, bar_2.*
+                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+(12 rows)

 update bar set f2 = f2 + 100;
 NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
@@ -7572,19 +7550,20 @@ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
 NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
 explain (verbose, costs off)
 delete from bar where f2 < 400;
-                                         QUERY PLAN
----------------------------------------------------------------------------------------------
+                                            QUERY PLAN
+---------------------------------------------------------------------------------------------------
  Delete on public.bar
-   Delete on public.bar
-   Foreign Delete on public.bar2 bar_1
+   Delete on public.bar bar_1
+   Foreign Delete on public.bar2 bar_2
      Remote SQL: DELETE FROM public.loct2 WHERE ctid = $1 RETURNING f1, f2, f3
-   ->  Seq Scan on public.bar
-         Output: bar.ctid
-         Filter: (bar.f2 < 400)
-   ->  Foreign Scan on public.bar2 bar_1
-         Output: bar_1.ctid, bar_1.*
-         Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 400)) FOR UPDATE
-(10 rows)
+   ->  Append
+         ->  Seq Scan on public.bar bar_1
+               Output: bar_1.tableoid, bar_1.ctid, NULL::record
+               Filter: (bar_1.f2 < 400)
+         ->  Foreign Scan on public.bar2 bar_2
+               Output: bar_2.tableoid, bar_2.ctid, bar_2.*
+               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 400)) FOR UPDATE
+(11 rows)

 delete from bar where f2 < 400;
 NOTICE:  trig_row_before(23, skidoo) BEFORE ROW DELETE ON bar2
@@ -7615,23 +7594,28 @@ analyze remt1;
 analyze remt2;
 explain (verbose, costs off)
 update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a returning *;
-                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
+                                                   QUERY PLAN
+----------------------------------------------------------------------------------------------------------------
  Update on public.parent
-   Output: parent.a, parent.b, remt2.a, remt2.b
-   Update on public.parent
-   Foreign Update on public.remt1 parent_1
+   Output: parent_1.a, parent_1.b, remt2.a, remt2.b
+   Update on public.parent parent_1
+   Foreign Update on public.remt1 parent_2
+     Remote SQL: UPDATE public.loct1 SET b = $2 WHERE ctid = $1 RETURNING a, b
    ->  Nested Loop
-         Output: (parent.b || remt2.b), parent.ctid, remt2.*, remt2.a, remt2.b
+         Output: (parent.b || remt2.b), remt2.*, remt2.a, remt2.b, parent.tableoid, parent.ctid, (NULL::record)
          Join Filter: (parent.a = remt2.a)
-         ->  Seq Scan on public.parent
-               Output: parent.b, parent.ctid, parent.a
-         ->  Foreign Scan on public.remt2
+         ->  Append
+               ->  Seq Scan on public.parent parent_1
+                     Output: parent_1.b, parent_1.a, parent_1.tableoid, parent_1.ctid, NULL::record
+               ->  Foreign Scan on public.remt1 parent_2
+                     Output: parent_2.b, parent_2.a, parent_2.tableoid, parent_2.ctid, parent_2.*
+                     Remote SQL: SELECT a, b, ctid FROM public.loct1 FOR UPDATE
+         ->  Materialize
                Output: remt2.b, remt2.*, remt2.a
-               Remote SQL: SELECT a, b FROM public.loct2
-   ->  Foreign Update
-         Remote SQL: UPDATE public.loct1 r4 SET b = (r4.b || r2.b) FROM public.loct2 r2 WHERE ((r4.a = r2.a)) RETURNING r4.a, r4.b, r2.a, r2.b
-(14 rows)
+               ->  Foreign Scan on public.remt2
+                     Output: remt2.b, remt2.*, remt2.a
+                     Remote SQL: SELECT a, b FROM public.loct2
+(19 rows)

 update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a returning *;
  a |   b    | a |  b
@@ -7642,23 +7626,28 @@ update parent set b = parent.b || remt2.b from remt2 where parent.a = remt2.a re

 explain (verbose, costs off)
 delete from parent using remt2 where parent.a = remt2.a returning parent;
-                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
  Delete on public.parent
-   Output: parent.*
-   Delete on public.parent
-   Foreign Delete on public.remt1 parent_1
+   Output: parent_1.*
+   Delete on public.parent parent_1
+   Foreign Delete on public.remt1 parent_2
+     Remote SQL: DELETE FROM public.loct1 WHERE ctid = $1 RETURNING a, b
    ->  Nested Loop
-         Output: parent.ctid, remt2.*
+         Output: remt2.*, parent.tableoid, parent.ctid
          Join Filter: (parent.a = remt2.a)
-         ->  Seq Scan on public.parent
-               Output: parent.ctid, parent.a
-         ->  Foreign Scan on public.remt2
+         ->  Append
+               ->  Seq Scan on public.parent parent_1
+                     Output: parent_1.a, parent_1.tableoid, parent_1.ctid
+               ->  Foreign Scan on public.remt1 parent_2
+                     Output: parent_2.a, parent_2.tableoid, parent_2.ctid
+                     Remote SQL: SELECT a, ctid FROM public.loct1 FOR UPDATE
+         ->  Materialize
                Output: remt2.*, remt2.a
-               Remote SQL: SELECT a, b FROM public.loct2
-   ->  Foreign Delete
-         Remote SQL: DELETE FROM public.loct1 r4 USING public.loct2 r2 WHERE ((r4.a = r2.a)) RETURNING r4.a, r4.b
-(14 rows)
+               ->  Foreign Scan on public.remt2
+                     Output: remt2.*, remt2.a
+                     Remote SQL: SELECT a, b FROM public.loct2
+(19 rows)

 delete from parent using remt2 where parent.a = remt2.a returning parent;
    parent
@@ -7837,29 +7826,25 @@ DETAIL:  Failing row contains (2, foo).
 CONTEXT:  remote SQL command: UPDATE public.loct SET a = 2 WHERE ((b = 'foo'::text)) RETURNING a, b
 -- But the reverse is allowed
 update utrtest set a = 1 where b = 'qux' returning *;
- a |  b
----+-----
- 1 | qux
-(1 row)
-
+ERROR:  cannot route tuples into foreign table to be updated "remp"
 select tableoid::regclass, * FROM utrtest;
  tableoid | a |  b
 ----------+---+-----
  remp     | 1 | foo
- remp     | 1 | qux
+ locp     | 2 | qux
 (2 rows)

 select tableoid::regclass, * FROM remp;
  tableoid | a |  b
 ----------+---+-----
  remp     | 1 | foo
- remp     | 1 | qux
-(2 rows)
+(1 row)

 select tableoid::regclass, * FROM locp;
- tableoid | a | b
-----------+---+---
-(0 rows)
+ tableoid | a |  b
+----------+---+-----
+ locp     | 2 | qux
+(1 row)

 -- The executor should not let unexercised FDWs shut down
 update utrtest set a = 1 where b = 'foo';
@@ -7871,38 +7856,35 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
-                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
+                                             QUERY PLAN
+----------------------------------------------------------------------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b
    Foreign Update on public.remp utrtest_1
    Update on public.locp utrtest_2
-   ->  Foreign Update on public.remp utrtest_1
-         Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
-   ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.ctid
-         Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
-(9 rows)
+   ->  Append
+         ->  Foreign Update on public.remp utrtest_1
+               Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a = 2))) RETURNING a, b
+         ->  Seq Scan on public.locp utrtest_2
+               Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
+               Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
+(10 rows)

 -- The new values are concatenated with ' triggered !'
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
- a |        b
----+-----------------
- 1 | qux triggered !
-(1 row)
-
+ERROR:  cannot route tuples into foreign table to be updated "remp"
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 2 returning *;
-               QUERY PLAN
------------------------------------------
+                      QUERY PLAN
+-------------------------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b
    Update on public.locp utrtest_1
    ->  Seq Scan on public.locp utrtest_1
-         Output: 1, utrtest_1.ctid
+         Output: 1, utrtest_1.tableoid, utrtest_1.ctid
          Filter: (utrtest_1.a = 2)
 (6 rows)

@@ -7923,61 +7905,51 @@ insert into utrtest values (2, 'qux');
 -- with a direct modification plan
 explain (verbose, costs off)
 update utrtest set a = 1 returning *;
-                           QUERY PLAN
------------------------------------------------------------------
+                                QUERY PLAN
+---------------------------------------------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b
    Foreign Update on public.remp utrtest_1
    Update on public.locp utrtest_2
-   ->  Foreign Update on public.remp utrtest_1
-         Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
-   ->  Seq Scan on public.locp utrtest_2
-         Output: 1, utrtest_2.ctid
-(8 rows)
+   ->  Append
+         ->  Foreign Update on public.remp utrtest_1
+               Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
+         ->  Seq Scan on public.locp utrtest_2
+               Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
+(9 rows)

 update utrtest set a = 1 returning *;
- a |  b
----+-----
- 1 | foo
- 1 | qux
-(2 rows)
-
+ERROR:  cannot route tuples into foreign table to be updated "remp"
 delete from utrtest;
 insert into utrtest values (1, 'foo');
 insert into utrtest values (2, 'qux');
 -- with a non-direct modification plan
 explain (verbose, costs off)
 update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
-                                    QUERY PLAN
-----------------------------------------------------------------------------------
+                                           QUERY PLAN
+------------------------------------------------------------------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b, "*VALUES*".column1
    Foreign Update on public.remp utrtest_1
      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
    Update on public.locp utrtest_2
    ->  Hash Join
-         Output: 1, utrtest_1.ctid, utrtest_1.*, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_1.a = "*VALUES*".column1)
-         ->  Foreign Scan on public.remp utrtest_1
-               Output: utrtest_1.ctid, utrtest_1.*, utrtest_1.a
-               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
-         ->  Hash
-               Output: "*VALUES*".*, "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     Output: "*VALUES*".*, "*VALUES*".column1
-   ->  Hash Join
-         Output: 1, utrtest_2.ctid, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_2.a = "*VALUES*".column1)
-         ->  Seq Scan on public.locp utrtest_2
-               Output: utrtest_2.ctid, utrtest_2.a
+         Output: 1, "*VALUES*".*, "*VALUES*".column1, utrtest.tableoid, utrtest.ctid, utrtest.*
+         Hash Cond: (utrtest.a = "*VALUES*".column1)
+         ->  Append
+               ->  Foreign Scan on public.remp utrtest_1
+                     Output: utrtest_1.a, utrtest_1.tableoid, utrtest_1.ctid, utrtest_1.*
+                     Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+               ->  Seq Scan on public.locp utrtest_2
+                     Output: utrtest_2.a, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
                      Output: "*VALUES*".*, "*VALUES*".column1
-(24 rows)
+(18 rows)

 update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
-ERROR:  invalid attribute number 5
+ERROR:  cannot route tuples into foreign table to be updated "remp"
 -- Change the definition of utrtest so that the foreign partition get updated
 -- after the local partition
 delete from utrtest;
@@ -7993,50 +7965,45 @@ insert into utrtest values (3, 'xyzzy');
 -- with a direct modification plan
 explain (verbose, costs off)
 update utrtest set a = 3 returning *;
-                           QUERY PLAN
------------------------------------------------------------------
+                                QUERY PLAN
+---------------------------------------------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b
    Update on public.locp utrtest_1
    Foreign Update on public.remp utrtest_2
-   ->  Seq Scan on public.locp utrtest_1
-         Output: 3, utrtest_1.ctid
-   ->  Foreign Update on public.remp utrtest_2
-         Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
-(8 rows)
+   ->  Append
+         ->  Seq Scan on public.locp utrtest_1
+               Output: 3, utrtest_1.tableoid, utrtest_1.ctid, NULL::record
+         ->  Foreign Update on public.remp utrtest_2
+               Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
+(9 rows)

 update utrtest set a = 3 returning *; -- ERROR
 ERROR:  cannot route tuples into foreign table to be updated "remp"
 -- with a non-direct modification plan
 explain (verbose, costs off)
 update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
-                                    QUERY PLAN
-----------------------------------------------------------------------------------
+                                             QUERY PLAN
+-----------------------------------------------------------------------------------------------------
  Update on public.utrtest
    Output: utrtest_1.a, utrtest_1.b, "*VALUES*".column1
    Update on public.locp utrtest_1
    Foreign Update on public.remp utrtest_2
      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
    ->  Hash Join
-         Output: 3, utrtest_1.ctid, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_1.a = "*VALUES*".column1)
-         ->  Seq Scan on public.locp utrtest_1
-               Output: utrtest_1.ctid, utrtest_1.a
-         ->  Hash
-               Output: "*VALUES*".*, "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     Output: "*VALUES*".*, "*VALUES*".column1
-   ->  Hash Join
-         Output: 3, utrtest_2.ctid, utrtest_2.*, "*VALUES*".*, "*VALUES*".column1
-         Hash Cond: (utrtest_2.a = "*VALUES*".column1)
-         ->  Foreign Scan on public.remp utrtest_2
-               Output: utrtest_2.ctid, utrtest_2.*, utrtest_2.a
-               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         Output: 3, "*VALUES*".*, "*VALUES*".column1, utrtest.tableoid, utrtest.ctid, (NULL::record)
+         Hash Cond: (utrtest.a = "*VALUES*".column1)
+         ->  Append
+               ->  Seq Scan on public.locp utrtest_1
+                     Output: utrtest_1.a, utrtest_1.tableoid, utrtest_1.ctid, NULL::record
+               ->  Foreign Scan on public.remp utrtest_2
+                     Output: utrtest_2.a, utrtest_2.tableoid, utrtest_2.ctid, utrtest_2.*
+                     Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
          ->  Hash
                Output: "*VALUES*".*, "*VALUES*".column1
                ->  Values Scan on "*VALUES*"
                      Output: "*VALUES*".*, "*VALUES*".column1
-(24 rows)
+(18 rows)

 update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
 ERROR:  cannot route tuples into foreign table to be updated "remp"
@@ -9423,11 +9390,12 @@ CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
 INSERT INTO batch_cp_upd_test VALUES (1), (2);
 -- The following moves a row from the local partition to the foreign one
 UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+ERROR:  cannot route tuples into foreign table to be updated "batch_cp_upd_test1_f"
 SELECT tableoid::regclass, * FROM batch_cp_upd_test;
        tableoid       | a
 ----------------------+---
  batch_cp_upd_test1_f | 1
- batch_cp_upd_test1_f | 1
+ batch_cp_up_test1    | 2
 (2 rows)

 -- Clean up
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 6ba6786c8b..348f7d8d65 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -26,6 +26,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
@@ -337,7 +338,8 @@ static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
 static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
 static void postgresReScanForeignScan(ForeignScanState *node);
 static void postgresEndForeignScan(ForeignScanState *node);
-static void postgresAddForeignUpdateTargets(Query *parsetree,
+static void postgresAddForeignUpdateTargets(PlannerInfo *root,
+                                            Index rtindex,
                                             RangeTblEntry *target_rte,
                                             Relation target_relation);
 static List *postgresPlanForeignModify(PlannerInfo *root,
@@ -1637,36 +1639,27 @@ postgresEndForeignScan(ForeignScanState *node)
  *        Add resjunk column(s) needed for update/delete on a foreign table
  */
 static void
-postgresAddForeignUpdateTargets(Query *parsetree,
+postgresAddForeignUpdateTargets(PlannerInfo *root,
+                                Index rtindex,
                                 RangeTblEntry *target_rte,
                                 Relation target_relation)
 {
     Var           *var;
-    const char *attrname;
-    TargetEntry *tle;

     /*
      * In postgres_fdw, what we need is the ctid, same as for a regular table.
      */

     /* Make a Var representing the desired value */
-    var = makeVar(parsetree->resultRelation,
+    var = makeVar(rtindex,
                   SelfItemPointerAttributeNumber,
                   TIDOID,
                   -1,
                   InvalidOid,
                   0);

-    /* Wrap it in a resjunk TLE with the right name ... */
-    attrname = "ctid";
-
-    tle = makeTargetEntry((Expr *) var,
-                          list_length(parsetree->targetList) + 1,
-                          pstrdup(attrname),
-                          true);
-
-    /* ... and add it to the query's targetlist */
-    parsetree->targetList = lappend(parsetree->targetList, tle);
+    /* Register it as a row-identity column needed by this target rel */
+    add_row_identity_var(root, var, rtindex, "ctid");
 }

 /*
@@ -1854,7 +1847,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
                                     rte,
                                     resultRelInfo,
                                     mtstate->operation,
-                                    mtstate->mt_plans[subplan_index]->plan,
+                                    outerPlanState(mtstate)->plan,
                                     query,
                                     target_attrs,
                                     values_end_len,
@@ -2054,8 +2047,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
      */
     if (plan && plan->operation == CMD_UPDATE &&
         (resultRelInfo->ri_usesFdwDirectModify ||
-         resultRelInfo->ri_FdwState) &&
-        resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
+         resultRelInfo->ri_FdwState))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot route tuples into foreign table to be updated \"%s\"",
@@ -2251,6 +2243,65 @@ postgresRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
     return true;
 }

+/*
+ * find_modifytable_subplan
+ *        Helper routine for postgresPlanDirectModify to find the
+ *        ModifyTable subplan node that scans the specified RTI.
+ *
+ * Returns NULL if the subplan couldn't be identified.  That's not a fatal
+ * error condition, we just abandon trying to do the update directly.
+ */
+static ForeignScan *
+find_modifytable_subplan(PlannerInfo *root,
+                         ModifyTable *plan,
+                         Index rtindex,
+                         int subplan_index)
+{
+    Plan       *subplan = outerPlan(plan);
+
+    /*
+     * The cases we support are (1) the desired ForeignScan is the immediate
+     * child of ModifyTable, or (2) it is the subplan_index'th child of an
+     * Append node that is the immediate child of ModifyTable.  There is no
+     * point in looking further down, as that would mean that local joins are
+     * involved, so we can't do the update directly.
+     *
+     * There could be a Result atop the Append too, acting to compute the
+     * UPDATE targetlist values.  We ignore that here; the tlist will be
+     * checked by our caller.
+     *
+     * In principle we could examine all the children of Append etc, but it's
+     * currently unlikely that the core planner would generate such a plan
+     * with the children out-of-order.  Moreover, such a search risks costing
+     * O(N^2) time when there are a lot of children.
+     */
+    if (IsA(subplan, Append))
+    {
+        Append       *appendplan = (Append *) subplan;
+
+        if (subplan_index < list_length(appendplan->appendplans))
+            subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+    }
+    else if (IsA(subplan, Result) && IsA(outerPlan(subplan), Append))
+    {
+        Append       *appendplan = (Append *) outerPlan(subplan);
+
+        if (subplan_index < list_length(appendplan->appendplans))
+            subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+    }
+
+    /* Now, have we got a ForeignScan on the desired rel? */
+    if (IsA(subplan, ForeignScan))
+    {
+        ForeignScan *fscan = (ForeignScan *) subplan;
+
+        if (bms_is_member(rtindex, fscan->fs_relids))
+            return fscan;
+    }
+
+    return NULL;
+}
+
 /*
  * postgresPlanDirectModify
  *        Consider a direct foreign table modification
@@ -2265,13 +2316,13 @@ postgresPlanDirectModify(PlannerInfo *root,
                          int subplan_index)
 {
     CmdType        operation = plan->operation;
-    Plan       *subplan;
     RelOptInfo *foreignrel;
     RangeTblEntry *rte;
     PgFdwRelationInfo *fpinfo;
     Relation    rel;
     StringInfoData sql;
     ForeignScan *fscan;
+    List       *processed_tlist = NIL;
     List       *targetAttrs = NIL;
     List       *remote_exprs;
     List       *params_list = NIL;
@@ -2289,19 +2340,17 @@ postgresPlanDirectModify(PlannerInfo *root,
         return false;

     /*
-     * It's unsafe to modify a foreign table directly if there are any local
-     * joins needed.
+     * Try to locate the ForeignScan subplan that's scanning resultRelation.
      */
-    subplan = (Plan *) list_nth(plan->plans, subplan_index);
-    if (!IsA(subplan, ForeignScan))
+    fscan = find_modifytable_subplan(root, plan, resultRelation, subplan_index);
+    if (!fscan)
         return false;
-    fscan = (ForeignScan *) subplan;

     /*
      * It's unsafe to modify a foreign table directly if there are any quals
      * that should be evaluated locally.
      */
-    if (subplan->qual != NIL)
+    if (fscan->scan.plan.qual != NIL)
         return false;

     /* Safe to fetch data about the target foreign rel */
@@ -2322,14 +2371,16 @@ postgresPlanDirectModify(PlannerInfo *root,
      */
     if (operation == CMD_UPDATE)
     {
-        ListCell *lc, *lc2;
+        ListCell   *lc,
+                   *lc2;

         /*
-         * The expressions of concern are the first N columns of the subplan
-         * targetlist, where N is the length of root->update_colnos.
+         * The expressions of concern are the first N columns of the processed
+         * targetlist, where N is the length of the rel's update_colnos.
          */
-        targetAttrs = root->update_colnos;
-        forboth(lc, subplan->targetlist, lc2, targetAttrs)
+        get_translated_update_targetlist(root, resultRelation,
+                                         &processed_tlist, &targetAttrs);
+        forboth(lc, processed_tlist, lc2, targetAttrs)
         {
             TargetEntry *tle = lfirst_node(TargetEntry, lc);
             AttrNumber attno = lfirst_int(lc2);
@@ -2392,7 +2443,7 @@ postgresPlanDirectModify(PlannerInfo *root,
         case CMD_UPDATE:
             deparseDirectUpdateSql(&sql, root, resultRelation, rel,
                                    foreignrel,
-                                   ((Plan *) fscan)->targetlist,
+                                   processed_tlist,
                                    targetAttrs,
                                    remote_exprs, &params_list,
                                    returningList, &retrieved_attrs);
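The search that find_modifytable_subplan performs is deliberately shallow. It checks the ModifyTable's immediate child, or the subplan_index'th member of an Append that is that child (possibly under a Result), and it never descends further. The standalone mock below illustrates that rule. It is not PostgreSQL code: MockPlan, MockTag and find_target_scan are hypothetical names, and the relids bitmask stands in for fs_relids, which is a Bitmapset in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner plan nodes. */
typedef enum { MOCK_APPEND, MOCK_RESULT, MOCK_FOREIGNSCAN, MOCK_SEQSCAN } MockTag;

typedef struct MockPlan
{
    MockTag          tag;
    struct MockPlan *outer;      /* outer child (used by Result) */
    struct MockPlan **children;  /* members (used by Append) */
    int              nchildren;
    unsigned         relids;     /* bitmask of scanned RT indexes (< 32) */
} MockPlan;

/*
 * Return the foreign scan on rtindex, found either as the ModifyTable's
 * immediate child or as the subplan_index'th Append member; else NULL,
 * meaning the caller gives up on direct modification.
 */
static MockPlan *
find_target_scan(MockPlan *mt_outer, int rtindex, int subplan_index)
{
    MockPlan   *subplan = mt_outer;
    MockPlan   *append = NULL;

    assert(mt_outer != NULL);

    if (subplan->tag == MOCK_APPEND)
        append = subplan;
    else if (subplan->tag == MOCK_RESULT && subplan->outer != NULL &&
             subplan->outer->tag == MOCK_APPEND)
        append = subplan->outer;    /* Result computing the tlist atop Append */

    if (append != NULL && subplan_index < append->nchildren)
        subplan = append->children[subplan_index];

    if (subplan->tag == MOCK_FOREIGNSCAN &&
        (subplan->relids & (1u << rtindex)) != 0)
        return subplan;
    return NULL;
}
```

Indexing directly by subplan_index keeps each call O(1). That is the point of the comment about avoiding an O(N^2) search over many children.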
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6989957d50..71504791f3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -424,7 +424,8 @@ GetForeignUpperPaths(PlannerInfo *root,
     <para>
 <programlisting>
 void
-AddForeignUpdateTargets(Query *parsetree,
+AddForeignUpdateTargets(PlannerInfo *root,
+                        Index rtindex,
                         RangeTblEntry *target_rte,
                         Relation target_relation);
 </programlisting>
@@ -440,27 +441,31 @@ AddForeignUpdateTargets(Query *parsetree,
     </para>

     <para>
-     To do that, add <structname>TargetEntry</structname> items to
-     <literal>parsetree->targetList</literal>, containing expressions for the
-     extra values to be fetched.  Each such entry must be marked
-     <structfield>resjunk</structfield> = <literal>true</literal>, and must have a distinct
-     <structfield>resname</structfield> that will identify it at execution time.
-     Avoid using names matching <literal>ctid<replaceable>N</replaceable></literal>,
-     <literal>wholerow</literal>, or
-     <literal>wholerow<replaceable>N</replaceable></literal>, as the core system can
-     generate junk columns of these names.
-     If the extra expressions are more complex than simple Vars, they
-     must be run through <function>eval_const_expressions</function>
-     before adding them to the target list.
-    </para>
-
-    <para>
-     Although this function is called during planning, the
-     information provided is a bit different from that available to other
-     planning routines.
-     <literal>parsetree</literal> is the parse tree for the <command>UPDATE</command> or
-     <command>DELETE</command> command, while <literal>target_rte</literal> and
-     <literal>target_relation</literal> describe the target foreign table.
+     To do that, construct a <structname>Var</structname> representing
+     an extra value you need, and pass it
+     to <function>add_row_identity_var</function>, along with a name for
+     the junk column.  (You can do this more than once if several columns
+     are needed.)  You must choose a distinct junk column name for each
+     different <structname>Var</structname> you need, except
+     that <structname>Var</structname>s that are identical except for
+     the <structfield>varno</structfield> field can and should share a
+     column name.
+     The core system uses the junk column names
+     <literal>tableoid</literal> for a
+     table's <structfield>tableoid</structfield> column,
+     <literal>ctid</literal>
+     or <literal>ctid<replaceable>N</replaceable></literal>
+     for <structfield>ctid</structfield>,
+     <literal>wholerow</literal>
+     for a whole-row <structname>Var</structname> marked with
+     <structfield>vartype</structfield> = <type>RECORD</type>,
+     and <literal>wholerow<replaceable>N</replaceable></literal>
+     for a whole-row <structname>Var</structname> with
+     <structfield>vartype</structfield> equal to the table's declared rowtype.
+     Re-use these names when you can (the planner will combine duplicate
+     requests for identical junk columns).  If you need another kind of
+     junk column besides these, it might be wise to choose a name prefixed
+     with your extension name, to avoid conflicts against other FDWs.
     </para>

     <para>
@@ -495,8 +500,8 @@ PlanForeignModify(PlannerInfo *root,
      <literal>resultRelation</literal> identifies the target foreign table by its
      range table index.  <literal>subplan_index</literal> identifies which target of
      the <structname>ModifyTable</structname> plan node this is, counting from zero;
-     use this if you want to index into <literal>plan->plans</literal> or other
-     substructure of the <literal>plan</literal> node.
+     use this if you want to index into per-target-relation substructures of the
+     <literal>plan</literal> node.
     </para>

     <para>
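The junk-column naming rules above imply that requests are merged by name. Vars that differ only in varno share one column, while reusing a name for a different Var is a mistake. The sketch below models that bookkeeping outside the planner. MockVar, RowIdEntry and register_row_identity are hypothetical names, not the add_row_identity_var API. The sketch compares only varattno and vartype, as a stand-in for a full Var comparison that ignores varno. It reports a conflicting reuse of a name by returning -1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reduced Var: only the fields this sketch compares. */
typedef struct
{
    int      varno;     /* range-table index of the requesting target rel */
    int      varattno;  /* e.g. -1 (SelfItemPointerAttributeNumber) for ctid */
    unsigned vartype;   /* type OID, e.g. 27 (TIDOID) for ctid */
} MockVar;

typedef struct
{
    const char *name;     /* junk column name such as "ctid" */
    MockVar     var;      /* first Var registered under this name */
    unsigned    relmask;  /* bitmask of target rels needing this column */
} RowIdEntry;

#define MAX_ROWID 8

typedef struct
{
    RowIdEntry entries[MAX_ROWID];
    int        n;
} RowIdRegistry;

/* Returns the shared entry's index, or -1 on a conflicting name reuse. */
static int
register_row_identity(RowIdRegistry *reg, MockVar var, const char *name)
{
    for (int i = 0; i < reg->n; i++)
    {
        RowIdEntry *e = &reg->entries[i];

        if (strcmp(e->name, name) != 0)
            continue;
        /* Same name: the Vars must agree in everything except varno. */
        if (e->var.varattno != var.varattno || e->var.vartype != var.vartype)
            return -1;
        e->relmask |= 1u << var.varno;   /* just record another requester */
        return i;
    }
    assert(reg->n < MAX_ROWID);
    reg->entries[reg->n].name = name;
    reg->entries[reg->n].var = var;
    reg->entries[reg->n].relmask = 1u << var.varno;
    return reg->n++;
}
```

In this model, two foreign partitions that both ask for "ctid" end up sharing one junk column. An FDW-specific value would get its own name, for example the hypothetical "myfdw_key".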
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 07aa25799d..4942dbfc18 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -78,7 +78,7 @@
   invoked by <command>UPDATE</command> statements executed on partitioned
   tables, but it currently does not handle the case where a remote partition
   chosen to insert a moved row into is also an <command>UPDATE</command>
-  target partition that will be updated later.
+  target partition that will be updated elsewhere in the same command.
  </para>

  <para>
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ed696d429..74dbb709fe 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -666,6 +666,7 @@ CopyFrom(CopyFromState cstate)
     mtstate->ps.plan = NULL;
     mtstate->ps.state = estate;
     mtstate->operation = CMD_INSERT;
+    mtstate->mt_nrels = 1;
     mtstate->resultRelInfo = resultRelInfo;
     mtstate->rootResultRelInfo = resultRelInfo;

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index afc45429ba..0b1808d503 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2078,7 +2078,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
     haschildren = planstate->initPlan ||
         outerPlanState(planstate) ||
         innerPlanState(planstate) ||
-        IsA(plan, ModifyTable) ||
         IsA(plan, Append) ||
         IsA(plan, MergeAppend) ||
         IsA(plan, BitmapAnd) ||
@@ -2111,11 +2110,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
     /* special child plans */
     switch (nodeTag(plan))
     {
-        case T_ModifyTable:
-            ExplainMemberNodes(((ModifyTableState *) planstate)->mt_plans,
-                               ((ModifyTableState *) planstate)->mt_nplans,
-                               ancestors, es);
-            break;
         case T_Append:
             ExplainMemberNodes(((AppendState *) planstate)->appendplans,
                                ((AppendState *) planstate)->as_nplans,
@@ -3715,14 +3709,14 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
     }

     /* Should we explicitly label target relations? */
-    labeltargets = (mtstate->mt_nplans > 1 ||
-                    (mtstate->mt_nplans == 1 &&
+    labeltargets = (mtstate->mt_nrels > 1 ||
+                    (mtstate->mt_nrels == 1 &&
                      mtstate->resultRelInfo[0].ri_RangeTableIndex != node->nominalRelation));

     if (labeltargets)
         ExplainOpenGroup("Target Tables", "Target Tables", false, es);

-    for (j = 0; j < mtstate->mt_nplans; j++)
+    for (j = 0; j < mtstate->mt_nrels; j++)
     {
         ResultRelInfo *resultRelInfo = mtstate->resultRelInfo + j;
         FdwRoutine *fdwroutine = resultRelInfo->ri_FdwRoutine;
@@ -3817,10 +3811,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
             double        insert_path;
             double        other_path;

-            InstrEndLoop(mtstate->mt_plans[0]->instrument);
+            InstrEndLoop(outerPlanState(mtstate)->instrument);

             /* count the number of source rows */
-            total = mtstate->mt_plans[0]->instrument->ntuples;
+            total = outerPlanState(mtstate)->instrument->ntuples;
             other_path = mtstate->ps.instrument->ntuples2;
             insert_path = total - other_path;

@@ -3836,7 +3830,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 }

 /*
- * Explain the constituent plans of a ModifyTable, Append, MergeAppend,
+ * Explain the constituent plans of an Append, MergeAppend,
  * BitmapAnd, or BitmapOr node.
  *
  * The ancestors list should already contain the immediate parent of these
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 18b2ac1865..4958452730 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -32,10 +32,14 @@ includes a RETURNING clause, the ModifyTable node delivers the computed
 RETURNING rows as output, otherwise it returns nothing.  Handling INSERT
 is pretty straightforward: the tuples returned from the plan tree below
 ModifyTable are inserted into the correct result relation.  For UPDATE,
-the plan tree returns the computed tuples to be updated, plus a "junk"
-(hidden) CTID column identifying which table row is to be replaced by each
-one.  For DELETE, the plan tree need only deliver a CTID column, and the
-ModifyTable node visits each of those rows and marks the row deleted.
+the plan tree returns the new values of the updated columns, plus "junk"
+(hidden) column(s) identifying which table row is to be updated.  The
+ModifyTable node must fetch that row to extract values for the unchanged
+columns, combine the values into a new row, and apply the update.  (For a
+heap table, the row-identity junk column is a CTID, but other things may
+be used for other table types.)  For DELETE, the plan tree need only deliver
+junk row-identity column(s), and the ModifyTable node visits each of those
+rows and marks the row deleted.

 XXX a great deal more documentation needs to be written here...

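The UPDATE flow described in the README works as follows. The subplan delivers only the new values of the assigned columns, and ModifyTable fills in the rest from the old row it fetches by row identity. The sketch below illustrates that merge step. It models rows as plain int arrays and uses a hypothetical build_updated_row helper. It assumes, as the postgres_fdw hunk suggests, that the new values arrive in the order given by update_colnos (1-based attribute numbers).

```c
#include <assert.h>

#define NATTS 4   /* hypothetical table width for this sketch */

/*
 * Combine the old row with the plan's new values: unchanged columns are
 * copied from old_row, then each update_colnos[j] gets new_vals[j].
 */
static void
build_updated_row(const int old_row[NATTS],
                  const int *new_vals, const int *update_colnos, int nupdated,
                  int out_row[NATTS])
{
    for (int i = 0; i < NATTS; i++)
        out_row[i] = old_row[i];           /* start from the fetched old row */

    for (int j = 0; j < nupdated; j++)
    {
        int attno = update_colnos[j];

        assert(attno >= 1 && attno <= NATTS);
        out_row[attno - 1] = new_vals[j];  /* SET-clause value overrides */
    }
}
```

Because the subplan's targetlist mentions only the assigned columns, the same subplan output can serve every child table, even when the attribute numbers differ between children.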
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ea1530e032..163242f54e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2416,7 +2416,8 @@ EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
 /*
  * EvalPlanQualSetPlan -- set or change subplan of an EPQState.
  *
- * We need this so that ModifyTable can deal with multiple subplans.
+ * We used to need this so that ModifyTable could deal with multiple subplans.
+ * It could now be refactored out of existence.
  */
 void
 EvalPlanQualSetPlan(EPQState *epqstate, Plan *subplan, List *auxrowmarks)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 619aaffae4..558060e080 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -82,7 +82,7 @@
  *
  * subplan_resultrel_htab
  *        Hash table to store subplan ResultRelInfos by Oid.  This is used to
- *        cache ResultRelInfos from subplans of an UPDATE ModifyTable node;
+ *        cache ResultRelInfos from targets of an UPDATE ModifyTable node;
  *        NULL in other cases.  Some of these may be useful for tuple routing
  *        to save having to build duplicates.
  *
@@ -527,12 +527,12 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
     ctl.entrysize = sizeof(SubplanResultRelHashElem);
     ctl.hcxt = CurrentMemoryContext;

-    htab = hash_create("PartitionTupleRouting table", mtstate->mt_nplans,
+    htab = hash_create("PartitionTupleRouting table", mtstate->mt_nrels,
                        &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
     proute->subplan_resultrel_htab = htab;

     /* Hash all subplans by their Oid */
-    for (i = 0; i < mtstate->mt_nplans; i++)
+    for (i = 0; i < mtstate->mt_nrels; i++)
     {
         ResultRelInfo *rri = &mtstate->resultRelInfo[i];
         bool        found;
@@ -628,10 +628,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
          */
         Assert((node->operation == CMD_INSERT &&
                 list_length(node->withCheckOptionLists) == 1 &&
-                list_length(node->plans) == 1) ||
+                list_length(node->resultRelations) == 1) ||
                (node->operation == CMD_UPDATE &&
                 list_length(node->withCheckOptionLists) ==
-                list_length(node->plans)));
+                list_length(node->resultRelations)));

         /*
          * Use the WCO list of the first plan as a reference to calculate
@@ -687,10 +687,10 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
         /* See the comment above for WCO lists. */
         Assert((node->operation == CMD_INSERT &&
                 list_length(node->returningLists) == 1 &&
-                list_length(node->plans) == 1) ||
+                list_length(node->resultRelations) == 1) ||
                (node->operation == CMD_UPDATE &&
                 list_length(node->returningLists) ==
-                list_length(node->plans)));
+                list_length(node->resultRelations)));

         /*
          * Use the RETURNING list of the first plan as a reference to
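With a single subplan, ModifyTable has to pick the target relation for each row from the row's tableoid junk column. The execPartition.c code above already hashes target ResultRelInfos by OID, and the nodeModifyTable.c changes introduce an MTTargetRelLookup entry that maps an OID to an index in resultRelInfo[]. The sketch below illustrates such a lookup with a small open-addressing table. It does not use PostgreSQL's dynahash. TargetRelLookup, LOOKUP_SIZE and the lookup_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;               /* PostgreSQL OIDs are 32-bit */
#define InvalidOid ((Oid) 0)
#define LOOKUP_SIZE 16              /* power of two; must exceed the rel count */

/* Shaped like MTTargetRelLookup: OID key -> index into resultRelInfo[]. */
typedef struct
{
    Oid relationOid;
    int relationIndex;
} TargetRelLookup;

static void
lookup_init(TargetRelLookup *tab)
{
    for (int i = 0; i < LOOKUP_SIZE; i++)
        tab[i].relationOid = InvalidOid;   /* InvalidOid marks an empty slot */
}

static void
lookup_insert(TargetRelLookup *tab, Oid relid, int index)
{
    unsigned h = relid & (LOOKUP_SIZE - 1);

    assert(relid != InvalidOid);
    while (tab[h].relationOid != InvalidOid && tab[h].relationOid != relid)
        h = (h + 1) & (LOOKUP_SIZE - 1);   /* linear probing */
    tab[h].relationOid = relid;
    tab[h].relationIndex = index;
}

/* Map a row's tableoid to its target index, or -1 if it is not a target. */
static int
lookup_find(const TargetRelLookup *tab, Oid relid)
{
    unsigned h = relid & (LOOKUP_SIZE - 1);

    while (tab[h].relationOid != InvalidOid)
    {
        if (tab[h].relationOid == relid)
            return tab[h].relationIndex;
        h = (h + 1) & (LOOKUP_SIZE - 1);
    }
    return -1;
}
```

Each lookup is expected O(1), so the per-row cost of routing stays small even with many child tables. That is the trade-off the thread argues for against per-child planning.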
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index b9064bfe66..739ca8384e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -19,14 +19,10 @@
  *        ExecReScanModifyTable - rescan the ModifyTable node
  *
  *     NOTES
- *        Each ModifyTable node contains a list of one or more subplans,
- *        much like an Append node.  There is one subplan per result relation.
- *        The key reason for this is that in an inherited UPDATE command, each
- *        result relation could have a different schema (more or different
- *        columns) requiring a different plan tree to produce it.  In an
- *        inherited DELETE, all the subplans should produce the same output
- *        rowtype, but we might still find that different plans are appropriate
- *        for different child relations.
+ *        The ModifyTable node receives input from its outerPlan, which is
+ *        the data to insert for INSERT cases, or the changed columns' new
+ *        values plus row-locating info for UPDATE cases, or just the
+ *        row-locating info for DELETE cases.
  *
  *        If the query specifies RETURNING, then the ModifyTable returns a
  *        RETURNING tuple after completing each row insert, update, or delete.
@@ -58,6 +54,12 @@
 #include "utils/rel.h"


+typedef struct MTTargetRelLookup
+{
+    Oid            relationOid;    /* hash key, must be first */
+    int            relationIndex;    /* rel's index in resultRelInfo[] array */
+} MTTargetRelLookup;
+
 static void ExecBatchInsert(ModifyTableState *mtstate,
                                  ResultRelInfo *resultRelInfo,
                                  TupleTableSlot **slots,
@@ -372,21 +374,39 @@ ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
 /*
  * ExecGetInsertNewTuple
  *        This prepares a "new" tuple ready to be inserted into given result
- *        relation by removing any junk columns of the plan's output tuple.
- *
- * Note: currently, this is really dead code, because INSERT cases don't
- * receive any junk columns so there's never a projection to be done.
+ *        relation, by removing any junk columns of the plan's output tuple
+ *        and (if necessary) coercing the tuple to the right tuple format.
  */
 static TupleTableSlot *
 ExecGetInsertNewTuple(ResultRelInfo *relinfo,
                       TupleTableSlot *planSlot)
 {
     ProjectionInfo *newProj = relinfo->ri_projectNew;
-    ExprContext   *econtext;
+    ExprContext *econtext;

+    /*
+     * If there's no projection to be done, just make sure the slot is of the
+     * right type for the target rel.  If the planSlot is the right type we
+     * can use it as-is, else copy the data into ri_newTupleSlot.
+     */
     if (newProj == NULL)
-        return planSlot;
+    {
+        if (relinfo->ri_newTupleSlot->tts_ops != planSlot->tts_ops)
+        {
+            ExecCopySlot(relinfo->ri_newTupleSlot, planSlot);
+            return relinfo->ri_newTupleSlot;
+        }
+        else
+            return planSlot;
+    }

+    /*
+     * Else project; since the projection output slot is ri_newTupleSlot, this
+     * will also fix any slot-type problem.
+     *
+     * Note: currently, this is dead code, because INSERT cases don't receive
+     * any junk columns so there's never a projection to be done.
+     */
     econtext = newProj->pi_exprContext;
     econtext->ecxt_outertuple = planSlot;
     return ExecProject(newProj);
@@ -396,8 +416,10 @@ ExecGetInsertNewTuple(ResultRelInfo *relinfo,
  * ExecGetUpdateNewTuple
  *        This prepares a "new" tuple by combining an UPDATE subplan's output
  *        tuple (which contains values of changed columns) with unchanged
- *        columns taken from the old tuple.  The subplan tuple might also
- *        contain junk columns, which are ignored.
+ *        columns taken from the old tuple.
+ *
+ * The subplan tuple might also contain junk columns, which are ignored.
+ * Note that the projection also ensures we have a slot of the right type.
  */
 TupleTableSlot *
 ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
@@ -405,9 +427,8 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
                       TupleTableSlot *oldSlot)
 {
     ProjectionInfo *newProj = relinfo->ri_projectNew;
-    ExprContext   *econtext;
+    ExprContext *econtext;

-    Assert(newProj != NULL);
     Assert(planSlot != NULL && !TTS_EMPTY(planSlot));
     Assert(oldSlot != NULL && !TTS_EMPTY(oldSlot));

@@ -1249,9 +1270,7 @@ static bool
 ExecCrossPartitionUpdate(ModifyTableState *mtstate,
                          ResultRelInfo *resultRelInfo,
                          ItemPointer tupleid, HeapTuple oldtuple,
-                         TupleTableSlot *slot,
-                         TupleTableSlot *oldSlot,
-                         TupleTableSlot *planSlot,
+                         TupleTableSlot *slot, TupleTableSlot *planSlot,
                          EPQState *epqstate, bool canSetTag,
                          TupleTableSlot **retry_slot,
                          TupleTableSlot **inserted_tuple)
@@ -1327,7 +1346,8 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
         else
         {
             /* Fetch the most recent version of old tuple. */
-            ExecClearTuple(oldSlot);
+            TupleTableSlot *oldSlot = resultRelInfo->ri_oldTupleSlot;
+
             if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
                                                tupleid,
                                                SnapshotAny,
@@ -1340,7 +1360,7 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
     }

     /*
-     * resultRelInfo is one of the per-subplan resultRelInfos.  So we should
+     * resultRelInfo is one of the per-relation resultRelInfos.  So we should
      * convert the tuple into root's tuple descriptor if needed, since
      * ExecInsert() starts the search from root.
      */
@@ -1384,10 +1404,10 @@ ExecCrossPartitionUpdate(ModifyTableState *mtstate,
  *        foreign table triggers; it is NULL when the foreign table has
  *        no relevant triggers.
  *
- *        slot contains the new tuple value to be stored, while oldSlot
- *        contains the old tuple being replaced.  planSlot is the output
- *        of the ModifyTable's subplan; we use it to access values from
- *        other input tables (for RETURNING), row-ID junk columns, etc.
+ *        slot contains the new tuple value to be stored.
+ *        planSlot is the output of the ModifyTable's subplan; we use it
+ *        to access values from other input tables (for RETURNING),
+ *        row-ID junk columns, etc.
  *
  *        Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
@@ -1398,7 +1418,6 @@ ExecUpdate(ModifyTableState *mtstate,
            ItemPointer tupleid,
            HeapTuple oldtuple,
            TupleTableSlot *slot,
-           TupleTableSlot *oldSlot,
            TupleTableSlot *planSlot,
            EPQState *epqstate,
            EState *estate,
@@ -1536,8 +1555,8 @@ lreplace:;
              * the tuple we're trying to move has been concurrently updated.
              */
             retry = !ExecCrossPartitionUpdate(mtstate, resultRelInfo, tupleid,
-                                              oldtuple, slot, oldSlot,
-                                              planSlot, epqstate, canSetTag,
+                                              oldtuple, slot, planSlot,
+                                              epqstate, canSetTag,
                                               &retry_slot, &inserted_tuple);
             if (retry)
             {
@@ -1616,6 +1635,7 @@ lreplace:;
                 {
                     TupleTableSlot *inputslot;
                     TupleTableSlot *epqslot;
+                    TupleTableSlot *oldSlot;

                     if (IsolationUsesXactSnapshot())
                         ereport(ERROR,
@@ -1650,7 +1670,7 @@ lreplace:;
                                 return NULL;

                             /* Fetch the most recent version of old tuple. */
-                            ExecClearTuple(oldSlot);
+                            oldSlot = resultRelInfo->ri_oldTupleSlot;
                             if (!table_tuple_fetch_row_version(resultRelationDesc,
                                                                tupleid,
                                                                SnapshotAny,
@@ -1953,7 +1973,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
     /* Execute UPDATE with projection */
     *returning = ExecUpdate(mtstate, resultRelInfo, conflictTid, NULL,
                             resultRelInfo->ri_onConflict->oc_ProjSlot,
-                            existing, planSlot,
+                            planSlot,
                             &mtstate->mt_epqstate, mtstate->ps.state,
                             canSetTag);

@@ -2132,6 +2152,7 @@ ExecModifyTable(PlanState *pstate)
     PlanState  *subplanstate;
     TupleTableSlot *slot;
     TupleTableSlot *planSlot;
+    TupleTableSlot *oldSlot;
     ItemPointer tupleid;
     ItemPointerData tuple_ctid;
     HeapTupleData oldtupdata;
@@ -2173,11 +2194,11 @@ ExecModifyTable(PlanState *pstate)
     }

     /* Preload local variables */
-    resultRelInfo = node->resultRelInfo + node->mt_whichplan;
-    subplanstate = node->mt_plans[node->mt_whichplan];
+    resultRelInfo = node->resultRelInfo + node->mt_lastResultIndex;
+    subplanstate = outerPlanState(node);

     /*
-     * Fetch rows from subplan(s), and execute the required table modification
+     * Fetch rows from subplan, and execute the required table modification
      * for each row.
      */
     for (;;)
@@ -2200,29 +2221,61 @@ ExecModifyTable(PlanState *pstate)

         planSlot = ExecProcNode(subplanstate);

+        /* No more tuples to process? */
         if (TupIsNull(planSlot))
-        {
-            /* advance to next subplan if any */
-            node->mt_whichplan++;
-            if (node->mt_whichplan < node->mt_nplans)
-            {
-                resultRelInfo++;
-                subplanstate = node->mt_plans[node->mt_whichplan];
-                EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
-                                    node->mt_arowmarks[node->mt_whichplan]);
-                continue;
-            }
-            else
-                break;
-        }
+            break;

         /*
-         * Ensure input tuple is the right format for the target relation.
+         * When there are multiple result relations, each tuple contains a
+         * junk column that gives the OID of the rel from which it came.
+         * Extract it and select the correct result relation.
          */
-        if (node->mt_scans[node->mt_whichplan]->tts_ops != planSlot->tts_ops)
+        if (AttributeNumberIsValid(node->mt_resultOidAttno))
         {
-            ExecCopySlot(node->mt_scans[node->mt_whichplan], planSlot);
-            planSlot = node->mt_scans[node->mt_whichplan];
+            Datum        datum;
+            bool        isNull;
+            Oid            resultoid;
+
+            datum = ExecGetJunkAttribute(planSlot, node->mt_resultOidAttno,
+                                         &isNull);
+            if (isNull)
+                elog(ERROR, "tableoid is NULL");
+            resultoid = DatumGetObjectId(datum);
+
+            /* If it's not the same as last time, we need to locate the rel */
+            if (resultoid != node->mt_lastResultOid)
+            {
+                if (node->mt_resultOidHash)
+                {
+                    /* Use the pre-built hash table to locate the rel */
+                    MTTargetRelLookup *mtlookup;
+
+                    mtlookup = (MTTargetRelLookup *)
+                        hash_search(node->mt_resultOidHash, &resultoid,
+                                    HASH_FIND, NULL);
+                    if (!mtlookup)
+                        elog(ERROR, "incorrect result rel OID %u", resultoid);
+                    node->mt_lastResultOid = resultoid;
+                    node->mt_lastResultIndex = mtlookup->relationIndex;
+                    resultRelInfo = node->resultRelInfo + mtlookup->relationIndex;
+                }
+                else
+                {
+                    /* With few target rels, just do a simple search */
+                    int            ndx;
+
+                    for (ndx = 0; ndx < node->mt_nrels; ndx++)
+                    {
+                        resultRelInfo = node->resultRelInfo + ndx;
+                        if (RelationGetRelid(resultRelInfo->ri_RelationDesc) == resultoid)
+                            break;
+                    }
+                    if (ndx >= node->mt_nrels)
+                        elog(ERROR, "incorrect result rel OID %u", resultoid);
+                    node->mt_lastResultOid = resultoid;
+                    node->mt_lastResultIndex = ndx;
+                }
+            }
         }

         /*
@@ -2333,39 +2386,34 @@ ExecModifyTable(PlanState *pstate)
                                   estate, node->canSetTag);
                 break;
             case CMD_UPDATE:
+                /*
+                 * Make the new tuple by combining plan's output tuple with
+                 * the old tuple being updated.
+                 */
+                oldSlot = resultRelInfo->ri_oldTupleSlot;
+                if (oldtuple != NULL)
                 {
-                    TupleTableSlot *oldSlot = resultRelInfo->ri_oldTupleSlot;
-
-                    /*
-                     * Make the new tuple by combining plan's output tuple
-                     * with the old tuple being updated.
-                     */
-                    ExecClearTuple(oldSlot);
-                    if (oldtuple != NULL)
-                    {
-                        /* Foreign table update, store the wholerow attr. */
-                        ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
-                    }
-                    else
-                    {
-                        /* Fetch the most recent version of old tuple. */
-                        Relation    relation = resultRelInfo->ri_RelationDesc;
-
-                        Assert(tupleid != NULL);
-                        if (!table_tuple_fetch_row_version(relation, tupleid,
-                                                           SnapshotAny,
-                                                           oldSlot))
-                            elog(ERROR, "failed to fetch tuple being updated");
-                    }
-                    slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
-                                                 oldSlot);
-
-                    /* Now apply the update. */
-                    slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple,
-                                      slot, oldSlot, planSlot,
-                                      &node->mt_epqstate, estate,
-                                      node->canSetTag);
+                    /* Use the wholerow junk attr as the old tuple. */
+                    ExecForceStoreHeapTuple(oldtuple, oldSlot, false);
                 }
+                else
+                {
+                    /* Fetch the most recent version of old tuple. */
+                    Relation    relation = resultRelInfo->ri_RelationDesc;
+
+                    Assert(tupleid != NULL);
+                    if (!table_tuple_fetch_row_version(relation, tupleid,
+                                                       SnapshotAny,
+                                                       oldSlot))
+                        elog(ERROR, "failed to fetch tuple being updated");
+                }
+                slot = ExecGetUpdateNewTuple(resultRelInfo, planSlot,
+                                             oldSlot);
+
+                /* Now apply the update. */
+                slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot,
+                                  planSlot, &node->mt_epqstate, estate,
+                                  node->canSetTag);
                 break;
             case CMD_DELETE:
                 slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple,
@@ -2425,12 +2473,12 @@ ModifyTableState *
 ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 {
     ModifyTableState *mtstate;
+    Plan       *subplan = outerPlan(node);
     CmdType        operation = node->operation;
-    int            nplans = list_length(node->plans);
+    int            nrels = list_length(node->resultRelations);
     ResultRelInfo *resultRelInfo;
-    Plan       *subplan;
-    ListCell   *l,
-               *l1;
+    List       *arowmarks;
+    ListCell   *l;
     int            i;
     Relation    rel;
     bool        update_tuple_routing_needed = node->partColsUpdated;
@@ -2450,10 +2498,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     mtstate->canSetTag = node->canSetTag;
     mtstate->mt_done = false;

-    mtstate->mt_plans = (PlanState **) palloc0(sizeof(PlanState *) * nplans);
+    mtstate->mt_nrels = nrels;
     mtstate->resultRelInfo = (ResultRelInfo *)
-        palloc(nplans * sizeof(ResultRelInfo));
-    mtstate->mt_scans = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
+        palloc(nrels * sizeof(ResultRelInfo));

     /*----------
      * Resolve the target relation. This is the same as:
@@ -2482,9 +2529,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                                linitial_int(node->resultRelations));
     }

-    mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
-    mtstate->mt_nplans = nplans;
-
     /* set up epqstate with dummy subplan data for the moment */
     EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
     mtstate->fireBSTriggers = true;
@@ -2497,23 +2541,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         ExecSetupTransitionCaptureState(mtstate, estate);

     /*
-     * call ExecInitNode on each of the plans to be executed and save the
-     * results into the array "mt_plans".  This is also a convenient place to
-     * verify that the proposed target relations are valid and open their
-     * indexes for insertion of new index entries.
+     * Open all the result relations and initialize the ResultRelInfo structs.
+     * (But root relation was initialized above, if it's part of the array.)
+     * We must do this before initializing the subplan, because direct-modify
+     * FDWs expect their ResultRelInfos to be available.
      */
     resultRelInfo = mtstate->resultRelInfo;
     i = 0;
-    forboth(l, node->resultRelations, l1, node->plans)
+    foreach(l, node->resultRelations)
     {
         Index        resultRelation = lfirst_int(l);

-        subplan = (Plan *) lfirst(l1);
-
-        /*
-         * This opens result relation and fills ResultRelInfo. (root relation
-         * was initialized already.)
-         */
         if (resultRelInfo != mtstate->rootResultRelInfo)
             ExecInitResultRelation(estate, resultRelInfo, resultRelation);

@@ -2526,6 +2564,22 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
          */
         CheckValidResultRel(resultRelInfo, operation);

+        resultRelInfo++;
+        i++;
+    }
+
+    /*
+     * Now we may initialize the subplan.
+     */
+    outerPlanState(mtstate) = ExecInitNode(subplan, estate, eflags);
+
+    /*
+     * Do additional per-result-relation initialization.
+     */
+    for (i = 0; i < nrels; i++)
+    {
+        resultRelInfo = &mtstate->resultRelInfo[i];
+
         /*
          * If there are indices on the result relation, open them and save
          * descriptors in the result relation info, so that we can add new
@@ -2551,12 +2605,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
             operation == CMD_UPDATE)
             update_tuple_routing_needed = true;

-        /* Now init the plan for this result rel */
-        mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
-        mtstate->mt_scans[i] =
-            ExecInitExtraTupleSlot(mtstate->ps.state, ExecGetResultType(mtstate->mt_plans[i]),
-                                   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
         /* Also let FDWs init themselves for foreign-table result rels */
         if (!resultRelInfo->ri_usesFdwDirectModify &&
             resultRelInfo->ri_FdwRoutine != NULL &&
@@ -2588,11 +2636,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
             resultRelInfo->ri_ChildToRootMap =
                 convert_tuples_by_name(RelationGetDescr(resultRelInfo->ri_RelationDesc),
                                        RelationGetDescr(mtstate->rootResultRelInfo->ri_RelationDesc));
-        resultRelInfo++;
-        i++;
     }

-    /* Get the target relation */
+    /* Get the root target relation */
     rel = mtstate->rootResultRelInfo->ri_RelationDesc;

     /*
@@ -2708,8 +2754,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         TupleDesc    relationDesc;
         TupleDesc    tupDesc;

-        /* insert may only have one plan, inheritance is not expanded */
-        Assert(nplans == 1);
+        /* insert may only have one relation, inheritance is not expanded */
+        Assert(nrels == 1);

         /* already exists if created by RETURNING processing above */
         if (mtstate->ps.ps_ExprContext == NULL)
@@ -2761,34 +2807,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
      * EvalPlanQual mechanism needs to be told about them.  Locate the
      * relevant ExecRowMarks.
      */
+    arowmarks = NIL;
     foreach(l, node->rowMarks)
     {
         PlanRowMark *rc = lfirst_node(PlanRowMark, l);
         ExecRowMark *erm;
+        ExecAuxRowMark *aerm;

         /* ignore "parent" rowmarks; they are irrelevant at runtime */
         if (rc->isParent)
             continue;

-        /* find ExecRowMark (same for all subplans) */
+        /* Find ExecRowMark and build ExecAuxRowMark */
         erm = ExecFindRowMark(estate, rc->rti, false);
-
-        /* build ExecAuxRowMark for each subplan */
-        for (i = 0; i < nplans; i++)
-        {
-            ExecAuxRowMark *aerm;
-
-            subplan = mtstate->mt_plans[i]->plan;
-            aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
-            mtstate->mt_arowmarks[i] = lappend(mtstate->mt_arowmarks[i], aerm);
-        }
+        aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
+        arowmarks = lappend(arowmarks, aerm);
     }

-    /* select first subplan */
-    mtstate->mt_whichplan = 0;
-    subplan = (Plan *) linitial(node->plans);
-    EvalPlanQualSetPlan(&mtstate->mt_epqstate, subplan,
-                        mtstate->mt_arowmarks[0]);
+    EvalPlanQualSetPlan(&mtstate->mt_epqstate, subplan, arowmarks);

     /*
      * Initialize projection(s) to create tuples suitable for result rel(s).
@@ -2801,15 +2837,14 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
      *
      * If there are multiple result relations, each one needs its own
      * projection.  Note multiple rels are only possible for UPDATE/DELETE, so
-     * we can't be fooled by some needing a filter and some not.
+     * we can't be fooled by some needing a projection and some not.
      *
      * This section of code is also a convenient place to verify that the
      * output of an INSERT or UPDATE matches the target table(s).
      */
-    for (i = 0; i < nplans; i++)
+    for (i = 0; i < nrels; i++)
     {
         resultRelInfo = &mtstate->resultRelInfo[i];
-        subplan = mtstate->mt_plans[i]->plan;

         /*
          * Prepare to generate tuples suitable for the target relation.
@@ -2827,14 +2862,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                 else
                     need_projection = true;
             }
+
+            /*
+             * The junk-free list must produce a tuple suitable for the result
+             * relation.
+             */
+            ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
+                                insertTargetList);
+
+            /* We'll need a slot matching the table's format. */
+            resultRelInfo->ri_newTupleSlot =
+                table_slot_create(resultRelInfo->ri_RelationDesc,
+                                  &mtstate->ps.state->es_tupleTable);
+
+            /* Build ProjectionInfo if needed (it probably isn't). */
             if (need_projection)
             {
                 TupleDesc    relDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);

-                resultRelInfo->ri_newTupleSlot =
-                    table_slot_create(resultRelInfo->ri_RelationDesc,
-                                      &mtstate->ps.state->es_tupleTable);
-
                 /* need an expression context to do the projection */
                 if (mtstate->ps.ps_ExprContext == NULL)
                     ExecAssignExprContext(estate, &mtstate->ps);
@@ -2846,13 +2891,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                                             &mtstate->ps,
                                             relDesc);
             }
-
-            /*
-             * The junk-free list must produce a tuple suitable for the result
-             * relation.
-             */
-            ExecCheckPlanOutput(resultRelInfo->ri_RelationDesc,
-                                insertTargetList);
         }
         else if (operation == CMD_UPDATE)
         {
@@ -2863,7 +2901,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)

             /*
              * For UPDATE, we use the old tuple to fill up missing values in
-             * the tuple produced by the plan to get the new tuple.
+             * the tuple produced by the plan to get the new tuple.  We need
+             * two slots, both matching the table's desired format.
              */
             resultRelInfo->ri_oldTupleSlot =
                 table_slot_create(resultRelInfo->ri_RelationDesc,
@@ -2931,6 +2970,60 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
         }
     }

+    /*
+     * If this is an inherited update/delete, there will be a junk attribute
+     * named "tableoid" present in the subplan's targetlist.  It will be used
+     * to identify the result relation for a given tuple to be
+     * updated/deleted.
+     */
+    mtstate->mt_resultOidAttno =
+        ExecFindJunkAttributeInTlist(subplan->targetlist, "tableoid");
+    Assert(AttributeNumberIsValid(mtstate->mt_resultOidAttno) || nrels == 1);
+    mtstate->mt_lastResultOid = InvalidOid; /* force lookup at first tuple */
+    mtstate->mt_lastResultIndex = 0;    /* must be zero if no such attr */
+
+    /*
+     * If there are a lot of result relations, use a hash table to speed the
+     * lookups.  If there are not a lot, a simple linear search is faster.
+     *
+     * It's not clear where the threshold is, but try 64 for starters.  In a
+     * debugging build, use a small threshold so that we get some test
+     * coverage of both code paths.
+     */
+#ifdef USE_ASSERT_CHECKING
+#define MT_NRELS_HASH 4
+#else
+#define MT_NRELS_HASH 64
+#endif
+    if (nrels >= MT_NRELS_HASH)
+    {
+        HASHCTL        hash_ctl;
+
+        hash_ctl.keysize = sizeof(Oid);
+        hash_ctl.entrysize = sizeof(MTTargetRelLookup);
+        hash_ctl.hcxt = CurrentMemoryContext;
+        mtstate->mt_resultOidHash =
+            hash_create("ModifyTable target hash",
+                        nrels, &hash_ctl,
+                        HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+        for (i = 0; i < nrels; i++)
+        {
+            Oid            hashkey;
+            MTTargetRelLookup *mtlookup;
+            bool        found;
+
+            resultRelInfo = &mtstate->resultRelInfo[i];
+            hashkey = RelationGetRelid(resultRelInfo->ri_RelationDesc);
+            mtlookup = (MTTargetRelLookup *)
+                hash_search(mtstate->mt_resultOidHash, &hashkey,
+                            HASH_ENTER, &found);
+            Assert(!found);
+            mtlookup->relationIndex = i;
+        }
+    }
+    else
+        mtstate->mt_resultOidHash = NULL;
+
     /*
      * Determine if the FDW supports batch insert and determine the batch
      * size (a FDW may support batching, but it may be disabled for the
@@ -2942,7 +3035,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     if (operation == CMD_INSERT)
     {
         resultRelInfo = mtstate->resultRelInfo;
-        for (i = 0; i < nplans; i++)
+        for (i = 0; i < nrels; i++)
         {
             if (!resultRelInfo->ri_usesFdwDirectModify &&
                 resultRelInfo->ri_FdwRoutine != NULL &&
@@ -2991,7 +3084,7 @@ ExecEndModifyTable(ModifyTableState *node)
     /*
      * Allow any FDWs to shut down
      */
-    for (i = 0; i < node->mt_nplans; i++)
+    for (i = 0; i < node->mt_nrels; i++)
     {
         ResultRelInfo *resultRelInfo = node->resultRelInfo + i;

@@ -3031,10 +3124,9 @@ ExecEndModifyTable(ModifyTableState *node)
     EvalPlanQualEnd(&node->mt_epqstate);

     /*
-     * shut down subplans
+     * shut down subplan
      */
-    for (i = 0; i < node->mt_nplans; i++)
-        ExecEndNode(node->mt_plans[i]);
+    ExecEndNode(outerPlanState(node));
 }

 void
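The nodeModifyTable.c changes above rely on `ExecGetUpdateNewTuple`. It builds the row to store by combining the subplan's output, which holds only the changed columns' new values plus junk, with the old row fetched via `table_tuple_fetch_row_version`. One subplan can therefore serve every child, even when the children number their attributes differently. Each result relation gets its own mapping from SET-clause position to attribute number, carried in `updateColnosLists`.

Here is a simplified, non-PostgreSQL sketch of that merge. The real code builds an expression projection rather than copying arrays. `Value`, `build_update_tuple` and its parameters are hypothetical. The sketch also assumes the SET-column values come first in the plan output, in `colnos` order, followed by any junk columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef int64_t Value;              /* stand-in for Datum */

/*
 * Build the new row for an UPDATE of one result relation.
 *   old_vals/old_nulls:   the natts columns of the old row, fetched by row id
 *   plan_vals/plan_nulls: subplan output; entries 0..ncols-1 are the new
 *                         values of the SET columns, later entries are junk
 *   colnos:               1-based attribute numbers in THIS relation, one per
 *                         SET column (the per-relation part of the scheme)
 */
static void build_update_tuple(int natts,
                               const Value *old_vals, const bool *old_nulls,
                               const Value *plan_vals, const bool *plan_nulls,
                               const int *colnos, int ncols,
                               Value *new_vals, bool *new_nulls)
{
    /* Unassigned columns keep their old values. */
    memcpy(new_vals, old_vals, (size_t) natts * sizeof(Value));
    memcpy(new_nulls, old_nulls, (size_t) natts * sizeof(bool));

    /* Overwrite only the assigned columns; junk entries are never read. */
    for (int i = 0; i < ncols; i++)
    {
        int         attno = colnos[i];

        assert(attno >= 1 && attno <= natts);
        new_vals[attno - 1] = plan_vals[i];
        new_nulls[attno - 1] = plan_nulls[i];
    }
}
```

Suppose the parent has columns `(a, b, c)` and a child stores them as `(x, a, b)`. For `SET b = ...`, the same plan output works for both rows. Only `colnos` changes: `{2}` for the parent and `{3}` for the child.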
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1ec586729b..b99babcf2a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -206,7 +206,6 @@ _copyModifyTable(const ModifyTable *from)
     COPY_SCALAR_FIELD(rootRelation);
     COPY_SCALAR_FIELD(partColsUpdated);
     COPY_NODE_FIELD(resultRelations);
-    COPY_NODE_FIELD(plans);
     COPY_NODE_FIELD(updateColnosLists);
     COPY_NODE_FIELD(withCheckOptionLists);
     COPY_NODE_FIELD(returningLists);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 38226530c6..860e9a2a06 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -4002,12 +4002,6 @@ planstate_tree_walker(PlanState *planstate,
     /* special child plans */
     switch (nodeTag(plan))
     {
-        case T_ModifyTable:
-            if (planstate_walk_members(((ModifyTableState *) planstate)->mt_plans,
-                                       ((ModifyTableState *) planstate)->mt_nplans,
-                                       walker, context))
-                return true;
-            break;
         case T_Append:
             if (planstate_walk_members(((AppendState *) planstate)->appendplans,
                                        ((AppendState *) planstate)->as_nplans,
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 99fb38c05a..a18213ed87 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -407,7 +407,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
     WRITE_UINT_FIELD(rootRelation);
     WRITE_BOOL_FIELD(partColsUpdated);
     WRITE_NODE_FIELD(resultRelations);
-    WRITE_NODE_FIELD(plans);
     WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
@@ -2136,14 +2135,13 @@ _outModifyTablePath(StringInfo str, const ModifyTablePath *node)

     _outPathInfo(str, (const Path *) node);

+    WRITE_NODE_FIELD(subpath);
     WRITE_ENUM_FIELD(operation, CmdType);
     WRITE_BOOL_FIELD(canSetTag);
     WRITE_UINT_FIELD(nominalRelation);
     WRITE_UINT_FIELD(rootRelation);
     WRITE_BOOL_FIELD(partColsUpdated);
     WRITE_NODE_FIELD(resultRelations);
-    WRITE_NODE_FIELD(subpaths);
-    WRITE_NODE_FIELD(subroots);
     WRITE_NODE_FIELD(updateColnosLists);
     WRITE_NODE_FIELD(withCheckOptionLists);
     WRITE_NODE_FIELD(returningLists);
@@ -2260,7 +2258,10 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
     WRITE_NODE_FIELD(right_join_clauses);
     WRITE_NODE_FIELD(full_join_clauses);
     WRITE_NODE_FIELD(join_info_list);
+    WRITE_BITMAPSET_FIELD(all_result_relids);
+    WRITE_BITMAPSET_FIELD(leaf_result_relids);
     WRITE_NODE_FIELD(append_rel_list);
+    WRITE_NODE_FIELD(row_identity_vars);
     WRITE_NODE_FIELD(rowMarks);
     WRITE_NODE_FIELD(placeholder_list);
     WRITE_NODE_FIELD(fkey_list);
@@ -2573,6 +2574,17 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
     WRITE_OID_FIELD(parent_reloid);
 }

+static void
+_outRowIdentityVarInfo(StringInfo str, const RowIdentityVarInfo *node)
+{
+    WRITE_NODE_TYPE("ROWIDENTITYVARINFO");
+
+    WRITE_NODE_FIELD(rowidvar);
+    WRITE_INT_FIELD(rowidwidth);
+    WRITE_STRING_FIELD(rowidname);
+    WRITE_BITMAPSET_FIELD(rowidrels);
+}
+
 static void
 _outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
 {
@@ -4222,6 +4234,9 @@ outNode(StringInfo str, const void *obj)
             case T_AppendRelInfo:
                 _outAppendRelInfo(str, obj);
                 break;
+            case T_RowIdentityVarInfo:
+                _outRowIdentityVarInfo(str, obj);
+                break;
             case T_PlaceHolderInfo:
                 _outPlaceHolderInfo(str, obj);
                 break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 0b6331d3da..f241cc268f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1682,7 +1682,6 @@ _readModifyTable(void)
     READ_UINT_FIELD(rootRelation);
     READ_BOOL_FIELD(partColsUpdated);
     READ_NODE_FIELD(resultRelations);
-    READ_NODE_FIELD(plans);
     READ_NODE_FIELD(updateColnosLists);
     READ_NODE_FIELD(withCheckOptionLists);
     READ_NODE_FIELD(returningLists);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..078f6572c9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1148,7 +1148,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
             Var           *parentvar = (Var *) lfirst(parentvars);
             Node       *childvar = (Node *) lfirst(childvars);

-            if (IsA(parentvar, Var))
+            if (IsA(parentvar, Var) && parentvar->varno == parentRTindex)
             {
                 int            pndx = parentvar->varattno - rel->min_attr;
                 int32        child_width = 0;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index ff536e6b24..aa0bb2bcab 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3397,7 +3397,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
      * and pass them through to EvalPlanQual via a side channel; but for now,
      * we just don't remove implied quals at all for target relations.
      */
-    is_target_rel = (rel->relid == root->parse->resultRelation ||
+    is_target_rel = (bms_is_member(rel->relid, root->all_result_relids) ||
                      get_plan_rowmark(root->rowMarks, rel->relid) != NULL);

     /*
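
Illustrative sketch (not part of the patch). The indxpath.c hunk above replaces the equality test against parse->resultRelation with a membership test on root->all_result_relids. Child result relations are now planned as ordinary members of the one query, so the old test would recognize only the top-level parent. The toy code below shows the difference. It assumes that inheritance expansion adds each child's RT index to all_result_relids, as the field name suggests. RelidSet and the relidset_* functions are hypothetical stand-ins for PostgreSQL's Bitmapset and bms_make_singleton(), bms_add_member() and bms_is_member(). The rowmark half of the real check is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-size relid set; the real Bitmapset grows on demand. */
#define MAX_RELIDS 64

typedef struct RelidSet
{
    uint64_t    words;          /* bit i set => RT index i is a member */
} RelidSet;

static RelidSet
relidset_singleton(int relid)
{
    RelidSet    s = {0};

    assert(relid >= 0 && relid < MAX_RELIDS);
    s.words = UINT64_C(1) << relid;
    return s;
}

static void
relidset_add(RelidSet *s, int relid)
{
    assert(relid >= 0 && relid < MAX_RELIDS);
    s->words |= UINT64_C(1) << relid;
}

static bool
relidset_is_member(const RelidSet *s, int relid)
{
    if (relid < 0 || relid >= MAX_RELIDS)
        return false;
    return ((s->words >> relid) & 1) != 0;
}

/* Old-style test: only the query's nominal result relation is a target. */
static bool
is_target_rel_old(int relid, int result_relation)
{
    return relid == result_relation;
}

/* New-style test: any RT index recorded as a result relation is a target. */
static bool
is_target_rel_new(int relid, const RelidSet *all_result_relids)
{
    return relidset_is_member(all_result_relids, relid);
}
```

For example, with a parent at RT index 1 and children at 3 and 4, the set is built as relidset_singleton(1) plus 3 and 4. The old test rejects child 3, and the new test accepts it. A non-target relation at RT index 2, such as a joined table, is rejected by both.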
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4bb482879f..69b4e5a703 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -297,11 +297,11 @@ static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
 static LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int epqParam);
 static Result *make_result(List *tlist, Node *resconstantqual, Plan *subplan);
 static ProjectSet *make_project_set(List *tlist, Plan *subplan);
-static ModifyTable *make_modifytable(PlannerInfo *root,
+static ModifyTable *make_modifytable(PlannerInfo *root, Plan *subplan,
                                      CmdType operation, bool canSetTag,
                                      Index nominalRelation, Index rootRelation,
                                      bool partColsUpdated,
-                                     List *resultRelations, List *subplans, List *subroots,
+                                     List *resultRelations,
                                      List *updateColnosLists,
                                      List *withCheckOptionLists, List *returningLists,
                                      List *rowMarks, OnConflictExpr *onconflict, int epqParam);
@@ -2252,12 +2252,7 @@ create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path)
     /*
      * During setrefs.c, we'll need the grouping_map to fix up the cols lists
      * in GroupingFunc nodes.  Save it for setrefs.c to use.
-     *
-     * This doesn't work if we're in an inheritance subtree (see notes in
-     * create_modifytable_plan).  Fortunately we can't be because there would
-     * never be grouping in an UPDATE/DELETE; but let's Assert that.
      */
-    Assert(root->inhTargetKind == INHKIND_NONE);
     Assert(root->grouping_map == NULL);
     root->grouping_map = grouping_map;

@@ -2419,12 +2414,7 @@ create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path)
      * with InitPlan output params.  (We can't just do that locally in the
      * MinMaxAgg node, because path nodes above here may have Agg references
      * as well.)  Save the mmaggregates list to tell setrefs.c to do that.
-     *
-     * This doesn't work if we're in an inheritance subtree (see notes in
-     * create_modifytable_plan).  Fortunately we can't be because there would
-     * never be aggregates in an UPDATE/DELETE; but let's Assert that.
      */
-    Assert(root->inhTargetKind == INHKIND_NONE);
     Assert(root->minmax_aggs == NIL);
     root->minmax_aggs = best_path->mmaggregates;

@@ -2641,44 +2631,23 @@ static ModifyTable *
 create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)
 {
     ModifyTable *plan;
-    List       *subplans = NIL;
-    ListCell   *subpaths,
-               *subroots,
-               *lc;
-
-    /* Build the plan for each input path */
-    forboth(subpaths, best_path->subpaths,
-            subroots, best_path->subroots)
-    {
-        Path       *subpath = (Path *) lfirst(subpaths);
-        PlannerInfo *subroot = (PlannerInfo *) lfirst(subroots);
-        Plan       *subplan;
+    Path       *subpath = best_path->subpath;
+    Plan       *subplan;

-        /*
-         * In an inherited UPDATE/DELETE, reference the per-child modified
-         * subroot while creating Plans from Paths for the child rel.  This is
-         * a kluge, but otherwise it's too hard to ensure that Plan creation
-         * functions (particularly in FDWs) don't depend on the contents of
-         * "root" matching what they saw at Path creation time.  The main
-         * downside is that creation functions for Plans that might appear
-         * below a ModifyTable cannot expect to modify the contents of "root"
-         * and have it "stick" for subsequent processing such as setrefs.c.
-         * That's not great, but it seems better than the alternative.
-         */
-        subplan = create_plan_recurse(subroot, subpath, CP_EXACT_TLIST);
+    /* Subplan must produce exactly the specified tlist */
+    subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);

-        subplans = lappend(subplans, subplan);
-    }
+    /* Transfer resname/resjunk labeling, too, to keep executor happy */
+    apply_tlist_labeling(subplan->targetlist, root->processed_tlist);

     plan = make_modifytable(root,
+                            subplan,
                             best_path->operation,
                             best_path->canSetTag,
                             best_path->nominalRelation,
                             best_path->rootRelation,
                             best_path->partColsUpdated,
                             best_path->resultRelations,
-                            subplans,
-                            best_path->subroots,
                             best_path->updateColnosLists,
                             best_path->withCheckOptionLists,
                             best_path->returningLists,
@@ -2688,41 +2657,6 @@ create_modifytable_plan(PlannerInfo *root, ModifyTablePath *best_path)

     copy_generic_path_info(&plan->plan, &best_path->path);

-    forboth(lc, subplans,
-            subroots, best_path->subroots)
-    {
-        Plan       *subplan = (Plan *) lfirst(lc);
-        PlannerInfo *subroot = (PlannerInfo *) lfirst(subroots);
-
-        /*
-         * Fix up the resnos of query's TLEs to make them match their ordinal
-         * position in the list, which they may not in the case of an UPDATE.
-         * It's safe to revise that targetlist now, because nothing after this
-         * point needs those resnos to match target relation's attribute
-         * numbers.
-         * XXX - we do this simply because apply_tlist_labeling() asserts that
-         * resnos in processed_tlist and resnos in subplan targetlist are
-         * exactly same, but maybe we can just remove the assert?
-         */
-        if (plan->operation == CMD_UPDATE)
-        {
-            ListCell   *l;
-            AttrNumber    resno = 1;
-
-            foreach(l, subroot->processed_tlist)
-            {
-                TargetEntry *tle = lfirst(l);
-
-                tle = flatCopyTargetEntry(tle);
-                tle->resno = resno++;
-                lfirst(l) = tle;
-            }
-        }
-
-        /* Transfer resname/resjunk labeling, too, to keep executor happy */
-        apply_tlist_labeling(subplan->targetlist, subroot->processed_tlist);
-    }
-
     return plan;
 }

@@ -6910,11 +6844,11 @@ make_project_set(List *tlist,
  *      Build a ModifyTable plan node
  */
 static ModifyTable *
-make_modifytable(PlannerInfo *root,
+make_modifytable(PlannerInfo *root, Plan *subplan,
                  CmdType operation, bool canSetTag,
                  Index nominalRelation, Index rootRelation,
                  bool partColsUpdated,
-                 List *resultRelations, List *subplans, List *subroots,
+                 List *resultRelations,
                  List *updateColnosLists,
                  List *withCheckOptionLists, List *returningLists,
                  List *rowMarks, OnConflictExpr *onconflict, int epqParam)
@@ -6923,11 +6857,8 @@ make_modifytable(PlannerInfo *root,
     List       *fdw_private_list;
     Bitmapset  *direct_modify_plans;
     ListCell   *lc;
-    ListCell   *lc2;
     int            i;

-    Assert(list_length(resultRelations) == list_length(subplans));
-    Assert(list_length(resultRelations) == list_length(subroots));
     Assert(operation == CMD_UPDATE ?
            list_length(resultRelations) == list_length(updateColnosLists) :
            updateColnosLists == NIL);
@@ -6936,7 +6867,7 @@ make_modifytable(PlannerInfo *root,
     Assert(returningLists == NIL ||
            list_length(resultRelations) == list_length(returningLists));

-    node->plan.lefttree = NULL;
+    node->plan.lefttree = subplan;
     node->plan.righttree = NULL;
     node->plan.qual = NIL;
     /* setrefs.c will fill in the targetlist, if needed */
@@ -6948,7 +6879,6 @@ make_modifytable(PlannerInfo *root,
     node->rootRelation = rootRelation;
     node->partColsUpdated = partColsUpdated;
     node->resultRelations = resultRelations;
-    node->plans = subplans;
     if (!onconflict)
     {
         node->onConflictAction = ONCONFLICT_NONE;
@@ -6988,10 +6918,9 @@ make_modifytable(PlannerInfo *root,
     fdw_private_list = NIL;
     direct_modify_plans = NULL;
     i = 0;
-    forboth(lc, resultRelations, lc2, subroots)
+    foreach(lc, resultRelations)
     {
         Index        rti = lfirst_int(lc);
-        PlannerInfo *subroot = lfirst_node(PlannerInfo, lc2);
         FdwRoutine *fdwroutine;
         List       *fdw_private;
         bool        direct_modify;
@@ -7003,16 +6932,16 @@ make_modifytable(PlannerInfo *root,
          * so it's not a baserel; and there are also corner cases for
          * updatable views where the target rel isn't a baserel.)
          */
-        if (rti < subroot->simple_rel_array_size &&
-            subroot->simple_rel_array[rti] != NULL)
+        if (rti < root->simple_rel_array_size &&
+            root->simple_rel_array[rti] != NULL)
         {
-            RelOptInfo *resultRel = subroot->simple_rel_array[rti];
+            RelOptInfo *resultRel = root->simple_rel_array[rti];

             fdwroutine = resultRel->fdwroutine;
         }
         else
         {
-            RangeTblEntry *rte = planner_rt_fetch(rti, subroot);
+            RangeTblEntry *rte = planner_rt_fetch(rti, root);

             Assert(rte->rtekind == RTE_RELATION);
             if (rte->relkind == RELKIND_FOREIGN_TABLE)
@@ -7035,16 +6964,16 @@ make_modifytable(PlannerInfo *root,
             fdwroutine->IterateDirectModify != NULL &&
             fdwroutine->EndDirectModify != NULL &&
             withCheckOptionLists == NIL &&
-            !has_row_triggers(subroot, rti, operation) &&
-            !has_stored_generated_columns(subroot, rti))
-            direct_modify = fdwroutine->PlanDirectModify(subroot, node, rti, i);
+            !has_row_triggers(root, rti, operation) &&
+            !has_stored_generated_columns(root, rti))
+            direct_modify = fdwroutine->PlanDirectModify(root, node, rti, i);
         if (direct_modify)
             direct_modify_plans = bms_add_member(direct_modify_plans, i);

         if (!direct_modify &&
             fdwroutine != NULL &&
             fdwroutine->PlanForeignModify != NULL)
-            fdw_private = fdwroutine->PlanForeignModify(subroot, node, rti, i);
+            fdw_private = fdwroutine->PlanForeignModify(root, node, rti, i);
         else
             fdw_private = NIL;
         fdw_private_list = lappend(fdw_private_list, fdw_private);
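
Illustrative sketch (not part of the patch). After the createplan.c changes above, ModifyTable has a single subplan in plan.lefttree instead of one per result relation. For UPDATE, updateColnosLists still holds one list of target column numbers per result relation, because the same SET column can have a different attribute number in each child. The design at the top of this thread has the executor do two things for each row: pick the result relation from the row's tableoid, then fill the columns that SET does not assign from the old tuple. The simplified sketch below models that merge step. ToyResultRel, find_result_rel() and build_new_row() are hypothetical, rows are plain int arrays, junk columns are assumed to be stripped already, and the old tuple is assumed to have been fetched by tupleid.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical per-result-relation info. */
typedef struct ToyResultRel
{
    Oid         tableoid;       /* identifies the relation */
    int         natts;          /* number of columns in this relation */
    int         n_update_cols;  /* length of update_colnos */
    const int  *update_colnos;  /* 1-based attnos of SET columns, in subplan order */
} ToyResultRel;

/* Find the result relation that a subplan row belongs to, by tableoid. */
static const ToyResultRel *
find_result_rel(const ToyResultRel *rels, int nrels, Oid tableoid)
{
    for (int i = 0; i < nrels; i++)
    {
        if (rels[i].tableoid == tableoid)
            return &rels[i];
    }
    return NULL;
}

/*
 * Build the new row: copy the old row, then overwrite the SET columns.
 * set_values[k] is the k-th non-junk subplan output column and is stored
 * at attno rel->update_colnos[k] of this particular relation.
 */
static void
build_new_row(const ToyResultRel *rel, const int *old_row,
              const int *set_values, int *new_row)
{
    for (int att = 0; att < rel->natts; att++)
        new_row[att] = old_row[att];

    for (int k = 0; k < rel->n_update_cols; k++)
    {
        int         attno = rel->update_colnos[k];

        assert(attno >= 1 && attno <= rel->natts);
        new_row[attno - 1] = set_values[k];
    }
}
```

For example, suppose the parent has columns (a, b), so "SET b = ..." targets attno 2. Suppose a child has columns (x, a, b), where the same column is attno 3. One subplan output value for b serves both relations, and only the per-relation column-number list differs.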
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e1a13e20c5..273ac0acf7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -263,6 +263,13 @@ query_planner(PlannerInfo *root,
      */
     add_other_rels_to_query(root);

+    /*
+     * Distribute any UPDATE/DELETE row identity variables to the target
+     * relations.  This can't be done till we've finished expansion of
+     * appendrels.
+     */
+    distribute_row_identity_vars(root);
+
     /*
      * Ready to do the primary planning.
      */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ccb9166a8e..eb5c3c5092 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -129,9 +129,7 @@ typedef struct
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
-static void inheritance_planner(PlannerInfo *root);
-static void grouping_planner(PlannerInfo *root, bool inheritance_update,
-                             double tuple_fraction);
+static void grouping_planner(PlannerInfo *root, double tuple_fraction);
 static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
 static List *remap_to_groupclause_idx(List *groupClause, List *gsets,
                                       int *tleref_to_colnum_map);
@@ -615,7 +613,11 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     root->multiexpr_params = NIL;
     root->eq_classes = NIL;
     root->ec_merging_done = false;
+    root->all_result_relids =
+        parse->resultRelation ? bms_make_singleton(parse->resultRelation) : NULL;
+    root->leaf_result_relids = NULL;    /* we'll find out leaf-ness later */
     root->append_rel_list = NIL;
+    root->row_identity_vars = NIL;
     root->rowMarks = NIL;
     memset(root->upper_rels, 0, sizeof(root->upper_rels));
     memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -624,7 +626,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     root->grouping_map = NULL;
     root->minmax_aggs = NIL;
     root->qual_security_level = 0;
-    root->inhTargetKind = INHKIND_NONE;
     root->hasPseudoConstantQuals = false;
     root->hasAlternativeSubPlans = false;
     root->hasRecursion = hasRecursion;
@@ -744,6 +745,19 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
                                             list_length(rte->securityQuals));
     }

+    /*
+     * If we have now verified that the query target relation is
+     * non-inheriting, mark it as a leaf target.
+     */
+    if (parse->resultRelation)
+    {
+        RangeTblEntry *rte = rt_fetch(parse->resultRelation, parse->rtable);
+
+        if (!rte->inh)
+            root->leaf_result_relids =
+                bms_make_singleton(parse->resultRelation);
+    }
+
     /*
      * Preprocess RowMark information.  We need to do this after subquery
      * pullup, so that all base relations are present.
@@ -1000,14 +1014,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
         remove_useless_result_rtes(root);

     /*
-     * Do the main planning.  If we have an inherited target relation, that
-     * needs special processing, else go straight to grouping_planner.
+     * Do the main planning.
      */
-    if (parse->resultRelation &&
-        rt_fetch(parse->resultRelation, parse->rtable)->inh)
-        inheritance_planner(root);
-    else
-        grouping_planner(root, false, tuple_fraction);
+    grouping_planner(root, tuple_fraction);

     /*
      * Capture the set of outer-level param IDs we have access to, for use in
@@ -1181,631 +1190,6 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
     return (Expr *) preprocess_expression(root, (Node *) expr, EXPRKIND_PHV);
 }

-/*
- * inheritance_planner
- *      Generate Paths in the case where the result relation is an
- *      inheritance set.
- *
- * We have to handle this case differently from cases where a source relation
- * is an inheritance set. Source inheritance is expanded at the bottom of the
- * plan tree (see allpaths.c), but target inheritance has to be expanded at
- * the top.  The reason is that for UPDATE, each target relation needs a
- * different targetlist matching its own column set.  Fortunately,
- * the UPDATE/DELETE target can never be the nullable side of an outer join,
- * so it's OK to generate the plan this way.
- *
- * Returns nothing; the useful output is in the Paths we attach to
- * the (UPPERREL_FINAL, NULL) upperrel stored in *root.
- *
- * Note that we have not done set_cheapest() on the final rel; it's convenient
- * to leave this to the caller.
- */
-static void
-inheritance_planner(PlannerInfo *root)
-{
-    Query       *parse = root->parse;
-    int            top_parentRTindex = parse->resultRelation;
-    List       *select_rtable;
-    List       *select_appinfos;
-    List       *child_appinfos;
-    List       *old_child_rtis;
-    List       *new_child_rtis;
-    Bitmapset  *subqueryRTindexes;
-    Index        next_subquery_rti;
-    int            nominalRelation = -1;
-    Index        rootRelation = 0;
-    List       *final_rtable = NIL;
-    List       *final_rowmarks = NIL;
-    List       *final_appendrels = NIL;
-    int            save_rel_array_size = 0;
-    RelOptInfo **save_rel_array = NULL;
-    AppendRelInfo **save_append_rel_array = NULL;
-    List       *subpaths = NIL;
-    List       *subroots = NIL;
-    List       *resultRelations = NIL;
-    List       *updateColnosLists = NIL;
-    List       *withCheckOptionLists = NIL;
-    List       *returningLists = NIL;
-    List       *rowMarks;
-    RelOptInfo *final_rel;
-    ListCell   *lc;
-    ListCell   *lc2;
-    Index        rti;
-    RangeTblEntry *parent_rte;
-    Bitmapset  *parent_relids;
-    Query      **parent_parses;
-
-    /* Should only get here for UPDATE or DELETE */
-    Assert(parse->commandType == CMD_UPDATE ||
-           parse->commandType == CMD_DELETE);
-
-    /*
-     * We generate a modified instance of the original Query for each target
-     * relation, plan that, and put all the plans into a list that will be
-     * controlled by a single ModifyTable node.  All the instances share the
-     * same rangetable, but each instance must have its own set of subquery
-     * RTEs within the finished rangetable because (1) they are likely to get
-     * scribbled on during planning, and (2) it's not inconceivable that
-     * subqueries could get planned differently in different cases.  We need
-     * not create duplicate copies of other RTE kinds, in particular not the
-     * target relations, because they don't have either of those issues.  Not
-     * having to duplicate the target relations is important because doing so
-     * (1) would result in a rangetable of length O(N^2) for N targets, with
-     * at least O(N^3) work expended here; and (2) would greatly complicate
-     * management of the rowMarks list.
-     *
-     * To begin with, generate a bitmapset of the relids of the subquery RTEs.
-     */
-    subqueryRTindexes = NULL;
-    rti = 1;
-    foreach(lc, parse->rtable)
-    {
-        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-        if (rte->rtekind == RTE_SUBQUERY)
-            subqueryRTindexes = bms_add_member(subqueryRTindexes, rti);
-        rti++;
-    }
-
-    /*
-     * If the parent RTE is a partitioned table, we should use that as the
-     * nominal target relation, because the RTEs added for partitioned tables
-     * (including the root parent) as child members of the inheritance set do
-     * not appear anywhere else in the plan, so the confusion explained below
-     * for non-partitioning inheritance cases is not possible.
-     */
-    parent_rte = rt_fetch(top_parentRTindex, parse->rtable);
-    Assert(parent_rte->inh);
-    if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
-    {
-        nominalRelation = top_parentRTindex;
-        rootRelation = top_parentRTindex;
-    }
-
-    /*
-     * Before generating the real per-child-relation plans, do a cycle of
-     * planning as though the query were a SELECT.  The objective here is to
-     * find out which child relations need to be processed, using the same
-     * expansion and pruning logic as for a SELECT.  We'll then pull out the
-     * RangeTblEntry-s generated for the child rels, and make use of the
-     * AppendRelInfo entries for them to guide the real planning.  (This is
-     * rather inefficient; we could perhaps stop short of making a full Path
-     * tree.  But this whole function is inefficient and slated for
-     * destruction, so let's not contort query_planner for that.)
-     */
-    {
-        PlannerInfo *subroot;
-
-        /*
-         * Flat-copy the PlannerInfo to prevent modification of the original.
-         */
-        subroot = makeNode(PlannerInfo);
-        memcpy(subroot, root, sizeof(PlannerInfo));
-
-        /*
-         * Make a deep copy of the parsetree for this planning cycle to mess
-         * around with, and change it to look like a SELECT.  (Hack alert: the
-         * target RTE still has updatedCols set if this is an UPDATE, so that
-         * expand_partitioned_rtentry will correctly update
-         * subroot->partColsUpdated.)
-         */
-        subroot->parse = copyObject(root->parse);
-
-        subroot->parse->commandType = CMD_SELECT;
-        subroot->parse->resultRelation = 0;
-
-        /*
-         * Ensure the subroot has its own copy of the original
-         * append_rel_list, since it'll be scribbled on.  (Note that at this
-         * point, the list only contains AppendRelInfos for flattened UNION
-         * ALL subqueries.)
-         */
-        subroot->append_rel_list = copyObject(root->append_rel_list);
-
-        /*
-         * Better make a private copy of the rowMarks, too.
-         */
-        subroot->rowMarks = copyObject(root->rowMarks);
-
-        /* There shouldn't be any OJ info to translate, as yet */
-        Assert(subroot->join_info_list == NIL);
-        /* and we haven't created PlaceHolderInfos, either */
-        Assert(subroot->placeholder_list == NIL);
-
-        /* Generate Path(s) for accessing this result relation */
-        grouping_planner(subroot, true, 0.0 /* retrieve all tuples */ );
-
-        /* Extract the info we need. */
-        select_rtable = subroot->parse->rtable;
-        select_appinfos = subroot->append_rel_list;
-
-        /*
-         * We need to propagate partColsUpdated back, too.  (The later
-         * planning cycles will not set this because they won't run
-         * expand_partitioned_rtentry for the UPDATE target.)
-         */
-        root->partColsUpdated = subroot->partColsUpdated;
-    }
-
-    /*----------
-     * Since only one rangetable can exist in the final plan, we need to make
-     * sure that it contains all the RTEs needed for any child plan.  This is
-     * complicated by the need to use separate subquery RTEs for each child.
-     * We arrange the final rtable as follows:
-     * 1. All original rtable entries (with their original RT indexes).
-     * 2. All the relation RTEs generated for children of the target table.
-     * 3. Subquery RTEs for children after the first.  We need N * (K - 1)
-     *    RT slots for this, if there are N subqueries and K child tables.
-     * 4. Additional RTEs generated during the child planning runs, such as
-     *    children of inheritable RTEs other than the target table.
-     * We assume that each child planning run will create an identical set
-     * of type-4 RTEs.
-     *
-     * So the next thing to do is append the type-2 RTEs (the target table's
-     * children) to the original rtable.  We look through select_appinfos
-     * to find them.
-     *
-     * To identify which AppendRelInfos are relevant as we thumb through
-     * select_appinfos, we need to look for both direct and indirect children
-     * of top_parentRTindex, so we use a bitmap of known parent relids.
-     * expand_inherited_rtentry() always processes a parent before any of that
-     * parent's children, so we should see an intermediate parent before its
-     * children.
-     *----------
-     */
-    child_appinfos = NIL;
-    old_child_rtis = NIL;
-    new_child_rtis = NIL;
-    parent_relids = bms_make_singleton(top_parentRTindex);
-    foreach(lc, select_appinfos)
-    {
-        AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
-        RangeTblEntry *child_rte;
-
-        /* append_rel_list contains all append rels; ignore others */
-        if (!bms_is_member(appinfo->parent_relid, parent_relids))
-            continue;
-
-        /* remember relevant AppendRelInfos for use below */
-        child_appinfos = lappend(child_appinfos, appinfo);
-
-        /* extract RTE for this child rel */
-        child_rte = rt_fetch(appinfo->child_relid, select_rtable);
-
-        /* and append it to the original rtable */
-        parse->rtable = lappend(parse->rtable, child_rte);
-
-        /* remember child's index in the SELECT rtable */
-        old_child_rtis = lappend_int(old_child_rtis, appinfo->child_relid);
-
-        /* and its new index in the final rtable */
-        new_child_rtis = lappend_int(new_child_rtis, list_length(parse->rtable));
-
-        /* if child is itself partitioned, update parent_relids */
-        if (child_rte->inh)
-        {
-            Assert(child_rte->relkind == RELKIND_PARTITIONED_TABLE);
-            parent_relids = bms_add_member(parent_relids, appinfo->child_relid);
-        }
-    }
-
-    /*
-     * It's possible that the RTIs we just assigned for the child rels in the
-     * final rtable are different from what they were in the SELECT query.
-     * Adjust the AppendRelInfos so that they will correctly map RT indexes to
-     * the final indexes.  We can do this left-to-right since no child rel's
-     * final RT index could be greater than what it had in the SELECT query.
-     */
-    forboth(lc, old_child_rtis, lc2, new_child_rtis)
-    {
-        int            old_child_rti = lfirst_int(lc);
-        int            new_child_rti = lfirst_int(lc2);
-
-        if (old_child_rti == new_child_rti)
-            continue;            /* nothing to do */
-
-        Assert(old_child_rti > new_child_rti);
-
-        ChangeVarNodes((Node *) child_appinfos,
-                       old_child_rti, new_child_rti, 0);
-    }
-
-    /*
-     * Now set up rangetable entries for subqueries for additional children
-     * (the first child will just use the original ones).  These all have to
-     * look more or less real, or EXPLAIN will get unhappy; so we just make
-     * them all clones of the original subqueries.
-     */
-    next_subquery_rti = list_length(parse->rtable) + 1;
-    if (subqueryRTindexes != NULL)
-    {
-        int            n_children = list_length(child_appinfos);
-
-        while (n_children-- > 1)
-        {
-            int            oldrti = -1;
-
-            while ((oldrti = bms_next_member(subqueryRTindexes, oldrti)) >= 0)
-            {
-                RangeTblEntry *subqrte;
-
-                subqrte = rt_fetch(oldrti, parse->rtable);
-                parse->rtable = lappend(parse->rtable, copyObject(subqrte));
-            }
-        }
-    }
-
-    /*
-     * The query for each child is obtained by translating the query for its
-     * immediate parent, since the AppendRelInfo data we have shows deltas
-     * between parents and children.  We use the parent_parses array to
-     * remember the appropriate query trees.  This is indexed by parent relid.
-     * Since the maximum number of parents is limited by the number of RTEs in
-     * the SELECT query, we use that number to allocate the array.  An extra
-     * entry is needed since relids start from 1.
-     */
-    parent_parses = (Query **) palloc0((list_length(select_rtable) + 1) *
-                                       sizeof(Query *));
-    parent_parses[top_parentRTindex] = parse;
-
-    /*
-     * And now we can get on with generating a plan for each child table.
-     */
-    foreach(lc, child_appinfos)
-    {
-        AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
-        Index        this_subquery_rti = next_subquery_rti;
-        Query       *parent_parse;
-        PlannerInfo *subroot;
-        RangeTblEntry *child_rte;
-        RelOptInfo *sub_final_rel;
-        Path       *subpath;
-
-        /*
-         * expand_inherited_rtentry() always processes a parent before any of
-         * that parent's children, so the parent query for this relation
-         * should already be available.
-         */
-        parent_parse = parent_parses[appinfo->parent_relid];
-        Assert(parent_parse != NULL);
-
-        /*
-         * We need a working copy of the PlannerInfo so that we can control
-         * propagation of information back to the main copy.
-         */
-        subroot = makeNode(PlannerInfo);
-        memcpy(subroot, root, sizeof(PlannerInfo));
-
-        /*
-         * Generate modified query with this rel as target.  We first apply
-         * adjust_appendrel_attrs, which copies the Query and changes
-         * references to the parent RTE to refer to the current child RTE,
-         * then fool around with subquery RTEs.
-         */
-        subroot->parse = (Query *)
-            adjust_appendrel_attrs(subroot,
-                                   (Node *) parent_parse,
-                                   1, &appinfo);
-
-        /*
-         * If there are securityQuals attached to the parent, move them to the
-         * child rel (they've already been transformed properly for that).
-         */
-        parent_rte = rt_fetch(appinfo->parent_relid, subroot->parse->rtable);
-        child_rte = rt_fetch(appinfo->child_relid, subroot->parse->rtable);
-        child_rte->securityQuals = parent_rte->securityQuals;
-        parent_rte->securityQuals = NIL;
-
-        /*
-         * HACK: setting this to a value other than INHKIND_NONE signals to
-         * relation_excluded_by_constraints() to treat the result relation as
-         * being an appendrel member.
-         */
-        subroot->inhTargetKind =
-            (rootRelation != 0) ? INHKIND_PARTITIONED : INHKIND_INHERITED;
-
-        /*
-         * If this child is further partitioned, remember it as a parent.
-         * Since a partitioned table does not have any data, we don't need to
-         * create a plan for it, and we can stop processing it here.  We do,
-         * however, need to remember its modified PlannerInfo for use when
-         * processing its children, since we'll update their varnos based on
-         * the delta from immediate parent to child, not from top to child.
-         *
-         * Note: a very non-obvious point is that we have not yet added
-         * duplicate subquery RTEs to the subroot's rtable.  We mustn't,
-         * because then its children would have two sets of duplicates,
-         * confusing matters.
-         */
-        if (child_rte->inh)
-        {
-            Assert(child_rte->relkind == RELKIND_PARTITIONED_TABLE);
-            parent_parses[appinfo->child_relid] = subroot->parse;
-            continue;
-        }
-
-        /*
-         * Set the nominal target relation of the ModifyTable node if not
-         * already done.  If the target is a partitioned table, we already set
-         * nominalRelation to refer to the partition root, above.  For
-         * non-partitioned inheritance cases, we'll use the first child
-         * relation (even if it's excluded) as the nominal target relation.
-         * Because of the way expand_inherited_rtentry works, that should be
-         * the RTE representing the parent table in its role as a simple
-         * member of the inheritance set.
-         *
-         * It would be logically cleaner to *always* use the inheritance
-         * parent RTE as the nominal relation; but that RTE is not otherwise
-         * referenced in the plan in the non-partitioned inheritance case.
-         * Instead the duplicate child RTE created by expand_inherited_rtentry
-         * is used elsewhere in the plan, so using the original parent RTE
-         * would give rise to confusing use of multiple aliases in EXPLAIN
-         * output for what the user will think is the "same" table.  OTOH,
-         * it's not a problem in the partitioned inheritance case, because
-         * there is no duplicate RTE for the parent.
-         */
-        if (nominalRelation < 0)
-            nominalRelation = appinfo->child_relid;
-
-        /*
-         * As above, each child plan run needs its own append_rel_list and
-         * rowmarks, which should start out as pristine copies of the
-         * originals.  There can't be any references to UPDATE/DELETE target
-         * rels in them; but there could be subquery references, which we'll
-         * fix up in a moment.
-         */
-        subroot->append_rel_list = copyObject(root->append_rel_list);
-        subroot->rowMarks = copyObject(root->rowMarks);
-
-        /*
-         * If this isn't the first child Query, adjust Vars and jointree
-         * entries to reference the appropriate set of subquery RTEs.
-         */
-        if (final_rtable != NIL && subqueryRTindexes != NULL)
-        {
-            int            oldrti = -1;
-
-            while ((oldrti = bms_next_member(subqueryRTindexes, oldrti)) >= 0)
-            {
-                Index        newrti = next_subquery_rti++;
-
-                ChangeVarNodes((Node *) subroot->parse, oldrti, newrti, 0);
-                ChangeVarNodes((Node *) subroot->append_rel_list,
-                               oldrti, newrti, 0);
-                ChangeVarNodes((Node *) subroot->rowMarks, oldrti, newrti, 0);
-            }
-        }
-
-        /* There shouldn't be any OJ info to translate, as yet */
-        Assert(subroot->join_info_list == NIL);
-        /* and we haven't created PlaceHolderInfos, either */
-        Assert(subroot->placeholder_list == NIL);
-
-        /* Generate Path(s) for accessing this result relation */
-        grouping_planner(subroot, true, 0.0 /* retrieve all tuples */ );
-
-        /*
-         * Select cheapest path in case there's more than one.  We always run
-         * modification queries to conclusion, so we care only for the
-         * cheapest-total path.
-         */
-        sub_final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
-        set_cheapest(sub_final_rel);
-        subpath = sub_final_rel->cheapest_total_path;
-
-        /*
-         * If this child rel was excluded by constraint exclusion, exclude it
-         * from the result plan.
-         */
-        if (IS_DUMMY_REL(sub_final_rel))
-            continue;
-
-        /*
-         * If this is the first non-excluded child, its post-planning rtable
-         * becomes the initial contents of final_rtable; otherwise, copy its
-         * modified subquery RTEs into final_rtable, to ensure we have sane
-         * copies of those.  Also save the first non-excluded child's version
-         * of the rowmarks list; we assume all children will end up with
-         * equivalent versions of that.  Likewise for append_rel_list.
-         */
-        if (final_rtable == NIL)
-        {
-            final_rtable = subroot->parse->rtable;
-            final_rowmarks = subroot->rowMarks;
-            final_appendrels = subroot->append_rel_list;
-        }
-        else
-        {
-            Assert(list_length(final_rtable) ==
-                   list_length(subroot->parse->rtable));
-            if (subqueryRTindexes != NULL)
-            {
-                int            oldrti = -1;
-
-                while ((oldrti = bms_next_member(subqueryRTindexes, oldrti)) >= 0)
-                {
-                    Index        newrti = this_subquery_rti++;
-                    RangeTblEntry *subqrte;
-                    ListCell   *newrticell;
-
-                    subqrte = rt_fetch(newrti, subroot->parse->rtable);
-                    newrticell = list_nth_cell(final_rtable, newrti - 1);
-                    lfirst(newrticell) = subqrte;
-                }
-            }
-        }
-
-        /*
-         * We need to collect all the RelOptInfos from all child plans into
-         * the main PlannerInfo, since setrefs.c will need them.  We use the
-         * last child's simple_rel_array, so we have to propagate forward the
-         * RelOptInfos that were already built in previous children.
-         */
-        Assert(subroot->simple_rel_array_size >= save_rel_array_size);
-        for (rti = 1; rti < save_rel_array_size; rti++)
-        {
-            RelOptInfo *brel = save_rel_array[rti];
-
-            if (brel)
-                subroot->simple_rel_array[rti] = brel;
-        }
-        save_rel_array_size = subroot->simple_rel_array_size;
-        save_rel_array = subroot->simple_rel_array;
-        save_append_rel_array = subroot->append_rel_array;
-
-        /*
-         * Make sure any initplans from this rel get into the outer list. Note
-         * we're effectively assuming all children generate the same
-         * init_plans.
-         */
-        root->init_plans = subroot->init_plans;
-
-        /* Build list of sub-paths */
-        subpaths = lappend(subpaths, subpath);
-
-        /* Build list of modified subroots, too */
-        subroots = lappend(subroots, subroot);
-
-        /* Build list of target-relation RT indexes */
-        resultRelations = lappend_int(resultRelations, appinfo->child_relid);
-
-        /* Accumulate lists of UPDATE target columns */
-        if (parse->commandType == CMD_UPDATE)
-            updateColnosLists = lappend(updateColnosLists,
-                                        subroot->update_colnos);
-
-        /* Build lists of per-relation WCO and RETURNING targetlists */
-        if (parse->withCheckOptions)
-            withCheckOptionLists = lappend(withCheckOptionLists,
-                                           subroot->parse->withCheckOptions);
-        if (parse->returningList)
-            returningLists = lappend(returningLists,
-                                     subroot->parse->returningList);
-
-        Assert(!parse->onConflict);
-    }
-
-    /* Result path must go into outer query's FINAL upperrel */
-    final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
-
-    /*
-     * We don't currently worry about setting final_rel's consider_parallel
-     * flag in this case, nor about allowing FDWs or create_upper_paths_hook
-     * to get control here.
-     */
-
-    if (subpaths == NIL)
-    {
-        /*
-         * We managed to exclude every child rel, so generate a dummy path
-         * representing the empty set.  Although it's clear that no data will
-         * be updated or deleted, we will still need to have a ModifyTable
-         * node so that any statement triggers are executed.  (This could be
-         * cleaner if we fixed nodeModifyTable.c to support zero child nodes,
-         * but that probably wouldn't be a net win.)
-         */
-        Path       *dummy_path;
-
-        /* tlist processing never got done, either */
-        root->processed_tlist = preprocess_targetlist(root);
-        final_rel->reltarget = create_pathtarget(root, root->processed_tlist);
-
-        /* Make a dummy path, cf set_dummy_rel_pathlist() */
-        dummy_path = (Path *) create_append_path(NULL, final_rel, NIL, NIL,
-                                                 NIL, NULL, 0, false,
-                                                 -1);
-
-        /* These lists must be nonempty to make a valid ModifyTable node */
-        subpaths = list_make1(dummy_path);
-        subroots = list_make1(root);
-        resultRelations = list_make1_int(parse->resultRelation);
-        if (parse->commandType == CMD_UPDATE)
-            updateColnosLists = lappend(updateColnosLists,
-                                        root->update_colnos);
-        if (parse->withCheckOptions)
-            withCheckOptionLists = list_make1(parse->withCheckOptions);
-        if (parse->returningList)
-            returningLists = list_make1(parse->returningList);
-        /* Disable tuple routing, too, just to be safe */
-        root->partColsUpdated = false;
-    }
-    else
-    {
-        /*
-         * Put back the final adjusted rtable into the original copy of the
-         * Query.  (We mustn't do this if we found no non-excluded children,
-         * since we never saved an adjusted rtable at all.)
-         */
-        parse->rtable = final_rtable;
-        root->simple_rel_array_size = save_rel_array_size;
-        root->simple_rel_array = save_rel_array;
-        root->append_rel_array = save_append_rel_array;
-
-        /* Must reconstruct original's simple_rte_array, too */
-        root->simple_rte_array = (RangeTblEntry **)
-            palloc0((list_length(final_rtable) + 1) * sizeof(RangeTblEntry *));
-        rti = 1;
-        foreach(lc, final_rtable)
-        {
-            RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
-
-            root->simple_rte_array[rti++] = rte;
-        }
-
-        /* Put back adjusted rowmarks and appendrels, too */
-        root->rowMarks = final_rowmarks;
-        root->append_rel_list = final_appendrels;
-    }
-
-    /*
-     * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node will
-     * have dealt with fetching non-locked marked rows, else we need to have
-     * ModifyTable do that.
-     */
-    if (parse->rowMarks)
-        rowMarks = NIL;
-    else
-        rowMarks = root->rowMarks;
-
-    /* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
-    add_path(final_rel, (Path *)
-             create_modifytable_path(root, final_rel,
-                                     parse->commandType,
-                                     parse->canSetTag,
-                                     nominalRelation,
-                                     rootRelation,
-                                     root->partColsUpdated,
-                                     resultRelations,
-                                     subpaths,
-                                     subroots,
-                                     updateColnosLists,
-                                     withCheckOptionLists,
-                                     returningLists,
-                                     rowMarks,
-                                     NULL,
-                                     assign_special_exec_param(root)));
-}
-
 /*--------------------
  * grouping_planner
  *      Perform planning steps related to grouping, aggregation, etc.
@@ -1813,11 +1197,6 @@ inheritance_planner(PlannerInfo *root)
  * This function adds all required top-level processing to the scan/join
  * Path(s) produced by query_planner.
  *
- * If inheritance_update is true, we're being called from inheritance_planner
- * and should not include a ModifyTable step in the resulting Path(s).
- * (inheritance_planner will create a single ModifyTable node covering all the
- * target tables.)
- *
  * tuple_fraction is the fraction of tuples we expect will be retrieved.
  * tuple_fraction is interpreted as follows:
  *      0: expect all tuples to be retrieved (normal case)
@@ -1835,8 +1214,7 @@ inheritance_planner(PlannerInfo *root)
  *--------------------
  */
 static void
-grouping_planner(PlannerInfo *root, bool inheritance_update,
-                 double tuple_fraction)
+grouping_planner(PlannerInfo *root, double tuple_fraction)
 {
     Query       *parse = root->parse;
     int64        offset_est = 0;
@@ -1980,7 +1358,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
          * that we can transfer its decoration (resnames etc) to the topmost
          * tlist of the finished Plan.  This is kept in processed_tlist.
          */
-        root->processed_tlist = preprocess_targetlist(root);
+        preprocess_targetlist(root);

         /*
          * Mark all the aggregates with resolved aggtranstypes, and detect
@@ -2318,17 +1696,117 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         }

         /*
-         * If this is an INSERT/UPDATE/DELETE, and we're not being called from
-         * inheritance_planner, add the ModifyTable node.
+         * If this is an INSERT/UPDATE/DELETE, add the ModifyTable node.
          */
-        if (parse->commandType != CMD_SELECT && !inheritance_update)
+        if (parse->commandType != CMD_SELECT)
         {
             Index        rootRelation;
-            List *updateColnosLists;
-            List       *withCheckOptionLists;
-            List       *returningLists;
+            List       *resultRelations = NIL;
+            List       *updateColnosLists = NIL;
+            List       *withCheckOptionLists = NIL;
+            List       *returningLists = NIL;
             List       *rowMarks;

+            if (bms_membership(root->all_result_relids) == BMS_MULTIPLE)
+            {
+                /* Inherited UPDATE/DELETE */
+                RelOptInfo *top_result_rel = find_base_rel(root,
+                                                           parse->resultRelation);
+                int            resultRelation = -1;
+
+                /* Add only leaf children to ModifyTable. */
+                while ((resultRelation = bms_next_member(root->leaf_result_relids,
+                                                         resultRelation)) >= 0)
+                {
+                    RelOptInfo *this_result_rel = find_base_rel(root,
+                                                                resultRelation);
+
+                    /*
+                     * Also exclude any leaf rels that have turned dummy since
+                     * being added to the list, for example, by being excluded
+                     * by constraint exclusion.
+                     */
+                    if (IS_DUMMY_REL(this_result_rel))
+                        continue;
+
+                    /* Build per-target-rel lists needed by ModifyTable */
+                    resultRelations = lappend_int(resultRelations,
+                                                  resultRelation);
+                    if (parse->commandType == CMD_UPDATE)
+                    {
+                        List       *update_colnos = root->update_colnos;
+
+                        if (this_result_rel != top_result_rel)
+                            update_colnos =
+                                adjust_inherited_attnums_multilevel(root,
+                                                                    update_colnos,
+                                                                    this_result_rel->relid,
+                                                                    top_result_rel->relid);
+                        updateColnosLists = lappend(updateColnosLists,
+                                                    update_colnos);
+                    }
+                    if (parse->withCheckOptions)
+                    {
+                        List       *withCheckOptions = parse->withCheckOptions;
+
+                        if (this_result_rel != top_result_rel)
+                            withCheckOptions = (List *)
+                                adjust_appendrel_attrs_multilevel(root,
+                                                                  (Node *) withCheckOptions,
+                                                                  this_result_rel->relids,
+                                                                  top_result_rel->relids);
+                        withCheckOptionLists = lappend(withCheckOptionLists,
+                                                       withCheckOptions);
+                    }
+                    if (parse->returningList)
+                    {
+                        List       *returningList = parse->returningList;
+
+                        if (this_result_rel != top_result_rel)
+                            returningList = (List *)
+                                adjust_appendrel_attrs_multilevel(root,
+                                                                  (Node *) returningList,
+                                                                  this_result_rel->relids,
+                                                                  top_result_rel->relids);
+                        returningLists = lappend(returningLists,
+                                                 returningList);
+                    }
+                }
+
+                if (resultRelations == NIL)
+                {
+                    /*
+                     * We managed to exclude every child rel, so generate a
+                     * dummy one-relation plan using info for the top target
+                     * rel (even though that may not be a leaf target).
+                     * Although it's clear that no data will be updated or
+                     * deleted, we still need to have a ModifyTable node so
+                     * that any statement triggers will be executed.  (This
+                     * could be cleaner if we fixed nodeModifyTable.c to allow
+                     * zero target relations, but that probably wouldn't be a
+                     * net win.)
+                     */
+                    resultRelations = list_make1_int(parse->resultRelation);
+                    if (parse->commandType == CMD_UPDATE)
+                        updateColnosLists = list_make1(root->update_colnos);
+                    if (parse->withCheckOptions)
+                        withCheckOptionLists = list_make1(parse->withCheckOptions);
+                    if (parse->returningList)
+                        returningLists = list_make1(parse->returningList);
+                }
+            }
+            else
+            {
+                /* Single-relation INSERT/UPDATE/DELETE. */
+                resultRelations = list_make1_int(parse->resultRelation);
+                if (parse->commandType == CMD_UPDATE)
+                    updateColnosLists = list_make1(root->update_colnos);
+                if (parse->withCheckOptions)
+                    withCheckOptionLists = list_make1(parse->withCheckOptions);
+                if (parse->returningList)
+                    returningLists = list_make1(parse->returningList);
+            }
+
             /*
              * If target is a partition root table, we need to mark the
              * ModifyTable node appropriately for that.
@@ -2339,26 +1817,6 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
             else
                 rootRelation = 0;

-            /* Set up the UPDATE target columns list-of-lists, if needed. */
-            if (parse->commandType == CMD_UPDATE)
-                updateColnosLists = list_make1(root->update_colnos);
-            else
-                updateColnosLists = NIL;
-
-            /*
-             * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if
-             * needed.
-             */
-            if (parse->withCheckOptions)
-                withCheckOptionLists = list_make1(parse->withCheckOptions);
-            else
-                withCheckOptionLists = NIL;
-
-            if (parse->returningList)
-                returningLists = list_make1(parse->returningList);
-            else
-                returningLists = NIL;
-
             /*
              * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node
              * will have dealt with fetching non-locked marked rows, else we
@@ -2371,14 +1829,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,

             path = (Path *)
                 create_modifytable_path(root, final_rel,
+                                        path,
                                         parse->commandType,
                                         parse->canSetTag,
                                         parse->resultRelation,
                                         rootRelation,
-                                        false,
-                                        list_make1_int(parse->resultRelation),
-                                        list_make1(path),
-                                        list_make1(root),
+                                        root->partColsUpdated,
+                                        resultRelations,
                                         updateColnosLists,
                                         withCheckOptionLists,
                                         returningLists,
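For readers following the planner.c changes above: the new code in grouping_planner() chooses ModifyTable's target relations by walking root->leaf_result_relids. It skips any leaf that has become dummy, for example through constraint exclusion. If nothing survives, it falls back to the top-level target so that statement triggers still fire. The minimal standalone C sketch below models only that selection logic and is not the patch's implementation. It assumes a plain 64-bit mask (the hypothetical RelidSet) in place of a Bitmapset and a bool array in place of IS_DUMMY_REL(). The helper names are hypothetical. It also leaves out the per-relation translation of update_colnos, WITH CHECK OPTION lists and RETURNING lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of RT indexes (0..63). */
typedef uint64_t RelidSet;

/* Return the smallest member greater than prev, or -1 (like bms_next_member). */
static int
next_member(RelidSet set, int prev)
{
    for (int i = prev + 1; i < 64; i++)
        if (set & ((RelidSet) 1 << i))
            return i;
    return -1;
}

/*
 * Hypothetical helper: write to out[] the RT indexes ModifyTable should
 * target, i.e. every leaf result relation that is not dummy, or only
 * top_relid if every leaf was excluded.  Returns the number written.
 */
static size_t
collect_result_relations(RelidSet leaf_result_relids,
                         const bool *is_dummy,
                         int top_relid,
                         int *out, size_t out_cap)
{
    size_t      n = 0;
    int         relid = -1;

    while ((relid = next_member(leaf_result_relids, relid)) >= 0)
    {
        /* Skip leaves that turned dummy, e.g. via constraint exclusion. */
        if (is_dummy[relid])
            continue;
        assert(n < out_cap);
        out[n++] = relid;
    }

    /* Keep one target relation so statement-level triggers still run. */
    if (n == 0)
    {
        assert(out_cap >= 1);
        out[n++] = top_relid;
    }
    return n;
}
```

For example, with leaf RT indexes 2, 3 and 5 where 3 was excluded, the sketch yields 2 and 5. If every leaf were excluded, it would yield only the top relation's index.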
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 42f088ad71..578ce4e6c8 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -867,6 +867,29 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                     set_upper_references(root, plan, rtoffset);
                 else
                 {
+                    /*
+                     * The tlist of a childless Result could contain
+                     * unresolved ROWID_VAR Vars, in case it's representing a
+                     * target relation which is completely empty because of
+                     * constraint exclusion.  Replace any such Vars by null
+                     * constants, as though they'd been resolved for a leaf
+                     * scan node that doesn't support them.  We could have
+                     * fix_scan_expr do this, but since the case is only
+                     * expected to occur here, it seems safer to special-case
+                     * it here and keep the assertions that ROWID_VARs
+                     * shouldn't be seen by fix_scan_expr.
+                     */
+                    foreach(l, splan->plan.targetlist)
+                    {
+                        TargetEntry *tle = (TargetEntry *) lfirst(l);
+                        Var           *var = (Var *) tle->expr;
+
+                        if (var && IsA(var, Var) && var->varno == ROWID_VAR)
+                            tle->expr = (Expr *) makeNullConst(var->vartype,
+                                                               var->vartypmod,
+                                                               var->varcollid);
+                    }
+
                     splan->plan.targetlist =
                         fix_scan_list(root, splan->plan.targetlist,
                                       rtoffset, NUM_EXEC_TLIST(plan));
@@ -896,23 +919,20 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                 if (splan->returningLists)
                 {
                     List       *newRL = NIL;
+                    Plan       *subplan = outerPlan(splan);
                     ListCell   *lcrl,
-                               *lcrr,
-                               *lcp;
+                               *lcrr;

                     /*
-                     * Pass each per-subplan returningList through
+                     * Pass each per-resultrel returningList through
                      * set_returning_clause_references().
                      */
                     Assert(list_length(splan->returningLists) == list_length(splan->resultRelations));
-                    Assert(list_length(splan->returningLists) == list_length(splan->plans));
-                    forthree(lcrl, splan->returningLists,
-                             lcrr, splan->resultRelations,
-                             lcp, splan->plans)
+                    forboth(lcrl, splan->returningLists,
+                            lcrr, splan->resultRelations)
                     {
                         List       *rlist = (List *) lfirst(lcrl);
                         Index        resultrel = lfirst_int(lcrr);
-                        Plan       *subplan = (Plan *) lfirst(lcp);

                         rlist = set_returning_clause_references(root,
                                                                 rlist,
@@ -982,12 +1002,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                     rc->rti += rtoffset;
                     rc->prti += rtoffset;
                 }
-                foreach(l, splan->plans)
-                {
-                    lfirst(l) = set_plan_refs(root,
-                                              (Plan *) lfirst(l),
-                                              rtoffset);
-                }

                 /*
                  * Append this ModifyTable node's final result relation RT
@@ -1791,6 +1805,13 @@ fix_alternative_subplan(PlannerInfo *root, AlternativeSubPlan *asplan,
  * choosing the best implementation for AlternativeSubPlans,
  * looking up operator opcode info for OpExpr and related nodes,
  * and adding OIDs from regclass Const nodes into root->glob->relationOids.
+ *
+ * 'node': the expression to be modified
+ * 'rtoffset': how much to increment varnos by
+ * 'num_exec': estimated number of executions of expression
+ *
+ * The expression tree is either copied-and-modified, or modified in-place
+ * if that seems safe.
  */
 static Node *
 fix_scan_expr(PlannerInfo *root, Node *node, int rtoffset, double num_exec)
@@ -1839,11 +1860,12 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context)
         Assert(var->varlevelsup == 0);

         /*
-         * We should not see any Vars marked INNER_VAR or OUTER_VAR.  But an
-         * indexqual expression could contain INDEX_VAR Vars.
+         * We should not see Vars marked INNER_VAR, OUTER_VAR, or ROWID_VAR.
+         * But an indexqual expression could contain INDEX_VAR Vars.
          */
         Assert(var->varno != INNER_VAR);
         Assert(var->varno != OUTER_VAR);
+        Assert(var->varno != ROWID_VAR);
         if (!IS_SPECIAL_VARNO(var->varno))
             var->varno += context->rtoffset;
         if (var->varnosyn > 0)
@@ -1906,6 +1928,7 @@ fix_scan_expr_walker(Node *node, fix_scan_expr_context *context)
 {
     if (node == NULL)
         return false;
+    Assert(!(IsA(node, Var) && ((Var *) node)->varno == ROWID_VAR));
     Assert(!IsA(node, PlaceHolderVar));
     Assert(!IsA(node, AlternativeSubPlan));
     fix_expr_common(context->root, node);
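The setrefs.c hunks above handle a childless Result whose tlist still holds ROWID_VAR Vars. Each such Var is replaced with a null constant of the same type, type modifier and collation. The simplified C model below shows that substitution. SketchExpr, ExprKind and SKETCH_ROWID_VAR are hypothetical stand-ins, and the constant's value is illustrative rather than PostgreSQL's. It keeps only a type OID, whereas the real code also preserves typmod and collation. Like the patch, it only inspects target entries whose expression is itself a Var.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified expression node kinds. */
typedef enum { EXPR_VAR, EXPR_NULL_CONST, EXPR_OTHER } ExprKind;

/* Illustrative special varno; not PostgreSQL's actual ROWID_VAR value. */
#define SKETCH_ROWID_VAR (-4)

typedef struct SketchExpr
{
    ExprKind    kind;
    int         varno;      /* meaningful only for EXPR_VAR */
    unsigned    typid;      /* result type, preserved when nulling */
} SketchExpr;

/*
 * Replace each top-level row-identity Var in a target list with a null
 * constant of the same type, since a childless Result has no scan that
 * could supply those columns.  Returns the number of entries replaced.
 */
static size_t
null_out_rowid_vars(SketchExpr *tlist, size_t ntles)
{
    size_t      replaced = 0;

    assert(tlist != NULL || ntles == 0);
    for (size_t i = 0; i < ntles; i++)
    {
        SketchExpr *e = &tlist[i];

        if (e->kind == EXPR_VAR && e->varno == SKETCH_ROWID_VAR)
        {
            /* typid is left untouched so the column's type is unchanged. */
            e->kind = EXPR_NULL_CONST;
            e->varno = 0;
            replaced++;
        }
    }
    return replaced;
}
```

Doing this substitution up front is what lets fix_scan_expr keep its new assertions that it never sees a ROWID_VAR.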
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index f3e46e0959..b12ab7de2d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2533,7 +2533,6 @@ finalize_plan(PlannerInfo *root, Plan *plan,
         case T_ModifyTable:
             {
                 ModifyTable *mtplan = (ModifyTable *) plan;
-                ListCell   *l;

                 /* Force descendant scan nodes to reference epqParam */
                 locally_added_param = mtplan->epqParam;
@@ -2548,16 +2547,6 @@ finalize_plan(PlannerInfo *root, Plan *plan,
                 finalize_primnode((Node *) mtplan->onConflictWhere,
                                   &context);
                 /* exclRelTlist contains only Vars, doesn't need examination */
-                foreach(l, mtplan->plans)
-                {
-                    context.paramids =
-                        bms_add_members(context.paramids,
-                                        finalize_plan(root,
-                                                      (Plan *) lfirst(l),
-                                                      gather_param,
-                                                      valid_params,
-                                                      scan_params));
-                }
             }
             break;

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index e18553ac7c..62a1668796 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -920,7 +920,10 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     subroot->multiexpr_params = NIL;
     subroot->eq_classes = NIL;
     subroot->ec_merging_done = false;
+    subroot->all_result_relids = NULL;
+    subroot->leaf_result_relids = NULL;
     subroot->append_rel_list = NIL;
+    subroot->row_identity_vars = NIL;
     subroot->rowMarks = NIL;
     memset(subroot->upper_rels, 0, sizeof(subroot->upper_rels));
     memset(subroot->upper_targets, 0, sizeof(subroot->upper_targets));
@@ -929,7 +932,6 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
     subroot->grouping_map = NULL;
     subroot->minmax_aggs = NIL;
     subroot->qual_security_level = 0;
-    subroot->inhTargetKind = INHKIND_NONE;
     subroot->hasRecursion = false;
     subroot->wt_param_id = -1;
     subroot->non_recursive_path = NULL;
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 488e8cfd4d..363132185d 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -11,7 +11,8 @@
  *
  * For UPDATE and DELETE queries, the targetlist must also contain "junk"
  * tlist entries needed to allow the executor to identify the rows to be
- * updated or deleted; for example, the ctid of a heap row.
+ * updated or deleted; for example, the ctid of a heap row.  (The planner
+ * adds these; they're not in what we receive from the parser/rewriter.)
  *
  * For all query types, there can be additional junk tlist entries, such as
  * sort keys, Vars needed for a RETURNING list, and row ID information needed
@@ -19,20 +20,9 @@
  *
  * The query rewrite phase also does preprocessing of the targetlist (see
  * rewriteTargetListIU).  The division of labor between here and there is
- * partially historical, but it's not entirely arbitrary.  In particular,
- * consider an UPDATE across an inheritance tree.  What rewriteTargetListIU
- * does need be done only once (because it depends only on the properties of
- * the parent relation).  What's done here has to be done over again for each
- * child relation, because it depends on the properties of the child, which
- * might be of a different relation type, or have more columns and/or a
- * different column order than the parent.
- *
- * The fact that rewriteTargetListIU sorts non-resjunk tlist entries by column
- * position, which expand_targetlist depends on, violates the above comment
- * because the sorting is only valid for the parent relation.  In inherited
- * UPDATE cases, adjust_inherited_tlist runs in between to take care of fixing
- * the tlists for child tables to keep expand_targetlist happy.  We do it like
- * that because it's faster in typical non-inherited cases.
+ * partially historical, but it's not entirely arbitrary.  The stuff done
+ * here is closely connected to physical access to tables, whereas the
+ * rewriter's work is more concerned with SQL semantics.
  *
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
@@ -46,19 +36,17 @@

 #include "postgres.h"

-#include "access/sysattr.h"
 #include "access/table.h"
-#include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
+#include "optimizer/appendinfo.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/prep.h"
 #include "optimizer/tlist.h"
 #include "parser/parse_coerce.h"
 #include "parser/parsetree.h"
-#include "rewrite/rewriteHandler.h"
 #include "utils/rel.h"

-static List *make_update_colnos(List *tlist);
+static List *extract_update_colnos(List *tlist);
 static List *expand_targetlist(List *tlist, int command_type,
                                Index result_relation, Relation rel);

@@ -67,13 +55,15 @@ static List *expand_targetlist(List *tlist, int command_type,
  * preprocess_targetlist
  *      Driver for preprocessing the parse tree targetlist.
  *
- *      Returns the new targetlist.
+ * The preprocessed targetlist is returned in root->processed_tlist.
+ * Also, if this is an UPDATE, we return a list of target column numbers
+ * in root->update_colnos.  (Resnos in processed_tlist will be consecutive,
+ * so do not look at that to find out which columns are targets!)
  *
  * As a side effect, if there's an ON CONFLICT UPDATE clause, its targetlist
- * is also preprocessed (and updated in-place).  Also, if this is an UPDATE,
- * we return a list of target column numbers in root->update_colnos.
+ * is also preprocessed (and updated in-place).
  */
-List *
+void
 preprocess_targetlist(PlannerInfo *root)
 {
     Query       *parse = root->parse;
@@ -106,29 +96,39 @@ preprocess_targetlist(PlannerInfo *root)
     else
         Assert(command_type == CMD_SELECT);

-    /*
-     * For UPDATE/DELETE, add any junk column(s) needed to allow the executor
-     * to identify the rows to be updated or deleted.  Note that this step
-     * scribbles on parse->targetList, which is not very desirable, but we
-     * keep it that way to avoid changing APIs used by FDWs.
-     */
-    if (command_type == CMD_UPDATE || command_type == CMD_DELETE)
-        rewriteTargetListUD(parse, target_rte, target_relation);
-
     /*
      * In an INSERT, the executor expects the targetlist to match the exact
      * order of the target table's attributes, including entries for
      * attributes not mentioned in the source query.
      *
      * In an UPDATE, we don't rearrange the tlist order, but we need to make a
-     * separate list of the target attribute numbers, in tlist order.
+     * separate list of the target attribute numbers, in tlist order, and then
+     * renumber the processed_tlist entries to be consecutive.
      */
     tlist = parse->targetList;
     if (command_type == CMD_INSERT)
         tlist = expand_targetlist(tlist, command_type,
                                   result_relation, target_relation);
     else if (command_type == CMD_UPDATE)
-        root->update_colnos = make_update_colnos(tlist);
+        root->update_colnos = extract_update_colnos(tlist);
+
+    /*
+     * For non-inherited UPDATE/DELETE, register any junk column(s) needed to
+     * allow the executor to identify the rows to be updated or deleted.  In
+     * the inheritance case, we do nothing now, leaving this to be dealt with
+     * when expand_inherited_rtentry() makes the leaf target relations.  (But
+     * there might not be any leaf target relations, in which case we must do
+     * this in distribute_row_identity_vars().)
+     */
+    if ((command_type == CMD_UPDATE || command_type == CMD_DELETE) &&
+        !target_rte->inh)
+    {
+        /* row-identity logic expects to add stuff to processed_tlist */
+        root->processed_tlist = tlist;
+        add_row_identity_columns(root, result_relation,
+                                 target_rte, target_relation);
+        tlist = root->processed_tlist;
+    }

     /*
      * Add necessary junk columns for rowmarked rels.  These values are needed
@@ -136,6 +136,14 @@ preprocess_targetlist(PlannerInfo *root)
      * rechecking.  See comments for PlanRowMark in plannodes.h.  If you
      * change this stanza, see also expand_inherited_rtentry(), which has to
      * be able to add on junk columns equivalent to these.
+     *
+     * (Someday it might be useful to fold these resjunk columns into the
+     * row-identity-column management used for UPDATE/DELETE.  Today is not
+     * that day, however.  One notable issue is that it seems important that
+     * the whole-row Vars made here use the real table rowtype, not RECORD, so
+     * that conversion to/from child relations' rowtypes will happen.  Also,
+     * since these entries don't potentially bloat with more and more child
+     * relations, there's not really much need for column sharing.)
      */
     foreach(lc, root->rowMarks)
     {
@@ -235,6 +243,8 @@ preprocess_targetlist(PlannerInfo *root)
         list_free(vars);
     }

+    root->processed_tlist = tlist;
+
     /*
      * If there's an ON CONFLICT UPDATE clause, preprocess its targetlist too
      * while we have the relation open.
@@ -248,22 +258,25 @@ preprocess_targetlist(PlannerInfo *root)

     if (target_relation)
         table_close(target_relation, NoLock);
-
-    return tlist;
 }

 /*
- * make_update_colnos
+ * extract_update_colnos
  *         Extract a list of the target-table column numbers that
- *         an UPDATE's targetlist wants to assign to.
+ *         an UPDATE's targetlist wants to assign to, then renumber.
  *
- * We just need to capture the resno's of the non-junk tlist entries.
+ * The convention in the parser and rewriter is that the resnos in an
+ * UPDATE's non-resjunk TLE entries are the target column numbers
+ * to assign to.  Here, we extract that info into a separate list, and
+ * then convert the tlist to the sequential-numbering convention that's
+ * used by all other query types.
  */
 static List *
-make_update_colnos(List *tlist)
+extract_update_colnos(List *tlist)
 {
-    List*update_colnos = NIL;
-    ListCell *lc;
+    List       *update_colnos = NIL;
+    AttrNumber    nextresno = 1;
+    ListCell   *lc;

     foreach(lc, tlist)
     {
@@ -271,6 +284,7 @@ make_update_colnos(List *tlist)

         if (!tle->resjunk)
             update_colnos = lappend_int(update_colnos, tle->resno);
+        tle->resno = nextresno++;
     }
     return update_colnos;
 }
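The extract_update_colnos() change above can be illustrated with a small standalone sketch. Target column numbers are captured from the non-junk entries in tlist order, and then every entry, junk or not, is renumbered consecutively. SketchTle and the plain arrays are hypothetical stand-ins for TargetEntry and List; this is an illustration under those assumptions, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TargetEntry: only the fields used here. */
typedef struct SketchTle
{
    int  resno;     /* on input: target column number for non-junk TLEs */
    bool resjunk;   /* true for junk entries */
} SketchTle;

/*
 * Copy the resnos of non-junk entries into colnos (in tlist order), then
 * renumber every entry's resno as 1, 2, 3, ...  colnos must have room for
 * ntle entries.  Returns the number of target columns written to colnos.
 */
static size_t
sketch_extract_update_colnos(SketchTle *tlist, size_t ntle, int *colnos)
{
    size_t ncolnos = 0;
    int    nextresno = 1;

    assert(ntle == 0 || (tlist != NULL && colnos != NULL));

    for (size_t i = 0; i < ntle; i++)
    {
        /* Parser/rewriter convention: resno is the column to assign to. */
        if (!tlist[i].resjunk)
            colnos[ncolnos++] = tlist[i].resno;

        /* Switch to the sequential numbering used by other query types. */
        tlist[i].resno = nextresno++;
    }
    return ncolnos;
}
```

For `UPDATE t SET c = ..., a = ...` with c and a being attributes 3 and 1, the colnos come out as (3, 1) while the resnos become 1, 2. That's why callers have to consult update_colnos rather than the resnos to learn which columns are targets.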
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 86922a273c..af46f581ac 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -15,9 +15,12 @@
 #include "postgres.h"

 #include "access/htup_details.h"
+#include "access/table.h"
+#include "foreign/fdwapi.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
+#include "optimizer/pathnode.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -37,8 +40,6 @@ static void make_inh_translation_list(Relation oldrelation,
                                       AppendRelInfo *appinfo);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
                                             adjust_appendrel_attrs_context *context);
-static List *adjust_inherited_tlist(List *tlist,
-                                    AppendRelInfo *context);


 /*
@@ -194,7 +195,6 @@ Node *
 adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos,
                        AppendRelInfo **appinfos)
 {
-    Node       *result;
     adjust_appendrel_attrs_context context;

     context.root = root;
@@ -204,40 +204,10 @@ adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos,
     /* If there's nothing to adjust, don't call this function. */
     Assert(nappinfos >= 1 && appinfos != NULL);

-    /*
-     * Must be prepared to start with a Query or a bare expression tree.
-     */
-    if (node && IsA(node, Query))
-    {
-        Query       *newnode;
-        int            cnt;
+    /* Should never be translating a Query tree. */
+    Assert(node == NULL || !IsA(node, Query));

-        newnode = query_tree_mutator((Query *) node,
-                                     adjust_appendrel_attrs_mutator,
-                                     (void *) &context,
-                                     QTW_IGNORE_RC_SUBQUERIES);
-        for (cnt = 0; cnt < nappinfos; cnt++)
-        {
-            AppendRelInfo *appinfo = appinfos[cnt];
-
-            if (newnode->resultRelation == appinfo->parent_relid)
-            {
-                newnode->resultRelation = appinfo->child_relid;
-                /* Fix tlist resnos too, if it's inherited UPDATE */
-                if (newnode->commandType == CMD_UPDATE)
-                    newnode->targetList =
-                        adjust_inherited_tlist(newnode->targetList,
-                                               appinfo);
-                break;
-            }
-        }
-
-        result = (Node *) newnode;
-    }
-    else
-        result = adjust_appendrel_attrs_mutator(node, &context);
-
-    return result;
+    return adjust_appendrel_attrs_mutator(node, &context);
 }

 static Node *
@@ -343,6 +313,57 @@ adjust_appendrel_attrs_mutator(Node *node,
             }
             /* system attributes don't need any other translation */
         }
+        else if (var->varno == ROWID_VAR)
+        {
+            /*
+             * If it's a ROWID_VAR placeholder, see if we've reached a leaf
+             * target rel, for which we can translate the Var to a specific
+             * instantiation.  We should never be asked to translate to a set
+             * of relids containing more than one leaf target rel, so the
+             * answer will be unique.  If we're still considering non-leaf
+             * inheritance levels, return the ROWID_VAR Var as-is.
+             */
+            Relids        leaf_result_relids = context->root->leaf_result_relids;
+            Index        leaf_relid = 0;
+
+            for (cnt = 0; cnt < nappinfos; cnt++)
+            {
+                if (bms_is_member(appinfos[cnt]->child_relid,
+                                  leaf_result_relids))
+                {
+                    if (leaf_relid)
+                        elog(ERROR, "cannot translate to multiple leaf relids");
+                    leaf_relid = appinfos[cnt]->child_relid;
+                }
+            }
+
+            if (leaf_relid)
+            {
+                RowIdentityVarInfo *ridinfo = (RowIdentityVarInfo *)
+                list_nth(context->root->row_identity_vars, var->varattno - 1);
+
+                if (bms_is_member(leaf_relid, ridinfo->rowidrels))
+                {
+                    /* Substitute the Var given in the RowIdentityVarInfo */
+                    var = copyObject(ridinfo->rowidvar);
+                    /* ... but use the correct relid */
+                    var->varno = leaf_relid;
+                    /* varnosyn in the RowIdentityVarInfo is probably wrong */
+                    var->varnosyn = 0;
+                    var->varattnosyn = 0;
+                }
+                else
+                {
+                    /*
+                     * This leaf rel can't return the desired value, so
+                     * substitute a NULL of the correct type.
+                     */
+                    return (Node *) makeNullConst(var->vartype,
+                                                  var->vartypmod,
+                                                  var->varcollid);
+                }
+            }
+        }
         return (Node *) var;
     }
     if (IsA(node, CurrentOfExpr))
@@ -361,44 +382,6 @@ adjust_appendrel_attrs_mutator(Node *node,
         }
         return (Node *) cexpr;
     }
-    if (IsA(node, RangeTblRef))
-    {
-        RangeTblRef *rtr = (RangeTblRef *) copyObject(node);
-
-        for (cnt = 0; cnt < nappinfos; cnt++)
-        {
-            AppendRelInfo *appinfo = appinfos[cnt];
-
-            if (rtr->rtindex == appinfo->parent_relid)
-            {
-                rtr->rtindex = appinfo->child_relid;
-                break;
-            }
-        }
-        return (Node *) rtr;
-    }
-    if (IsA(node, JoinExpr))
-    {
-        /* Copy the JoinExpr node with correct mutation of subnodes */
-        JoinExpr   *j;
-        AppendRelInfo *appinfo;
-
-        j = (JoinExpr *) expression_tree_mutator(node,
-                                                 adjust_appendrel_attrs_mutator,
-                                                 (void *) context);
-        /* now fix JoinExpr's rtindex (probably never happens) */
-        for (cnt = 0; cnt < nappinfos; cnt++)
-        {
-            appinfo = appinfos[cnt];
-
-            if (j->rtindex == appinfo->parent_relid)
-            {
-                j->rtindex = appinfo->child_relid;
-                break;
-            }
-        }
-        return (Node *) j;
-    }
     if (IsA(node, PlaceHolderVar))
     {
         /* Copy the PlaceHolderVar node with correct mutation of subnodes */
@@ -486,6 +469,9 @@ adjust_appendrel_attrs_mutator(Node *node,
      */
     Assert(!IsA(node, SubLink));
     Assert(!IsA(node, Query));
+    /* We should never see these Query substructures, either. */
+    Assert(!IsA(node, RangeTblRef));
+    Assert(!IsA(node, JoinExpr));

     return expression_tree_mutator(node, adjust_appendrel_attrs_mutator,
                                    (void *) context);
@@ -621,100 +607,101 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 }

 /*
- * Adjust the targetlist entries of an inherited UPDATE operation
- *
- * The expressions have already been fixed, but we have to make sure that
- * the target resnos match the child table (they may not, in the case of
- * a column that was added after-the-fact by ALTER TABLE).  In some cases
- * this can force us to re-order the tlist to preserve resno ordering.
- * (We do all this work in special cases so that preptlist.c is fast for
- * the typical case.)
- *
- * The given tlist has already been through expression_tree_mutator;
- * therefore the TargetEntry nodes are fresh copies that it's okay to
- * scribble on.
- *
- * Note that this is not needed for INSERT because INSERT isn't inheritable.
+ * adjust_inherited_attnums
+ *      Translate an integer list of attribute numbers from parent to child.
  */
-static List *
-adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
+List *
+adjust_inherited_attnums(List *attnums, AppendRelInfo *context)
 {
-    bool        changed_it = false;
-    ListCell   *tl;
-    List       *new_tlist;
-    bool        more;
-    int            attrno;
+    List       *result = NIL;
+    ListCell   *lc;

     /* This should only happen for an inheritance case, not UNION ALL */
     Assert(OidIsValid(context->parent_reloid));

-    /* Scan tlist and update resnos to match attnums of child rel */
-    foreach(tl, tlist)
+    /* Look up each attribute in the AppendRelInfo's translated_vars list */
+    foreach(lc, attnums)
     {
-        TargetEntry *tle = (TargetEntry *) lfirst(tl);
+        AttrNumber    parentattno = lfirst_int(lc);
         Var           *childvar;

-        if (tle->resjunk)
-            continue;            /* ignore junk items */
-
         /* Look up the translation of this column: it must be a Var */
-        if (tle->resno <= 0 ||
-            tle->resno > list_length(context->translated_vars))
+        if (parentattno <= 0 ||
+            parentattno > list_length(context->translated_vars))
             elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-                 tle->resno, get_rel_name(context->parent_reloid));
-        childvar = (Var *) list_nth(context->translated_vars, tle->resno - 1);
+                 parentattno, get_rel_name(context->parent_reloid));
+        childvar = (Var *) list_nth(context->translated_vars, parentattno - 1);
         if (childvar == NULL || !IsA(childvar, Var))
             elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-                 tle->resno, get_rel_name(context->parent_reloid));
+                 parentattno, get_rel_name(context->parent_reloid));

-        if (tle->resno != childvar->varattno)
-        {
-            tle->resno = childvar->varattno;
-            changed_it = true;
-        }
+        result = lappend_int(result, childvar->varattno);
     }
+    return result;
+}

-    /*
-     * If we changed anything, re-sort the tlist by resno, and make sure
-     * resjunk entries have resnos above the last real resno.  The sort
-     * algorithm is a bit stupid, but for such a seldom-taken path, small is
-     * probably better than fast.
-     */
-    if (!changed_it)
-        return tlist;
+/*
+ * adjust_inherited_attnums_multilevel
+ *      As above, but traverse multiple inheritance levels as needed.
+ */
+List *
+adjust_inherited_attnums_multilevel(PlannerInfo *root, List *attnums,
+                                    Index child_relid, Index top_parent_relid)
+{
+    AppendRelInfo *appinfo = root->append_rel_array[child_relid];

-    new_tlist = NIL;
-    more = true;
-    for (attrno = 1; more; attrno++)
-    {
-        more = false;
-        foreach(tl, tlist)
-        {
-            TargetEntry *tle = (TargetEntry *) lfirst(tl);
+    if (!appinfo)
+        elog(ERROR, "child rel %d not found in append_rel_array", child_relid);

-            if (tle->resjunk)
-                continue;        /* ignore junk items */
+    /* Recurse if immediate parent is not the top parent. */
+    if (appinfo->parent_relid != top_parent_relid)
+        attnums = adjust_inherited_attnums_multilevel(root, attnums,
+                                                      appinfo->parent_relid,
+                                                      top_parent_relid);

-            if (tle->resno == attrno)
-                new_tlist = lappend(new_tlist, tle);
-            else if (tle->resno > attrno)
-                more = true;
-        }
-    }
+    /* Now translate for this child */
+    return adjust_inherited_attnums(attnums, appinfo);
+}

-    foreach(tl, tlist)
+/*
+ * get_translated_update_targetlist
+ *      Get the processed_tlist of an UPDATE query, translated as needed to
+ *      match a child target relation.
+ *
+ * Optionally also return the list of target column numbers translated
+ * to this target relation.  (The resnos in processed_tlist MUST NOT be
+ * relied on for this purpose.)
+ */
+void
+get_translated_update_targetlist(PlannerInfo *root, Index relid,
+                                 List **processed_tlist, List **update_colnos)
+{
+    /* This is pretty meaningless for commands other than UPDATE. */
+    Assert(root->parse->commandType == CMD_UPDATE);
+    if (relid == root->parse->resultRelation)
     {
-        TargetEntry *tle = (TargetEntry *) lfirst(tl);
-
-        if (!tle->resjunk)
-            continue;            /* here, ignore non-junk items */
-
-        tle->resno = attrno;
-        new_tlist = lappend(new_tlist, tle);
-        attrno++;
+        /*
+         * Non-inheritance case, so it's easy.  The caller might be expecting
+         * a tree it can scribble on, though, so copy.
+         */
+        *processed_tlist = copyObject(root->processed_tlist);
+        if (update_colnos)
+            *update_colnos = copyObject(root->update_colnos);
+    }
+    else
+    {
+        Assert(bms_is_member(relid, root->all_result_relids));
+        *processed_tlist = (List *)
+            adjust_appendrel_attrs_multilevel(root,
+                                              (Node *) root->processed_tlist,
+                                              bms_make_singleton(relid),
+                                              bms_make_singleton(root->parse->resultRelation));
+        if (update_colnos)
+            *update_colnos =
+                adjust_inherited_attnums_multilevel(root, root->update_colnos,
+                                                    relid,
+                                                    root->parse->resultRelation);
     }
-
-    return new_tlist;
 }

 /*
@@ -746,3 +733,270 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
     }
     return appinfos;
 }
+
+
+/*****************************************************************************
+ *
+ *        ROW-IDENTITY VARIABLE MANAGEMENT
+ *
+ * This code lacks a good home, perhaps.  We choose to keep it here because
+ * adjust_appendrel_attrs_mutator() is its principal co-conspirator.  That
+ * function does most of what is needed to expand ROWID_VAR Vars into the
+ * right things.
+ *
+ *****************************************************************************/
+
+/*
+ * add_row_identity_var
+ *      Register a row-identity column to be used in UPDATE/DELETE.
+ *
+ * The Var must be equal(), aside from varno, to any other row-identity
+ * column with the same rowid_name.  Thus, for example, "wholerow"
+ * row identities had better use vartype == RECORDOID.
+ *
+ * rtindex is currently redundant with rowid_var->varno, but we specify
+ * it as a separate parameter in case this is ever generalized to support
+ * non-Var expressions.  (We could reasonably handle expressions over
+ * Vars of the specified rtindex, but for now that seems unnecessary.)
+ */
+void
+add_row_identity_var(PlannerInfo *root, Var *orig_var,
+                     Index rtindex, const char *rowid_name)
+{
+    TargetEntry *tle;
+    Var           *rowid_var;
+    RowIdentityVarInfo *ridinfo;
+    ListCell   *lc;
+
+    /* For now, the argument must be just a Var of the given rtindex */
+    Assert(IsA(orig_var, Var));
+    Assert(orig_var->varno == rtindex);
+    Assert(orig_var->varlevelsup == 0);
+
+    /*
+     * If we're doing non-inherited UPDATE/DELETE, there's little need for
+     * ROWID_VAR shenanigans.  Just shove the presented Var into the
+     * processed_tlist, and we're done.
+     */
+    if (rtindex == root->parse->resultRelation)
+    {
+        tle = makeTargetEntry((Expr *) orig_var,
+                              list_length(root->processed_tlist) + 1,
+                              pstrdup(rowid_name),
+                              true);
+        root->processed_tlist = lappend(root->processed_tlist, tle);
+        return;
+    }
+
+    /*
+     * Otherwise, rtindex should reference a leaf target relation that's being
+     * added to the query during expand_inherited_rtentry().
+     */
+    Assert(bms_is_member(rtindex, root->leaf_result_relids));
+    Assert(root->append_rel_array[rtindex] != NULL);
+
+    /*
+     * We have to find a matching RowIdentityVarInfo, or make one if there is
+     * none.  To allow using equal() to match the vars, change the varno to
+     * ROWID_VAR, leaving all else alone.
+     */
+    rowid_var = copyObject(orig_var);
+    /* This could eventually become ChangeVarNodes() */
+    rowid_var->varno = ROWID_VAR;
+
+    /* Look for an existing row-id column of the same name */
+    foreach(lc, root->row_identity_vars)
+    {
+        ridinfo = (RowIdentityVarInfo *) lfirst(lc);
+        if (strcmp(rowid_name, ridinfo->rowidname) != 0)
+            continue;
+        if (equal(rowid_var, ridinfo->rowidvar))
+        {
+            /* Found a match; we need only record that rtindex needs it too */
+            ridinfo->rowidrels = bms_add_member(ridinfo->rowidrels, rtindex);
+            return;
+        }
+        else
+        {
+            /* Ooops, can't handle this */
+            elog(ERROR, "conflicting uses of row-identity name \"%s\"",
+                 rowid_name);
+        }
+    }
+
+    /* No request yet, so add a new RowIdentityVarInfo */
+    ridinfo = makeNode(RowIdentityVarInfo);
+    ridinfo->rowidvar = copyObject(rowid_var);
+    /* for the moment, estimate width using just the datatype info */
+    ridinfo->rowidwidth = get_typavgwidth(exprType((Node *) rowid_var),
+                                          exprTypmod((Node *) rowid_var));
+    ridinfo->rowidname = pstrdup(rowid_name);
+    ridinfo->rowidrels = bms_make_singleton(rtindex);
+
+    root->row_identity_vars = lappend(root->row_identity_vars, ridinfo);
+
+    /* Change rowid_var into a reference to this row_identity_vars entry */
+    rowid_var->varattno = list_length(root->row_identity_vars);
+
+    /* Push the ROWID_VAR reference variable into processed_tlist */
+    tle = makeTargetEntry((Expr *) rowid_var,
+                          list_length(root->processed_tlist) + 1,
+                          pstrdup(rowid_name),
+                          true);
+    root->processed_tlist = lappend(root->processed_tlist, tle);
+}
+
+/*
+ * add_row_identity_columns
+ *
+ * This function adds the row identity columns needed by the core code.
+ * FDWs might call add_row_identity_var() for themselves to add nonstandard
+ * columns.  (Duplicate requests are fine.)
+ */
+void
+add_row_identity_columns(PlannerInfo *root, Index rtindex,
+                         RangeTblEntry *target_rte,
+                         Relation target_relation)
+{
+    CmdType        commandType = root->parse->commandType;
+    char        relkind = target_relation->rd_rel->relkind;
+    Var           *var;
+
+    Assert(commandType == CMD_UPDATE || commandType == CMD_DELETE);
+
+    if (relkind == RELKIND_RELATION ||
+        relkind == RELKIND_MATVIEW ||
+        relkind == RELKIND_PARTITIONED_TABLE)
+    {
+        /*
+         * Emit CTID so that executor can find the row to update or delete.
+         */
+        var = makeVar(rtindex,
+                      SelfItemPointerAttributeNumber,
+                      TIDOID,
+                      -1,
+                      InvalidOid,
+                      0);
+        add_row_identity_var(root, var, rtindex, "ctid");
+    }
+    else if (relkind == RELKIND_FOREIGN_TABLE)
+    {
+        /*
+         * Let the foreign table's FDW add whatever junk TLEs it wants.
+         */
+        FdwRoutine *fdwroutine;
+
+        fdwroutine = GetFdwRoutineForRelation(target_relation, false);
+
+        if (fdwroutine->AddForeignUpdateTargets != NULL)
+            fdwroutine->AddForeignUpdateTargets(root, rtindex,
+                                                target_rte, target_relation);
+
+        /*
+         * For UPDATE, we need to make the FDW fetch unchanged columns by
+         * asking it to fetch a whole-row Var.  That's because the top-level
+         * targetlist only contains entries for changed columns, but
+         * ExecUpdate will need to build the complete new tuple.  (Actually,
+         * we only really need this in UPDATEs that are not pushed to the
+         * remote side, but it's hard to tell if that will be the case at the
+         * point when this function is called.)
+         *
+         * We will also need the whole row if there are any row triggers, so
+         * that the executor will have the "old" row to pass to the trigger.
+         * Alas, this misses system columns.
+         */
+        if (commandType == CMD_UPDATE ||
+            (target_relation->trigdesc &&
+             (target_relation->trigdesc->trig_delete_after_row ||
+              target_relation->trigdesc->trig_delete_before_row)))
+        {
+            var = makeVar(rtindex,
+                          InvalidAttrNumber,
+                          RECORDOID,
+                          -1,
+                          InvalidOid,
+                          0);
+            add_row_identity_var(root, var, rtindex, "wholerow");
+        }
+    }
+}
+
+/*
+ * distribute_row_identity_vars
+ *
+ * After we have finished identifying all the row identity columns
+ * needed by an inherited UPDATE/DELETE query, make sure that these
+ * columns will be generated by all the target relations.
+ *
+ * This is more or less like what build_base_rel_tlists() does,
+ * except that it would not understand what to do with ROWID_VAR Vars.
+ * Since that function runs before inheritance relations are expanded,
+ * it will never see any such Vars anyway.
+ */
+void
+distribute_row_identity_vars(PlannerInfo *root)
+{
+    Query       *parse = root->parse;
+    int            result_relation = parse->resultRelation;
+    RangeTblEntry *target_rte;
+    RelOptInfo *target_rel;
+    ListCell   *lc;
+
+    /* There's nothing to do if this isn't an inherited UPDATE/DELETE. */
+    if (parse->commandType != CMD_UPDATE && parse->commandType != CMD_DELETE)
+    {
+        Assert(root->row_identity_vars == NIL);
+        return;
+    }
+    target_rte = rt_fetch(result_relation, parse->rtable);
+    if (!target_rte->inh)
+    {
+        Assert(root->row_identity_vars == NIL);
+        return;
+    }
+
+    /*
+     * Ordinarily, we expect that leaf result relation(s) will have added some
+     * ROWID_VAR Vars to the query.  However, it's possible that constraint
+     * exclusion suppressed every leaf relation.  The executor will get upset
+     * if the plan has no row identity columns at all, even though it will
+     * certainly process no rows.  Handle this edge case by re-opening the top
+     * result relation and adding the row identity columns it would have used,
+     * as preprocess_targetlist() would have done if it weren't marked "inh".
+     * (This is a bit ugly, but it seems better to confine the ugliness and
+     * extra cycles to this unusual corner case.)  We needn't worry about
+     * fixing the rel's reltarget, as that won't affect the finished plan.
+     */
+    if (root->row_identity_vars == NIL)
+    {
+        Relation    target_relation;
+
+        target_relation = table_open(target_rte->relid, NoLock);
+        add_row_identity_columns(root, result_relation,
+                                 target_rte, target_relation);
+        table_close(target_relation, NoLock);
+        return;
+    }
+
+    /*
+     * Dig through the processed_tlist to find the ROWID_VAR reference Vars,
+     * and forcibly copy them into the reltarget list of the topmost target
+     * relation.  That's sufficient because they'll be copied to the
+     * individual leaf target rels (with appropriate translation) later,
+     * during appendrel expansion --- see set_append_rel_size().
+     */
+    target_rel = find_base_rel(root, result_relation);
+
+    foreach(lc, root->processed_tlist)
+    {
+        TargetEntry *tle = lfirst(lc);
+        Var           *var = (Var *) tle->expr;
+
+        if (var && IsA(var, Var) && var->varno == ROWID_VAR)
+        {
+            target_rel->reltarget->exprs =
+                lappend(target_rel->reltarget->exprs, copyObject(var));
+            /* reltarget cost and width will be computed later */
+        }
+    }
+}
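The row-identity machinery above, add_row_identity_var() plus the ROWID_VAR branch in adjust_appendrel_attrs_mutator(), can be modeled in isolation roughly as follows. The sketch assumes a Var can be reduced to (varattno, vartype) for the equal() comparison and models rowidrels as a 64-bit mask. SketchRegistry, sketch_add_row_identity and sketch_leaf_provides are hypothetical names. Where the real code raises elog(ERROR) on a conflicting definition, the sketch returns -1 instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ROWID 8

/* Hypothetical simplified RowIdentityVarInfo; the Var is (attno, type). */
typedef struct SketchRowIdInfo
{
    const char *name;    /* e.g. "ctid", "tableoid", "wholerow" */
    int         attno;   /* stand-in for the Var's varattno */
    unsigned    typeoid; /* stand-in for the Var's vartype */
    uint64_t    rels;    /* leaf rtindexes (< 64) that need this column */
} SketchRowIdInfo;

typedef struct SketchRegistry
{
    SketchRowIdInfo items[SKETCH_MAX_ROWID];
    int             count;
} SketchRegistry;

/*
 * Record that leaf relation rtindex needs row-identity column "name".
 * Returns the 1-based position of the entry (what a ROWID_VAR Var's
 * varattno would hold), or -1 if the name is already registered with a
 * different definition.
 */
static int
sketch_add_row_identity(SketchRegistry *reg, const char *name,
                        int attno, unsigned typeoid, int rtindex)
{
    assert(rtindex >= 0 && rtindex < 64);

    for (int i = 0; i < reg->count; i++)
    {
        SketchRowIdInfo *info = &reg->items[i];

        if (strcmp(info->name, name) != 0)
            continue;
        if (info->attno != attno || info->typeoid != typeoid)
            return -1;          /* conflicting uses of the same name */
        info->rels |= UINT64_C(1) << rtindex;   /* just note the new rel */
        return i + 1;
    }

    /* No entry yet for this name: add one. */
    assert(reg->count < SKETCH_MAX_ROWID);
    reg->items[reg->count] = (SketchRowIdInfo) {
        name, attno, typeoid, UINT64_C(1) << rtindex
    };
    return ++reg->count;
}

/*
 * Translating ROWID_VAR entry rowid_attno for a leaf rel: true means use
 * the leaf's own Var, false means substitute a NULL of the right type.
 */
static bool
sketch_leaf_provides(const SketchRegistry *reg, int rowid_attno, int leaf)
{
    assert(rowid_attno >= 1 && rowid_attno <= reg->count);
    assert(leaf >= 0 && leaf < 64);
    return (reg->items[rowid_attno - 1].rels >> leaf) & 1;
}
```

In this model, a leaf that never asked for an entry gets a typed NULL in that column. That mirrors how one plan can feed result relations that need different row-identity columns, for example a heap table wanting "ctid" and a foreign table wanting "wholerow".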
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index be1c9ddd96..13f67ab744 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -219,6 +219,10 @@ expand_inherited_rtentry(PlannerInfo *root, RelOptInfo *rel,
      * targetlist and update parent rel's reltarget.  This should match what
      * preprocess_targetlist() would have added if the mark types had been
      * requested originally.
+     *
+     * (Someday it might be useful to fold these resjunk columns into the
+     * row-identity-column management used for UPDATE/DELETE.  Today is not
+     * that day, however.)
      */
     if (oldrc)
     {
@@ -585,6 +589,46 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,

         root->rowMarks = lappend(root->rowMarks, childrc);
     }
+
+    /*
+     * If we are creating a child of the query target relation (only possible
+     * in UPDATE/DELETE), add it to all_result_relids, as well as
+     * leaf_result_relids if appropriate, and make sure that we generate
+     * required row-identity data.
+     */
+    if (bms_is_member(parentRTindex, root->all_result_relids))
+    {
+        /* OK, record the child as a result rel too. */
+        root->all_result_relids = bms_add_member(root->all_result_relids,
+                                                 childRTindex);
+
+        /* Non-leaf partitions don't need any row identity info. */
+        if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
+        {
+            Var           *rrvar;
+
+            root->leaf_result_relids = bms_add_member(root->leaf_result_relids,
+                                                      childRTindex);
+
+            /*
+             * If we have any child target relations, assume they all need to
+             * generate a junk "tableoid" column.  (If only one child survives
+             * pruning, we wouldn't really need this, but it's not worth
+             * thrashing about to avoid it.)
+             */
+            rrvar = makeVar(childRTindex,
+                            TableOidAttributeNumber,
+                            OIDOID,
+                            -1,
+                            InvalidOid,
+                            0);
+            add_row_identity_var(root, rrvar, childRTindex, "tableoid");
+
+            /* Register any row-identity columns needed by this child. */
+            add_row_identity_columns(root, childRTindex,
+                                     childrte, childrel);
+        }
+    }
 }

 /*
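The new block in expand_single_inheritance_child() comes down to two set updates. A child of a result rel is always added to all_result_relids, and it is added to leaf_result_relids only when it is not a partitioned table. Only those leaf children go on to register row-identity columns. The following is a hypothetical sketch of that bookkeeping, not the patch's code. It uses a uint64_t bitmask in place of a Bitmapset, so RT indexes are limited to 0..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a 64-bit mask stands in for a Bitmapset of RT indexes. */
typedef uint64_t RelidMask;

typedef struct ResultRelSets
{
    RelidMask   all_result_relids;  /* every target rel, incl. partitioned */
    RelidMask   leaf_result_relids; /* only rels that actually hold rows */
} ResultRelSets;

static bool
mask_is_member(RelidMask m, int rti)
{
    return ((m >> rti) & 1) != 0;
}

/*
 * Record a newly expanded child.  Returns true if the child is a leaf
 * result rel, i.e. one that needs row-identity columns registered.
 */
bool
record_child_result_rel(ResultRelSets *sets, int parent_rti, int child_rti,
                        bool child_is_partitioned)
{
    assert(parent_rti >= 0 && parent_rti < 64);
    assert(child_rti >= 0 && child_rti < 64);

    if (!mask_is_member(sets->all_result_relids, parent_rti))
        return false;           /* parent is not a query target */

    sets->all_result_relids |= (RelidMask) 1 << child_rti;
    if (child_is_partitioned)
        return false;           /* non-leaf partition: no row identity info */

    sets->leaf_result_relids |= (RelidMask) 1 << child_rti;
    return true;
}
```

For example, with a partitioned root at RT index 1, a partitioned child 2 joins only the "all" set, while its leaf child 3 joins both sets.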
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a97929c13f..64edf7e9fc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3539,6 +3539,7 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  *      Creates a pathnode that represents performing INSERT/UPDATE/DELETE mods
  *
  * 'rel' is the parent relation associated with the result
+ * 'subpath' is a Path producing source data
  * 'operation' is the operation type
  * 'canSetTag' is true if we set the command tag/es_processed
  * 'nominalRelation' is the parent RT index for use of EXPLAIN
@@ -3546,8 +3547,6 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  * 'partColsUpdated' is true if any partitioning columns are being updated,
  *        either from the target relation or a descendent partitioned table.
  * 'resultRelations' is an integer list of actual RT indexes of target rel(s)
- * 'subpaths' is a list of Path(s) producing source data (one per rel)
- * 'subroots' is a list of PlannerInfo structs (one per rel)
  * 'updateColnosLists' is a list of UPDATE target column number lists
  *        (one sublist per rel); or NIL if not an UPDATE
  * 'withCheckOptionLists' is a list of WCO lists (one per rel)
@@ -3558,22 +3557,18 @@ create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
  */
 ModifyTablePath *
 create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
+                        Path *subpath,
                         CmdType operation, bool canSetTag,
                         Index nominalRelation, Index rootRelation,
                         bool partColsUpdated,
-                        List *resultRelations, List *subpaths,
-                        List *subroots,
+                        List *resultRelations,
                         List *updateColnosLists,
                         List *withCheckOptionLists, List *returningLists,
                         List *rowMarks, OnConflictExpr *onconflict,
                         int epqParam)
 {
     ModifyTablePath *pathnode = makeNode(ModifyTablePath);
-    double        total_size;
-    ListCell   *lc;

-    Assert(list_length(resultRelations) == list_length(subpaths));
-    Assert(list_length(resultRelations) == list_length(subroots));
     Assert(operation == CMD_UPDATE ?
            list_length(resultRelations) == list_length(updateColnosLists) :
            updateColnosLists == NIL);
@@ -3594,7 +3589,7 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
     pathnode->path.pathkeys = NIL;

     /*
-     * Compute cost & rowcount as sum of subpath costs & rowcounts.
+     * Compute cost & rowcount as subpath cost & rowcount (if RETURNING)
      *
      * Currently, we don't charge anything extra for the actual table
      * modification work, nor for the WITH CHECK OPTIONS or RETURNING
@@ -3603,42 +3598,33 @@ create_modifytable_path(PlannerInfo *root, RelOptInfo *rel,
      * costs to change any higher-level planning choices.  But we might want
      * to make it look better sometime.
      */
-    pathnode->path.startup_cost = 0;
-    pathnode->path.total_cost = 0;
-    pathnode->path.rows = 0;
-    total_size = 0;
-    foreach(lc, subpaths)
+    pathnode->path.startup_cost = subpath->startup_cost;
+    pathnode->path.total_cost = subpath->total_cost;
+    if (returningLists != NIL)
     {
-        Path       *subpath = (Path *) lfirst(lc);
+        pathnode->path.rows = subpath->rows;

-        if (lc == list_head(subpaths))    /* first node? */
-            pathnode->path.startup_cost = subpath->startup_cost;
-        pathnode->path.total_cost += subpath->total_cost;
-        if (returningLists != NIL)
-        {
-            pathnode->path.rows += subpath->rows;
-            total_size += subpath->pathtarget->width * subpath->rows;
-        }
+        /*
+         * Set width to match the subpath output.  XXX this is totally wrong:
+         * we should return an average of the RETURNING tlist widths.  But
+         * it's what happened historically, and improving it is a task for
+         * another day.  (Again, it's mostly window dressing.)
+         */
+        pathnode->path.pathtarget->width = subpath->pathtarget->width;
+    }
+    else
+    {
+        pathnode->path.rows = 0;
+        pathnode->path.pathtarget->width = 0;
     }

-    /*
-     * Set width to the average width of the subpath outputs.  XXX this is
-     * totally wrong: we should return an average of the RETURNING tlist
-     * widths.  But it's what happened historically, and improving it is a task
-     * for another day.
-     */
-    if (pathnode->path.rows > 0)
-        total_size /= pathnode->path.rows;
-    pathnode->path.pathtarget->width = rint(total_size);
-
+    pathnode->subpath = subpath;
     pathnode->operation = operation;
     pathnode->canSetTag = canSetTag;
     pathnode->nominalRelation = nominalRelation;
     pathnode->rootRelation = rootRelation;
     pathnode->partColsUpdated = partColsUpdated;
     pathnode->resultRelations = resultRelations;
-    pathnode->subpaths = subpaths;
-    pathnode->subroots = subroots;
     pathnode->updateColnosLists = updateColnosLists;
     pathnode->withCheckOptionLists = withCheckOptionLists;
     pathnode->returningLists = returningLists;
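The costing change in create_modifytable_path() is easiest to see side by side. Previously, costs were summed over one subpath per result relation. Now they are copied from the single subpath, and rows/width are reported only when there is a RETURNING list. The sketch below is hypothetical: SketchPath is a cut-down stand-in for Path, and the old rule rounds half up where the original code used rint().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down Path: only the fields the costing touches. */
typedef struct SketchPath
{
    double      startup_cost;
    double      total_cost;
    double      rows;
    int         width;
} SketchPath;

/* New rule: costs come straight from the one subpath. */
SketchPath
modifytable_cost(const SketchPath *subpath, bool has_returning)
{
    SketchPath  p;

    p.startup_cost = subpath->startup_cost;
    p.total_cost = subpath->total_cost;
    /* rows/width only matter when RETURNING emits tuples */
    p.rows = has_returning ? subpath->rows : 0;
    p.width = has_returning ? subpath->width : 0;
    return p;
}

/* Old rule, for comparison: one subpath per result relation. */
SketchPath
modifytable_cost_old(const SketchPath *subpaths, size_t n, bool has_returning)
{
    SketchPath  p = {0, 0, 0, 0};
    double      total_size = 0;

    assert(n > 0);
    p.startup_cost = subpaths[0].startup_cost;  /* first node only */
    for (size_t i = 0; i < n; i++)
    {
        p.total_cost += subpaths[i].total_cost;
        if (has_returning)
        {
            p.rows += subpaths[i].rows;
            total_size += subpaths[i].width * subpaths[i].rows;
        }
    }
    if (p.rows > 0)
        total_size /= p.rows;   /* row-weighted average width */
    p.width = (int) (total_size + 0.5);    /* rint() in the original */
    return p;
}
```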
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6c39bf893f..d0fb3b6834 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1453,18 +1453,11 @@ relation_excluded_by_constraints(PlannerInfo *root,

             /*
              * When constraint_exclusion is set to 'partition' we only handle
-             * appendrel members.  Normally, they are RELOPT_OTHER_MEMBER_REL
-             * relations, but we also consider inherited target relations as
-             * appendrel members for the purposes of constraint exclusion
-             * (since, indeed, they were appendrel members earlier in
-             * inheritance_planner).
-             *
-             * In both cases, partition pruning was already applied, so there
-             * is no need to consider the rel's partition constraints here.
+             * appendrel members.  Partition pruning has already been applied,
+             * so there is no need to consider the rel's partition constraints
+             * here.
              */
-            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
-                (rel->relid == root->parse->resultRelation &&
-                 root->inhTargetKind != INHKIND_NONE))
+            if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
                 break;            /* appendrel member, so process it */
             return false;

@@ -1477,9 +1470,7 @@ relation_excluded_by_constraints(PlannerInfo *root,
              * its partition constraints haven't been considered yet, so
              * include them in the processing here.
              */
-            if (rel->reloptkind == RELOPT_BASEREL &&
-                !(rel->relid == root->parse->resultRelation &&
-                  root->inhTargetKind != INHKIND_NONE))
+            if (rel->reloptkind == RELOPT_BASEREL)
                 include_partition = true;
             break;                /* always try to exclude */
     }
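After this change, the constraint_exclusion switch no longer needs a special case for an inherited target relation, because inheritance_planner() is gone. In 'partition' mode only RELOPT_OTHER_MEMBER_REL rels are tested. In 'on' mode every rel is tested, and a directly named baserel also gets its partition constraints checked. The hypothetical decision table below summarizes this. The 'off' branch is not shown in these hunks and is assumed here to skip all further tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GUC values and reloptkinds involved. */
typedef enum
{
    EXCLUSION_OFF, EXCLUSION_ON, EXCLUSION_PARTITION
} ExclusionMode;

typedef enum
{
    SKETCH_BASEREL, SKETCH_OTHER_MEMBER_REL
} SketchRelOptKind;

typedef struct ExclusionDecision
{
    bool        try_exclusion;      /* go on to test constraints? */
    bool        include_partition;  /* also test partition constraints? */
} ExclusionDecision;

ExclusionDecision
exclusion_decision(ExclusionMode mode, SketchRelOptKind kind)
{
    ExclusionDecision d = {false, false};

    switch (mode)
    {
        case EXCLUSION_OFF:
            break;              /* assumed: no further tests */
        case EXCLUSION_PARTITION:
            /* appendrel members only; pruning already used partition quals */
            d.try_exclusion = (kind == SKETCH_OTHER_MEMBER_REL);
            break;
        case EXCLUSION_ON:
            d.try_exclusion = true;
            /* a directly named rel's partition constraints are unchecked */
            d.include_partition = (kind == SKETCH_BASEREL);
            break;
    }
    return d;
}
```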
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 345c877aeb..e105a4d5f1 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -977,8 +977,6 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
     foreach(vars, input_rel->reltarget->exprs)
     {
         Var           *var = (Var *) lfirst(vars);
-        RelOptInfo *baserel;
-        int            ndx;

         /*
          * Ignore PlaceHolderVars in the input tlists; we'll make our own
@@ -996,17 +994,35 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
             elog(ERROR, "unexpected node type in rel targetlist: %d",
                  (int) nodeTag(var));

-        /* Get the Var's original base rel */
-        baserel = find_base_rel(root, var->varno);
-
-        /* Is it still needed above this joinrel? */
-        ndx = var->varattno - baserel->min_attr;
-        if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
+        if (var->varno == ROWID_VAR)
         {
-            /* Yup, add it to the output */
-            joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, var);
+            /* UPDATE/DELETE row identity vars are always needed */
+            RowIdentityVarInfo *ridinfo = (RowIdentityVarInfo *)
+            list_nth(root->row_identity_vars, var->varattno - 1);
+
+            joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
+                                                var);
             /* Vars have cost zero, so no need to adjust reltarget->cost */
-            joinrel->reltarget->width += baserel->attr_widths[ndx];
+            joinrel->reltarget->width += ridinfo->rowidwidth;
+        }
+        else
+        {
+            RelOptInfo *baserel;
+            int            ndx;
+
+            /* Get the Var's original base rel */
+            baserel = find_base_rel(root, var->varno);
+
+            /* Is it still needed above this joinrel? */
+            ndx = var->varattno - baserel->min_attr;
+            if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
+            {
+                /* Yup, add it to the output */
+                joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
+                                                    var);
+                /* Vars have cost zero, so no need to adjust reltarget->cost */
+                joinrel->reltarget->width += baserel->attr_widths[ndx];
+            }
         }
     }
 }
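In build_joinrel_tlist(), a ROWID_VAR Var is now always propagated upward, with its width taken from the (varattno - 1)th RowIdentityVarInfo. An ordinary Var survives only if some relation outside the join still needs it, which is what bms_nonempty_difference() tests. The sketch below is hypothetical. It replaces Relids with a uint64_t mask and passes the base column's attr_needed and attr_widths entries in directly. Bit 0 plays the role of "needed in the final targetlist", as it does in PostgreSQL's attr_needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROWID_VAR 65003         /* value from the primnodes.h hunk */

typedef uint64_t RelidMask;     /* hypothetical stand-in for Relids */

typedef struct SketchVar
{
    int         varno;
    int         varattno;
} SketchVar;

/*
 * Decide whether an input Var belongs in the joinrel's output; if so,
 * store the width it contributes in *width_out.
 */
bool
joinrel_needs_var(const SketchVar *var, RelidMask joinrel_relids,
                  RelidMask attr_needed, int attr_width,
                  const int *rowid_widths, int n_rowid, int *width_out)
{
    if (var->varno == ROWID_VAR)
    {
        /* UPDATE/DELETE row identity vars are always needed */
        assert(var->varattno >= 1 && var->varattno <= n_rowid);
        *width_out = rowid_widths[var->varattno - 1];
        return true;
    }
    /* bms_nonempty_difference(attr_needed, joinrel relids) */
    if ((attr_needed & ~joinrel_relids) != 0)
    {
        *width_out = attr_width;
        return true;
    }
    return false;
}
```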
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index f9175987f8..92661abae2 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -705,16 +705,9 @@ adjustJoinTreeList(Query *parsetree, bool removert, int rt_index)
  *
  * We must do items 1,2,3 before firing rewrite rules, else rewritten
  * references to NEW.foo will produce wrong or incomplete results.  Item 4
- * is not needed for rewriting, but will be needed by the planner, and we
+ * is not needed for rewriting, but it is helpful for the planner, and we
  * can do it essentially for free while handling the other items.
  *
- * Note that for an inheritable UPDATE, this processing is only done once,
- * using the parent relation as reference.  It must not do anything that
- * will not be correct when transposed to the child relation(s).  (Step 4
- * is incorrect by this light, since child relations might have different
- * column ordering, but the planner will fix things by re-sorting the tlist
- * for each child.)
- *
  * If values_rte is non-NULL (i.e., we are doing a multi-row INSERT using
  * values from a VALUES RTE), we populate *unused_values_attrnos with the
  * attribute numbers of any unused columns from the VALUES RTE.  This can
@@ -1607,94 +1600,6 @@ rewriteValuesRTE(Query *parsetree, RangeTblEntry *rte, int rti,
 }


-/*
- * rewriteTargetListUD - rewrite UPDATE/DELETE targetlist as needed
- *
- * This function adds a "junk" TLE that is needed to allow the executor to
- * find the original row for the update or delete.  When the target relation
- * is a regular table, the junk TLE emits the ctid attribute of the original
- * row.  When the target relation is a foreign table, we let the FDW decide
- * what to add.
- *
- * We used to do this during RewriteQuery(), but now that inheritance trees
- * can contain a mix of regular and foreign tables, we must postpone it till
- * planning, after the inheritance tree has been expanded.  In that way we
- * can do the right thing for each child table.
- */
-void
-rewriteTargetListUD(Query *parsetree, RangeTblEntry *target_rte,
-                    Relation target_relation)
-{
-    Var           *var = NULL;
-    const char *attrname;
-    TargetEntry *tle;
-
-    if (target_relation->rd_rel->relkind == RELKIND_RELATION ||
-        target_relation->rd_rel->relkind == RELKIND_MATVIEW ||
-        target_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-    {
-        /*
-         * Emit CTID so that executor can find the row to update or delete.
-         */
-        var = makeVar(parsetree->resultRelation,
-                      SelfItemPointerAttributeNumber,
-                      TIDOID,
-                      -1,
-                      InvalidOid,
-                      0);
-
-        attrname = "ctid";
-    }
-    else if (target_relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
-    {
-        /*
-         * Let the foreign table's FDW add whatever junk TLEs it wants.
-         */
-        FdwRoutine *fdwroutine;
-
-        fdwroutine = GetFdwRoutineForRelation(target_relation, false);
-
-        if (fdwroutine->AddForeignUpdateTargets != NULL)
-            fdwroutine->AddForeignUpdateTargets(parsetree, target_rte,
-                                                target_relation);
-
-        /*
-         * For UPDATE, we need to make the FDW fetch unchanged columns by
-         * asking it to fetch a whole-row Var.  That's because the top-level
-         * targetlist only contains entries for changed columns.  (Actually,
-         * we only really need this for UPDATEs that are not pushed to the
-         * remote side, but it's hard to tell if that will be the case at the
-         * point when this function is called.)
-         *
-         * We will also need the whole row if there are any row triggers, so
-         * that the executor will have the "old" row to pass to the trigger.
-         * Alas, this misses system columns.
-         */
-        if (parsetree->commandType == CMD_UPDATE ||
-            (target_relation->trigdesc &&
-             (target_relation->trigdesc->trig_delete_after_row ||
-              target_relation->trigdesc->trig_delete_before_row)))
-        {
-            var = makeWholeRowVar(target_rte,
-                                  parsetree->resultRelation,
-                                  0,
-                                  false);
-
-            attrname = "wholerow";
-        }
-    }
-
-    if (var != NULL)
-    {
-        tle = makeTargetEntry((Expr *) var,
-                              list_length(parsetree->targetList) + 1,
-                              pstrdup(attrname),
-                              true);
-
-        parsetree->targetList = lappend(parsetree->targetList, tle);
-    }
-}
-
 /*
  * Record in target_rte->extraUpdatedCols the indexes of any generated columns
  * that depend on any columns mentioned in target_rte->updatedCols.
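For reference, the removed rewriteTargetListUD() chose the row-identity junk column as follows. Plain tables, matviews and partitioned tables got "ctid". Foreign tables got whatever the FDW added, plus "wholerow" for UPDATE or when DELETE row triggers exist. The patch performs this choice per child in the planner instead (add_row_identity_columns(), whose body is not in this chunk). The sketch below is a hypothetical restatement of the removed function's decision only. The enums are stand-ins, and the FDW callback is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for relkinds and for the junk column chosen. */
typedef enum
{
    SK_RELATION, SK_MATVIEW, SK_PARTITIONED_TABLE, SK_FOREIGN_TABLE, SK_VIEW
} SketchRelKind;

typedef enum
{
    ROWID_NONE, ROWID_CTID, ROWID_WHOLEROW
} RowIdJunk;

RowIdJunk
row_identity_junk(SketchRelKind kind, bool is_update,
                  bool has_delete_row_trigger)
{
    switch (kind)
    {
        case SK_RELATION:
        case SK_MATVIEW:
        case SK_PARTITIONED_TABLE:
            return ROWID_CTID;  /* "ctid" locates the old row */
        case SK_FOREIGN_TABLE:
            /* the FDW's AddForeignUpdateTargets hook may add more columns */
            if (is_update || has_delete_row_trigger)
                return ROWID_WHOLEROW;  /* "wholerow" */
            return ROWID_NONE;
        default:
            return ROWID_NONE;
    }
}
```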
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index f0de2a25c9..03c22c80c3 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4572,16 +4572,12 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan)
      * We special-case Append and MergeAppend to pretend that the first child
      * plan is the OUTER referent; we have to interpret OUTER Vars in their
      * tlists according to one of the children, and the first one is the most
-     * natural choice.  Likewise special-case ModifyTable to pretend that the
-     * first child plan is the OUTER referent; this is to support RETURNING
-     * lists containing references to non-target relations.
+     * natural choice.
      */
     if (IsA(plan, Append))
         dpns->outer_plan = linitial(((Append *) plan)->appendplans);
     else if (IsA(plan, MergeAppend))
         dpns->outer_plan = linitial(((MergeAppend *) plan)->mergeplans);
-    else if (IsA(plan, ModifyTable))
-        dpns->outer_plan = linitial(((ModifyTable *) plan)->plans);
     else
         dpns->outer_plan = outerPlan(plan);

diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 248f78da45..bd68fd5f8c 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -65,7 +65,8 @@ typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
                                                RelOptInfo *output_rel,
                                                void *extra);

-typedef void (*AddForeignUpdateTargets_function) (Query *parsetree,
+typedef void (*AddForeignUpdateTargets_function) (PlannerInfo *root,
+                                                  Index rtindex,
                                                   RangeTblEntry *target_rte,
                                                   Relation target_relation);

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7af6d48525..d5f96609d8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -422,7 +422,7 @@ typedef struct ResultRelInfo
      * For UPDATE/DELETE result relations, the attribute number of the row
      * identity junk attribute in the source plan's output tuples
      */
-    AttrNumber        ri_RowIdAttNo;
+    AttrNumber    ri_RowIdAttNo;

     /* Projection to generate new tuple in an INSERT/UPDATE */
     ProjectionInfo *ri_projectNew;
@@ -666,10 +666,7 @@ typedef struct ExecRowMark
  * Each LockRows and ModifyTable node keeps a list of the rowmarks it needs to
  * deal with.  In addition to a pointer to the related entry in es_rowmarks,
  * this struct carries the column number(s) of the resjunk columns associated
- * with the rowmark (see comments for PlanRowMark for more detail).  In the
- * case of ModifyTable, there has to be a separate ExecAuxRowMark list for
- * each child plan, because the resjunk columns could be at different physical
- * column positions in different subplans.
+ * with the rowmark (see comments for PlanRowMark for more detail).
  */
 typedef struct ExecAuxRowMark
 {
@@ -1071,9 +1068,8 @@ typedef struct PlanState
  * EvalPlanQualSlot), and/or found using the rowmark mechanism (non-locking
  * rowmarks by the EPQ machinery itself, locking ones by the caller).
  *
- * While the plan to be checked may be changed using EvalPlanQualSetPlan() -
- * e.g. so all source plans for a ModifyTable node can be processed - all such
- * plans need to share the same EState.
+ * While the plan to be checked may be changed using EvalPlanQualSetPlan(),
+ * all such plans need to share the same EState.
  */
 typedef struct EPQState
 {
@@ -1167,23 +1163,31 @@ typedef struct ModifyTableState
     CmdType        operation;        /* INSERT, UPDATE, or DELETE */
     bool        canSetTag;        /* do we set the command tag/es_processed? */
     bool        mt_done;        /* are we done? */
-    PlanState **mt_plans;        /* subplans (one per target rel) */
-    int            mt_nplans;        /* number of plans in the array */
-    int            mt_whichplan;    /* which one is being executed (0..n-1) */
-    TupleTableSlot **mt_scans;    /* input tuple corresponding to underlying
-                                 * plans */
-    ResultRelInfo *resultRelInfo;    /* per-subplan target relations */
+    int            mt_nrels;        /* number of entries in resultRelInfo[] */
+    ResultRelInfo *resultRelInfo;    /* info about target relation(s) */

     /*
      * Target relation mentioned in the original statement, used to fire
-     * statement-level triggers and as the root for tuple routing.
+     * statement-level triggers and as the root for tuple routing.  (This
+     * might point to one of the resultRelInfo[] entries, but it can also be a
+     * distinct struct.)
      */
     ResultRelInfo *rootResultRelInfo;

-    List      **mt_arowmarks;    /* per-subplan ExecAuxRowMark lists */
     EPQState    mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
     bool        fireBSTriggers; /* do we need to fire stmt triggers? */

+    /*
+     * These fields are used for inherited UPDATE and DELETE, to track which
+     * target relation a given tuple is from.  If there are a lot of target
+     * relations, we use a hash table to translate table OIDs to
+     * resultRelInfo[] indexes; otherwise mt_resultOidHash is NULL.
+     */
+    int            mt_resultOidAttno;    /* resno of "tableoid" junk attr */
+    Oid            mt_lastResultOid;    /* last-seen value of tableoid */
+    int            mt_lastResultIndex; /* corresponding index in resultRelInfo[] */
+    HTAB       *mt_resultOidHash;    /* optional hash table to speed lookups */
+
     /*
      * Slot for storing tuples in the root partitioned table's rowtype during
      * an UPDATE of a partitioned table.
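The new mt_lastResultOid/mt_lastResultIndex pair acts as a one-entry cache in front of the tableoid-to-resultRelInfo[] translation. Consecutive tuples often come from the same child, so most rows skip the search entirely. The executor code that uses these fields is not part of this chunk. The following is a hypothetical sketch of the idea that uses a linear scan where the patch would use mt_resultOidHash when there are many target relations. The `lookups` counter exists only to make the cache observable.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* same typedef as PostgreSQL */
#define InvalidOid ((Oid) 0)

/* Hypothetical cut-down ModifyTableState: just the lookup fields. */
typedef struct SketchMTState
{
    int         mt_nrels;
    const Oid  *rel_oids;           /* OID of each resultRelInfo[] entry */
    Oid         mt_lastResultOid;   /* last-seen value of tableoid */
    int         mt_lastResultIndex; /* corresponding index */
    int         lookups;            /* full searches performed (demo only) */
} SketchMTState;

/* Map a tuple's tableoid to its resultRelInfo[] index, or -1. */
int
lookup_result_rel(SketchMTState *st, Oid tableoid)
{
    if (tableoid == st->mt_lastResultOid)
        return st->mt_lastResultIndex;  /* fast path: same table as before */

    st->lookups++;
    /* with many target rels, a hash table would replace this scan */
    for (int i = 0; i < st->mt_nrels; i++)
    {
        if (st->rel_oids[i] == tableoid)
        {
            st->mt_lastResultOid = tableoid;
            st->mt_lastResultIndex = i;
            return i;
        }
    }
    return -1;                  /* a real executor would report an error */
}
```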
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e22df890ef..16c750da02 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -270,6 +270,7 @@ typedef enum NodeTag
     T_PlaceHolderVar,
     T_SpecialJoinInfo,
     T_AppendRelInfo,
+    T_RowIdentityVarInfo,
     T_PlaceHolderInfo,
     T_MinMaxAggInfo,
     T_PlannerParamItem,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index bed9f4da09..c37d1259cf 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -77,18 +77,6 @@ typedef enum UpperRelationKind
     /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;

-/*
- * This enum identifies which type of relation is being planned through the
- * inheritance planner.  INHKIND_NONE indicates the inheritance planner
- * was not used.
- */
-typedef enum InheritanceKind
-{
-    INHKIND_NONE,
-    INHKIND_INHERITED,
-    INHKIND_PARTITIONED
-} InheritanceKind;
-
 /*----------
  * PlannerGlobal
  *        Global information for planning/optimization
@@ -276,6 +264,17 @@ struct PlannerInfo

     List       *join_info_list; /* list of SpecialJoinInfos */

+    /*
+     * all_result_relids is empty for SELECT, otherwise it contains at least
+     * parse->resultRelation.  For UPDATE/DELETE across an inheritance or
+     * partitioning tree, the result rel's child relids are added.  When using
+     * multi-level partitioning, intermediate partitioned rels are included.
+     * leaf_result_relids is similar except that only actual result tables,
+     * not partitioned tables, are included in it.
+     */
+    Relids        all_result_relids;    /* set of all result relids */
+    Relids        leaf_result_relids; /* set of all leaf relids */
+
     /*
      * Note: for AppendRelInfos describing partitions of a partitioned table,
      * we guarantee that partitions that come earlier in the partitioned
@@ -283,6 +282,8 @@ struct PlannerInfo
      */
     List       *append_rel_list;    /* list of AppendRelInfos */

+    List       *row_identity_vars;    /* list of RowIdentityVarInfos */
+
     List       *rowMarks;        /* list of PlanRowMarks */

     List       *placeholder_list;    /* list of PlaceHolderInfos */
@@ -322,7 +323,8 @@ struct PlannerInfo
      * For UPDATE, processed_tlist remains in the order the user wrote the
      * assignments.  This list contains the target table's attribute numbers
      * to which the first N entries of processed_tlist are to be assigned.
-     * (Any additional entries in processed_tlist must be resjunk.)
+     * (Any additional entries in processed_tlist must be resjunk.)  DO NOT
+     * use the resnos in processed_tlist to identify the UPDATE targets.
      */
     List       *update_colnos;

@@ -341,9 +343,6 @@ struct PlannerInfo
     Index        qual_security_level;    /* minimum security_level for quals */
     /* Note: qual_security_level is zero if there are no securityQuals */

-    InheritanceKind inhTargetKind;    /* indicates if the target relation is an
-                                     * inheritance child or partition or a
-                                     * partitioned table */
     bool        hasJoinRTEs;    /* true if any RTEs are RTE_JOIN kind */
     bool        hasLateralRTEs; /* true if any RTEs are marked LATERAL */
     bool        hasHavingQual;    /* true if havingQual was non-null */
@@ -1833,20 +1832,19 @@ typedef struct LockRowsPath
  * ModifyTablePath represents performing INSERT/UPDATE/DELETE modifications
  *
  * We represent most things that will be in the ModifyTable plan node
- * literally, except we have child Path(s) not Plan(s).  But analysis of the
+ * literally, except we have a child Path not Plan.  But analysis of the
  * OnConflictExpr is deferred to createplan.c, as is collection of FDW data.
  */
 typedef struct ModifyTablePath
 {
     Path        path;
+    Path       *subpath;        /* Path producing source data */
     CmdType        operation;        /* INSERT, UPDATE, or DELETE */
     bool        canSetTag;        /* do we set the command tag/es_processed? */
     Index        nominalRelation;    /* Parent RT index for use of EXPLAIN */
     Index        rootRelation;    /* Root RT index, if target is partitioned */
-    bool        partColsUpdated;    /* some part key in hierarchy updated */
+    bool        partColsUpdated;    /* some part key in hierarchy updated? */
     List       *resultRelations;    /* integer list of RT indexes */
-    List       *subpaths;        /* Path(s) producing source data */
-    List       *subroots;        /* per-target-table PlannerInfos */
     List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
@@ -2303,6 +2301,34 @@ typedef struct AppendRelInfo
     Oid            parent_reloid;    /* OID of parent relation */
 } AppendRelInfo;

+/*
+ * Information about a row-identity "resjunk" column in UPDATE/DELETE.
+ *
+ * In partitioned UPDATE/DELETE it's important for child partitions to share
+ * row-identity columns whenever possible, so as not to chew up too many
+ * targetlist columns.  We use these structs to track which identity columns
+ * have been requested.  In the finished plan, each of these will give rise
+ * to one resjunk entry in the targetlist of the ModifyTable's subplan node.
+ *
+ * All the Vars stored in RowIdentityVarInfos must have varno ROWID_VAR, for
+ * convenience of detecting duplicate requests.  We'll replace that, in the
+ * final plan, with the varno of the generating rel.
+ *
+ * Outside this list, a Var with varno ROWID_VAR and varattno k is a reference
+ * to the k-th element of the row_identity_vars list (k counting from 1).
+ * We add such a reference to root->processed_tlist when creating the entry,
+ * and it propagates into the plan tree from there.
+ */
+typedef struct RowIdentityVarInfo
+{
+    NodeTag        type;
+
+    Var           *rowidvar;        /* Var to be evaluated (but varno=ROWID_VAR) */
+    int32        rowidwidth;        /* estimated average width */
+    char       *rowidname;        /* name of the resjunk column */
+    Relids        rowidrels;        /* RTE indexes of target rels using this */
+} RowIdentityVarInfo;
+
 /*
  * For each distinct placeholder expression generated during planning, we
  * store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
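The point of RowIdentityVarInfo is that identical requests from different children share one resjunk column. The later request just adds its RT index to rowidrels, and a Var (ROWID_VAR, k) then refers to slot k. The sketch below shows that registry behavior under assumptions. It matches entries on column number, type OID and name, while the exact matching rule in add_row_identity_var() is not shown in this chunk. The fixed-size array and the uint64_t relid mask are simplifications.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ROWID_VARS 8        /* hypothetical fixed capacity */

typedef uint64_t RelidMask;     /* stand-in for Relids */

typedef struct SketchRowIdInfo
{
    int         varattno;       /* column of the Var (varno is ROWID_VAR) */
    unsigned    vartype;        /* type OID of the Var */
    const char *rowidname;      /* name of the resjunk column */
    RelidMask   rowidrels;      /* target rels using this column */
} SketchRowIdInfo;

typedef struct SketchRegistry
{
    int         n;
    SketchRowIdInfo items[MAX_ROWID_VARS];
} SketchRegistry;

/* Returns k, the 1-based slot a Var (ROWID_VAR, k) would reference. */
int
register_rowid_var(SketchRegistry *reg, int varattno, unsigned vartype,
                   const char *name, int rtindex)
{
    for (int i = 0; i < reg->n; i++)
    {
        SketchRowIdInfo *ri = &reg->items[i];

        if (ri->varattno == varattno && ri->vartype == vartype &&
            strcmp(ri->rowidname, name) == 0)
        {
            ri->rowidrels |= (RelidMask) 1 << rtindex;  /* share column */
            return i + 1;
        }
    }
    assert(reg->n < MAX_ROWID_VARS);
    reg->items[reg->n] = (SketchRowIdInfo) {varattno, vartype, name,
                                            (RelidMask) 1 << rtindex};
    return ++reg->n;
}
```

With two plain-table children each requesting "tableoid" and "ctid", only two resjunk columns result, however many children there are.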
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 7d74bd92b8..f371390f7f 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -196,7 +196,7 @@ typedef struct ProjectSet

 /* ----------------
  *     ModifyTable node -
- *        Apply rows produced by subplan(s) to result table(s),
+ *        Apply rows produced by outer plan to result table(s),
  *        by inserting, updating, or deleting.
  *
  * If the originally named target table is a partitioned table, both
@@ -206,7 +206,7 @@ typedef struct ProjectSet
  * EXPLAIN should claim is the INSERT/UPDATE/DELETE target.
  *
  * Note that rowMarks and epqParam are presumed to be valid for all the
- * subplan(s); they can't contain any info that varies across subplans.
+ * table(s); they can't contain any info that varies across tables.
  * ----------------
  */
 typedef struct ModifyTable
@@ -216,9 +216,8 @@ typedef struct ModifyTable
     bool        canSetTag;        /* do we set the command tag/es_processed? */
     Index        nominalRelation;    /* Parent RT index for use of EXPLAIN */
     Index        rootRelation;    /* Root RT index, if target is partitioned */
-    bool        partColsUpdated;    /* some part key in hierarchy updated */
+    bool        partColsUpdated;    /* some part key in hierarchy updated? */
     List       *resultRelations;    /* integer list of RT indexes */
-    List       *plans;            /* plan(s) producing source data */
     List       *updateColnosLists; /* per-target-table update_colnos lists */
     List       *withCheckOptionLists;    /* per-target-table WCO lists */
     List       *returningLists; /* per-target-table RETURNING tlists */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index d4ce037088..c25605ce0c 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -158,6 +158,10 @@ typedef struct Expr
  * than a heap column.  (In ForeignScan and CustomScan plan nodes, INDEX_VAR
  * is abused to signify references to columns of a custom scan tuple type.)
  *
+ * ROWID_VAR is used in the planner to identify nonce variables that carry
+ * row identity information during UPDATE/DELETE.  This value should never
+ * be seen outside the planner.
+ *
  * In the parser, varnosyn and varattnosyn are either identical to
  * varno/varattno, or they specify the column's position in an aliased JOIN
  * RTE that hides the semantic referent RTE's refname.  This is a syntactic
@@ -171,6 +175,7 @@ typedef struct Expr
 #define    INNER_VAR        65000    /* reference to inner subplan */
 #define    OUTER_VAR        65001    /* reference to outer subplan */
 #define    INDEX_VAR        65002    /* reference to index column */
+#define    ROWID_VAR        65003    /* row identity column during planning */

 #define IS_SPECIAL_VARNO(varno)        ((varno) >= INNER_VAR)

@@ -1386,13 +1391,14 @@ typedef struct InferenceElem
  * column for the item; so there may be missing or out-of-order resnos.
  * It is even legal to have duplicated resnos; consider
  *        UPDATE table SET arraycol[1] = ..., arraycol[2] = ..., ...
- * The two meanings come together in the executor, because the planner
- * transforms INSERT/UPDATE tlists into a normalized form with exactly
- * one entry for each column of the destination table.  Before that's
- * happened, however, it is risky to assume that resno == position.
- * Generally get_tle_by_resno() should be used rather than list_nth()
- * to fetch tlist entries by resno, and only in SELECT should you assume
- * that resno is a unique identifier.
+ * In an INSERT, the rewriter and planner will normalize the tlist by
+ * reordering it into physical column order and filling in default values
+ * for any columns not assigned values by the original query.  In an UPDATE,
+ * after the rewriter merges multiple assignments for the same column, the
+ * planner extracts the target-column numbers into a separate "update_colnos"
+ * list, and then renumbers the tlist elements serially.  Thus, tlist resnos
+ * match ordinal position in all tlists seen by the executor; but it is wrong
+ * to assume that before planning has happened.
  *
  * resname is required to represent the correct column name in non-resjunk
  * entries of top-level SELECT targetlists, since it will be used as the
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index 4cbf8c26cc..39d04d9cc0 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -28,8 +28,23 @@ extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
 extern Relids adjust_child_relids(Relids relids, int nappinfos,
                                   AppendRelInfo **appinfos);
 extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
-                                             Relids child_relids, Relids top_parent_relids);
+                                             Relids child_relids,
+                                             Relids top_parent_relids);
+extern List *adjust_inherited_attnums(List *attnums, AppendRelInfo *context);
+extern List *adjust_inherited_attnums_multilevel(PlannerInfo *root,
+                                                 List *attnums,
+                                                 Index child_relid,
+                                                 Index top_parent_relid);
+extern void get_translated_update_targetlist(PlannerInfo *root, Index relid,
+                                             List **processed_tlist,
+                                             List **update_colnos);
 extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
                                                Relids relids, int *nappinfos);
+extern void add_row_identity_var(PlannerInfo *root, Var *rowid_var,
+                                 Index rtindex, const char *rowid_name);
+extern void add_row_identity_columns(PlannerInfo *root, Index rtindex,
+                                     RangeTblEntry *target_rte,
+                                     Relation target_relation);
+extern void distribute_row_identity_vars(PlannerInfo *root);

 #endif                            /* APPENDINFO_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9673a4a638..d539bc2783 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -260,11 +260,11 @@ extern LockRowsPath *create_lockrows_path(PlannerInfo *root, RelOptInfo *rel,
                                           Path *subpath, List *rowMarks, int epqParam);
 extern ModifyTablePath *create_modifytable_path(PlannerInfo *root,
                                                 RelOptInfo *rel,
+                                                Path *subpath,
                                                 CmdType operation, bool canSetTag,
                                                 Index nominalRelation, Index rootRelation,
                                                 bool partColsUpdated,
-                                                List *resultRelations, List *subpaths,
-                                                List *subroots,
+                                                List *resultRelations,
                                                 List *updateColnosLists,
                                                 List *withCheckOptionLists, List *returningLists,
                                                 List *rowMarks, OnConflictExpr *onconflict,
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index f49196a4d3..b1c4065689 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -34,7 +34,7 @@ extern Relids get_relids_for_join(Query *query, int joinrelid);
 /*
  * prototypes for preptlist.c
  */
-extern List *preprocess_targetlist(PlannerInfo *root);
+extern void preprocess_targetlist(PlannerInfo *root);

 extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);

diff --git a/src/include/rewrite/rewriteHandler.h b/src/include/rewrite/rewriteHandler.h
index 1fea1a4691..728a60c0b0 100644
--- a/src/include/rewrite/rewriteHandler.h
+++ b/src/include/rewrite/rewriteHandler.h
@@ -23,8 +23,6 @@ extern void AcquireRewriteLocks(Query *parsetree,
                                 bool forUpdatePushedDown);

 extern Node *build_column_default(Relation rel, int attrno);
-extern void rewriteTargetListUD(Query *parsetree, RangeTblEntry *target_rte,
-                                Relation target_relation);

 extern void fill_extraUpdatedCols(RangeTblEntry *target_rte,
                                   Relation target_relation);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 94e43c3410..1c703c351f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -545,27 +545,25 @@ create table some_tab_child () inherits (some_tab);
 insert into some_tab_child values(1,2);
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false;
-           QUERY PLAN
---------------------------------
+                       QUERY PLAN
+--------------------------------------------------------
  Update on public.some_tab
-   Update on public.some_tab
    ->  Result
-         Output: (a + 1), ctid
+         Output: (some_tab.a + 1), NULL::oid, NULL::tid
          One-Time Filter: false
-(5 rows)
+(4 rows)

 update some_tab set a = a + 1 where false;
 explain (verbose, costs off)
 update some_tab set a = a + 1 where false returning b, a;
-           QUERY PLAN
---------------------------------
+                       QUERY PLAN
+--------------------------------------------------------
  Update on public.some_tab
-   Output: b, a
-   Update on public.some_tab
+   Output: some_tab.b, some_tab.a
    ->  Result
-         Output: (a + 1), ctid
+         Output: (some_tab.a + 1), NULL::oid, NULL::tid
          One-Time Filter: false
-(6 rows)
+(5 rows)

 update some_tab set a = a + 1 where false returning b, a;
  b | a
@@ -670,7 +668,7 @@ explain update parted_tab set a = 2 where false;
                        QUERY PLAN
 --------------------------------------------------------
  Update on parted_tab  (cost=0.00..0.00 rows=0 width=0)
-   ->  Result  (cost=0.00..0.00 rows=0 width=0)
+   ->  Result  (cost=0.00..0.00 rows=0 width=10)
          One-Time Filter: false
 (3 rows)

diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index ff157ceb1c..73c0f3e04b 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -212,7 +212,7 @@ explain (costs off, format json) insert into insertconflicttest values (0, 'Bilb
        "Plans": [                                                      +
          {                                                             +
            "Node Type": "Result",                                      +
-           "Parent Relationship": "Member",                            +
+           "Parent Relationship": "Outer",                             +
            "Parallel Aware": false                                     +
          }                                                             +
        ]                                                               +
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 0057f41caa..27f7525b3e 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1926,37 +1926,27 @@ WHERE EXISTS (
     FROM int4_tbl,
          LATERAL (SELECT int4_tbl.f1 FROM int8_tbl LIMIT 2) ss
     WHERE prt1_l.c IS NULL);
-                          QUERY PLAN
----------------------------------------------------------------
+                        QUERY PLAN
+----------------------------------------------------------
  Delete on prt1_l
    Delete on prt1_l_p1 prt1_l_1
    Delete on prt1_l_p3_p1 prt1_l_2
    Delete on prt1_l_p3_p2 prt1_l_3
    ->  Nested Loop Semi Join
-         ->  Seq Scan on prt1_l_p1 prt1_l_1
-               Filter: (c IS NULL)
-         ->  Nested Loop
-               ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss
-                     ->  Limit
-                           ->  Seq Scan on int8_tbl
-   ->  Nested Loop Semi Join
-         ->  Seq Scan on prt1_l_p3_p1 prt1_l_2
-               Filter: (c IS NULL)
-         ->  Nested Loop
-               ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss_1
-                     ->  Limit
-                           ->  Seq Scan on int8_tbl int8_tbl_1
-   ->  Nested Loop Semi Join
-         ->  Seq Scan on prt1_l_p3_p2 prt1_l_3
-               Filter: (c IS NULL)
-         ->  Nested Loop
-               ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss_2
-                     ->  Limit
-                           ->  Seq Scan on int8_tbl int8_tbl_2
-(28 rows)
+         ->  Append
+               ->  Seq Scan on prt1_l_p1 prt1_l_1
+                     Filter: (c IS NULL)
+               ->  Seq Scan on prt1_l_p3_p1 prt1_l_2
+                     Filter: (c IS NULL)
+               ->  Seq Scan on prt1_l_p3_p2 prt1_l_3
+                     Filter: (c IS NULL)
+         ->  Materialize
+               ->  Nested Loop
+                     ->  Seq Scan on int4_tbl
+                     ->  Subquery Scan on ss
+                           ->  Limit
+                                 ->  Seq Scan on int8_tbl
+(18 rows)

 --
 -- negative testcases
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index bde29e38a9..c4e827caec 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2463,74 +2463,43 @@ deallocate ab_q6;
 insert into ab values (1,2);
 explain (analyze, costs off, summary off, timing off)
 update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
-                                     QUERY PLAN
--------------------------------------------------------------------------------------
+                                        QUERY PLAN
+-------------------------------------------------------------------------------------------
  Update on ab_a1 (actual rows=0 loops=1)
    Update on ab_a1_b1 ab_a1_1
    Update on ab_a1_b2 ab_a1_2
    Update on ab_a1_b3 ab_a1_3
-   ->  Nested Loop (actual rows=0 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=0 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
    ->  Nested Loop (actual rows=1 loops=1)
          ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
+               ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_1 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
                            Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=1 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
                      Recheck Cond: (a = 1)
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
                            Index Cond: (a = 1)
-   ->  Nested Loop (actual rows=0 loops=1)
-         ->  Append (actual rows=1 loops=1)
-               ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
-                     Recheck Cond: (a = 1)
-                     Heap Blocks: exact=1
-                     ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-               ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
-                     Recheck Cond: (a = 1)
-                     ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
-                           Index Cond: (a = 1)
-         ->  Materialize (actual rows=0 loops=1)
                ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_3 (actual rows=0 loops=1)
                      Recheck Cond: (a = 1)
                      ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
                            Index Cond: (a = 1)
-(65 rows)
+         ->  Materialize (actual rows=1 loops=1)
+               ->  Append (actual rows=1 loops=1)
+                     ->  Bitmap Heap Scan on ab_a1_b1 ab_1 (actual rows=0 loops=1)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b2 ab_2 (actual rows=1 loops=1)
+                           Recheck Cond: (a = 1)
+                           Heap Blocks: exact=1
+                           ->  Bitmap Index Scan on ab_a1_b2_a_idx (actual rows=1 loops=1)
+                                 Index Cond: (a = 1)
+                     ->  Bitmap Heap Scan on ab_a1_b3 ab_3 (actual rows=0 loops=1)
+                           Recheck Cond: (a = 1)
+                           ->  Bitmap Index Scan on ab_a1_b3_a_idx (actual rows=1 loops=1)
+                                 Index Cond: (a = 1)
+(34 rows)

 table ab;
  a | b
@@ -2551,29 +2520,12 @@ update ab_a1 set b = 3 from ab_a2 where ab_a2.b = (select 1);
    Update on ab_a1_b3 ab_a1_3
    InitPlan 1 (returns $0)
      ->  Result (actual rows=1 loops=1)
-   ->  Nested Loop (actual rows=1 loops=1)
-         ->  Seq Scan on ab_a1_b1 ab_a1_1 (actual rows=1 loops=1)
-         ->  Materialize (actual rows=1 loops=1)
-               ->  Append (actual rows=1 loops=1)
-                     ->  Seq Scan on ab_a2_b1 ab_a2_1 (actual rows=1 loops=1)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b2 ab_a2_2 (never executed)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b3 ab_a2_3 (never executed)
-                           Filter: (b = $0)
-   ->  Nested Loop (actual rows=1 loops=1)
-         ->  Seq Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
-         ->  Materialize (actual rows=1 loops=1)
-               ->  Append (actual rows=1 loops=1)
-                     ->  Seq Scan on ab_a2_b1 ab_a2_1 (actual rows=1 loops=1)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b2 ab_a2_2 (never executed)
-                           Filter: (b = $0)
-                     ->  Seq Scan on ab_a2_b3 ab_a2_3 (never executed)
-                           Filter: (b = $0)
-   ->  Nested Loop (actual rows=1 loops=1)
-         ->  Seq Scan on ab_a1_b3 ab_a1_3 (actual rows=1 loops=1)
-         ->  Materialize (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=3 loops=1)
+         ->  Append (actual rows=3 loops=1)
+               ->  Seq Scan on ab_a1_b1 ab_a1_1 (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a1_b2 ab_a1_2 (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a1_b3 ab_a1_3 (actual rows=1 loops=1)
+         ->  Materialize (actual rows=1 loops=3)
                ->  Append (actual rows=1 loops=1)
                      ->  Seq Scan on ab_a2_b1 ab_a2_1 (actual rows=1 loops=1)
                            Filter: (b = $0)
@@ -2581,7 +2533,7 @@ update ab_a1 set b = 3 from ab_a2 where ab_a2.b = (select 1);
                            Filter: (b = $0)
                      ->  Seq Scan on ab_a2_b3 ab_a2_3 (never executed)
                            Filter: (b = $0)
-(36 rows)
+(19 rows)

 select tableoid::regclass, * from ab;
  tableoid | a | b
@@ -3420,28 +3372,30 @@ explain (costs off) select * from pp_lp where a = 1;
 (5 rows)

 explain (costs off) update pp_lp set value = 10 where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Update on pp_lp
    Update on pp_lp1 pp_lp_1
    Update on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 explain (costs off) delete from pp_lp where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Delete on pp_lp
    Delete on pp_lp1 pp_lp_1
    Delete on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 set constraint_exclusion = 'off'; -- this should not affect the result.
 explain (costs off) select * from pp_lp where a = 1;
@@ -3455,28 +3409,30 @@ explain (costs off) select * from pp_lp where a = 1;
 (5 rows)

 explain (costs off) update pp_lp set value = 10 where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Update on pp_lp
    Update on pp_lp1 pp_lp_1
    Update on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 explain (costs off) delete from pp_lp where a = 1;
-            QUERY PLAN
-----------------------------------
+               QUERY PLAN
+----------------------------------------
  Delete on pp_lp
    Delete on pp_lp1 pp_lp_1
    Delete on pp_lp2 pp_lp_2
-   ->  Seq Scan on pp_lp1 pp_lp_1
-         Filter: (a = 1)
-   ->  Seq Scan on pp_lp2 pp_lp_2
-         Filter: (a = 1)
-(7 rows)
+   ->  Append
+         ->  Seq Scan on pp_lp1 pp_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on pp_lp2 pp_lp_2
+               Filter: (a = 1)
+(8 rows)

 drop table pp_lp;
 -- Ensure enable_partition_prune does not affect non-partitioned tables.
@@ -3500,28 +3456,31 @@ explain (costs off) select * from inh_lp where a = 1;
 (5 rows)

 explain (costs off) update inh_lp set value = 10 where a = 1;
-             QUERY PLAN
-------------------------------------
+                   QUERY PLAN
+------------------------------------------------
  Update on inh_lp
-   Update on inh_lp
-   Update on inh_lp1 inh_lp_1
-   ->  Seq Scan on inh_lp
-         Filter: (a = 1)
-   ->  Seq Scan on inh_lp1 inh_lp_1
-         Filter: (a = 1)
-(7 rows)
+   Update on inh_lp inh_lp_1
+   Update on inh_lp1 inh_lp_2
+   ->  Result
+         ->  Append
+               ->  Seq Scan on inh_lp inh_lp_1
+                     Filter: (a = 1)
+               ->  Seq Scan on inh_lp1 inh_lp_2
+                     Filter: (a = 1)
+(9 rows)

 explain (costs off) delete from inh_lp where a = 1;
-             QUERY PLAN
-------------------------------------
+                QUERY PLAN
+------------------------------------------
  Delete on inh_lp
-   Delete on inh_lp
-   Delete on inh_lp1 inh_lp_1
-   ->  Seq Scan on inh_lp
-         Filter: (a = 1)
-   ->  Seq Scan on inh_lp1 inh_lp_1
-         Filter: (a = 1)
-(7 rows)
+   Delete on inh_lp inh_lp_1
+   Delete on inh_lp1 inh_lp_2
+   ->  Append
+         ->  Seq Scan on inh_lp inh_lp_1
+               Filter: (a = 1)
+         ->  Seq Scan on inh_lp1 inh_lp_2
+               Filter: (a = 1)
+(8 rows)

 -- Ensure we don't exclude normal relations when we only expect to exclude
 -- inheritance children
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
index 9506aaef82..b02a682471 100644
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -1632,19 +1632,21 @@ EXPLAIN (COSTS OFF) EXECUTE p2(2);
 --
 SET SESSION AUTHORIZATION regress_rls_bob;
 EXPLAIN (COSTS OFF) UPDATE t1 SET b = b || b WHERE f_leak(b);
-                  QUERY PLAN
------------------------------------------------
+                        QUERY PLAN
+-----------------------------------------------------------
  Update on t1
-   Update on t1
-   Update on t2 t1_1
-   Update on t3 t1_2
-   ->  Seq Scan on t1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t2 t1_1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t3 t1_2
-         Filter: (((a % 2) = 0) AND f_leak(b))
-(10 rows)
+   Update on t1 t1_1
+   Update on t2 t1_2
+   Update on t3 t1_3
+   ->  Result
+         ->  Append
+               ->  Seq Scan on t1 t1_1
+                     Filter: (((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t2 t1_2
+                     Filter: (((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t3 t1_3
+                     Filter: (((a % 2) = 0) AND f_leak(b))
+(12 rows)

 UPDATE t1 SET b = b || b WHERE f_leak(b);
 NOTICE:  f_leak => bbb
@@ -1722,31 +1724,27 @@ NOTICE:  f_leak => cde
 NOTICE:  f_leak => yyyyyy
 EXPLAIN (COSTS OFF) UPDATE t1 SET b=t1.b FROM t2
 WHERE t1.a = 3 and t2.a = 3 AND f_leak(t1.b) AND f_leak(t2.b);
-                           QUERY PLAN
------------------------------------------------------------------
+                              QUERY PLAN
+-----------------------------------------------------------------------
  Update on t1
-   Update on t1
-   Update on t2 t1_1
-   Update on t3 t1_2
-   ->  Nested Loop
-         ->  Seq Scan on t1
-               Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Seq Scan on t2
-               Filter: ((a = 3) AND ((a % 2) = 1) AND f_leak(b))
-   ->  Nested Loop
-         ->  Seq Scan on t2 t1_1
-               Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Seq Scan on t2
-               Filter: ((a = 3) AND ((a % 2) = 1) AND f_leak(b))
+   Update on t1 t1_1
+   Update on t2 t1_2
+   Update on t3 t1_3
    ->  Nested Loop
-         ->  Seq Scan on t3 t1_2
-               Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
          ->  Seq Scan on t2
                Filter: ((a = 3) AND ((a % 2) = 1) AND f_leak(b))
-(19 rows)
+         ->  Append
+               ->  Seq Scan on t1 t1_1
+                     Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t2 t1_2
+                     Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
+               ->  Seq Scan on t3 t1_3
+                     Filter: ((a = 3) AND ((a % 2) = 0) AND f_leak(b))
+(14 rows)

 UPDATE t1 SET b=t1.b FROM t2
 WHERE t1.a = 3 and t2.a = 3 AND f_leak(t1.b) AND f_leak(t2.b);
+NOTICE:  f_leak => cde
 EXPLAIN (COSTS OFF) UPDATE t2 SET b=t2.b FROM t1
 WHERE t1.a = 3 and t2.a = 3 AND f_leak(t1.b) AND f_leak(t2.b);
                               QUERY PLAN
@@ -1795,46 +1793,30 @@ NOTICE:  f_leak => cde
 EXPLAIN (COSTS OFF) UPDATE t1 t1_1 SET b = t1_2.b FROM t1 t1_2
 WHERE t1_1.a = 4 AND t1_2.a = t1_1.a AND t1_2.b = t1_1.b
 AND f_leak(t1_1.b) AND f_leak(t1_2.b) RETURNING *, t1_1, t1_2;
-                              QUERY PLAN
------------------------------------------------------------------------
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
  Update on t1 t1_1
-   Update on t1 t1_1
-   Update on t2 t1_1_1
-   Update on t3 t1_1_2
+   Update on t1 t1_1_1
+   Update on t2 t1_1_2
+   Update on t3 t1_1_3
    ->  Nested Loop
          Join Filter: (t1_1.b = t1_2.b)
-         ->  Seq Scan on t1 t1_1
-               Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Append
-               ->  Seq Scan on t1 t1_2_1
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t2 t1_2_2
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t3 t1_2_3
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-   ->  Nested Loop
-         Join Filter: (t1_1_1.b = t1_2.b)
-         ->  Seq Scan on t2 t1_1_1
-               Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
          ->  Append
-               ->  Seq Scan on t1 t1_2_1
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t2 t1_2_2
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t3 t1_2_3
+               ->  Seq Scan on t1 t1_1_1
                      Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-   ->  Nested Loop
-         Join Filter: (t1_1_2.b = t1_2.b)
-         ->  Seq Scan on t3 t1_1_2
-               Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-         ->  Append
-               ->  Seq Scan on t1 t1_2_1
+               ->  Seq Scan on t2 t1_1_2
                      Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t2 t1_2_2
+               ->  Seq Scan on t3 t1_1_3
                      Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-               ->  Seq Scan on t3 t1_2_3
-                     Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
-(37 rows)
+         ->  Materialize
+               ->  Append
+                     ->  Seq Scan on t1 t1_2_1
+                           Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
+                     ->  Seq Scan on t2 t1_2_2
+                           Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
+                     ->  Seq Scan on t3 t1_2_3
+                           Filter: ((a = 4) AND ((a % 2) = 0) AND f_leak(b))
+(21 rows)

 UPDATE t1 t1_1 SET b = t1_2.b FROM t1 t1_2
 WHERE t1_1.a = 4 AND t1_2.a = t1_1.a AND t1_2.b = t1_1.b
@@ -1842,8 +1824,6 @@ AND f_leak(t1_1.b) AND f_leak(t1_2.b) RETURNING *, t1_1, t1_2;
 NOTICE:  f_leak => daddad_updt
 NOTICE:  f_leak => daddad_updt
 NOTICE:  f_leak => defdef
-NOTICE:  f_leak => defdef
-NOTICE:  f_leak => daddad_updt
 NOTICE:  f_leak => defdef
  id  | a |      b      | id  | a |      b      |        t1_1         |        t1_2
 -----+---+-------------+-----+---+-------------+---------------------+---------------------
@@ -1880,19 +1860,20 @@ EXPLAIN (COSTS OFF) DELETE FROM only t1 WHERE f_leak(b);
 (3 rows)

 EXPLAIN (COSTS OFF) DELETE FROM t1 WHERE f_leak(b);
-                  QUERY PLAN
------------------------------------------------
+                     QUERY PLAN
+-----------------------------------------------------
  Delete on t1
-   Delete on t1
-   Delete on t2 t1_1
-   Delete on t3 t1_2
-   ->  Seq Scan on t1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t2 t1_1
-         Filter: (((a % 2) = 0) AND f_leak(b))
-   ->  Seq Scan on t3 t1_2
-         Filter: (((a % 2) = 0) AND f_leak(b))
-(10 rows)
+   Delete on t1 t1_1
+   Delete on t2 t1_2
+   Delete on t3 t1_3
+   ->  Append
+         ->  Seq Scan on t1 t1_1
+               Filter: (((a % 2) = 0) AND f_leak(b))
+         ->  Seq Scan on t2 t1_2
+               Filter: (((a % 2) = 0) AND f_leak(b))
+         ->  Seq Scan on t3 t1_3
+               Filter: (((a % 2) = 0) AND f_leak(b))
+(11 rows)

 DELETE FROM only t1 WHERE f_leak(b) RETURNING tableoid::regclass, *, t1;
 NOTICE:  f_leak => bbbbbb_updt
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 770eab38b5..cdff914b93 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1607,26 +1607,21 @@ UPDATE rw_view1 SET a = a + 1000 FROM other_tbl_parent WHERE a = id;
                                QUERY PLAN
 -------------------------------------------------------------------------
  Update on base_tbl_parent
-   Update on base_tbl_parent
-   Update on base_tbl_child base_tbl_parent_1
-   ->  Hash Join
-         Hash Cond: (other_tbl_parent.id = base_tbl_parent.a)
-         ->  Append
-               ->  Seq Scan on other_tbl_parent other_tbl_parent_1
-               ->  Seq Scan on other_tbl_child other_tbl_parent_2
-         ->  Hash
-               ->  Seq Scan on base_tbl_parent
+   Update on base_tbl_parent base_tbl_parent_1
+   Update on base_tbl_child base_tbl_parent_2
    ->  Merge Join
-         Merge Cond: (base_tbl_parent_1.a = other_tbl_parent.id)
+         Merge Cond: (base_tbl_parent.a = other_tbl_parent.id)
          ->  Sort
-               Sort Key: base_tbl_parent_1.a
-               ->  Seq Scan on base_tbl_child base_tbl_parent_1
+               Sort Key: base_tbl_parent.a
+               ->  Append
+                     ->  Seq Scan on base_tbl_parent base_tbl_parent_1
+                     ->  Seq Scan on base_tbl_child base_tbl_parent_2
          ->  Sort
                Sort Key: other_tbl_parent.id
                ->  Append
                      ->  Seq Scan on other_tbl_parent other_tbl_parent_1
                      ->  Seq Scan on other_tbl_child other_tbl_parent_2
-(20 rows)
+(15 rows)

 UPDATE rw_view1 SET a = a + 1000 FROM other_tbl_parent WHERE a = id;
 SELECT * FROM ONLY base_tbl_parent ORDER BY a;
@@ -2332,36 +2327,39 @@ SELECT * FROM v1 WHERE a=8;

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
-                                       QUERY PLAN
------------------------------------------------------------------------------------------
+                                             QUERY PLAN
+-----------------------------------------------------------------------------------------------------
  Update on public.t1
-   Update on public.t1
-   Update on public.t11 t1_1
-   Update on public.t12 t1_2
-   Update on public.t111 t1_3
-   ->  Index Scan using t1_a_idx on public.t1
-         Output: 100, t1.ctid
-         Index Cond: ((t1.a > 5) AND (t1.a < 7))
-         Filter: ((t1.a <> 6) AND (SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
-         SubPlan 1
-           ->  Append
-                 ->  Seq Scan on public.t12 t12_1
-                       Filter: (t12_1.a = t1.a)
-                 ->  Seq Scan on public.t111 t12_2
-                       Filter: (t12_2.a = t1.a)
-   ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: 100, t1_1.ctid
-         Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
-         Filter: ((t1_1.a <> 6) AND (SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
-   ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: 100, t1_2.ctid
-         Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
-         Filter: ((t1_2.a <> 6) AND (SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
-   ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: 100, t1_3.ctid
-         Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
-         Filter: ((t1_3.a <> 6) AND (SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(27 rows)
+   Update on public.t1 t1_1
+   Update on public.t11 t1_2
+   Update on public.t12 t1_3
+   Update on public.t111 t1_4
+   ->  Result
+         Output: 100, t1.tableoid, t1.ctid
+         ->  Append
+               ->  Index Scan using t1_a_idx on public.t1 t1_1
+                     Output: t1_1.tableoid, t1_1.ctid
+                     Index Cond: ((t1_1.a > 5) AND (t1_1.a < 7))
+                     Filter: ((t1_1.a <> 6) AND (SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
+                     SubPlan 1
+                       ->  Append
+                             ->  Seq Scan on public.t12 t12_1
+                                   Filter: (t12_1.a = t1_1.a)
+                             ->  Seq Scan on public.t111 t12_2
+                                   Filter: (t12_2.a = t1_1.a)
+               ->  Index Scan using t11_a_idx on public.t11 t1_2
+                     Output: t1_2.tableoid, t1_2.ctid
+                     Index Cond: ((t1_2.a > 5) AND (t1_2.a < 7))
+                     Filter: ((t1_2.a <> 6) AND (SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
+               ->  Index Scan using t12_a_idx on public.t12 t1_3
+                     Output: t1_3.tableoid, t1_3.ctid
+                     Index Cond: ((t1_3.a > 5) AND (t1_3.a < 7))
+                     Filter: ((t1_3.a <> 6) AND (SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
+               ->  Index Scan using t111_a_idx on public.t111 t1_4
+                     Output: t1_4.tableoid, t1_4.ctid
+                     Index Cond: ((t1_4.a > 5) AND (t1_4.a < 7))
+                     Filter: ((t1_4.a <> 6) AND (SubPlan 1) AND snoop(t1_4.a) AND leakproof(t1_4.a))
+(30 rows)

 UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a < 7 AND a != 6;
 SELECT * FROM v1 WHERE a=100; -- Nothing should have been changed to 100
@@ -2376,36 +2374,39 @@ SELECT * FROM t1 WHERE a=100; -- Nothing should have been changed to 100

 EXPLAIN (VERBOSE, COSTS OFF)
 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
-                              QUERY PLAN
------------------------------------------------------------------------
+                                    QUERY PLAN
+-----------------------------------------------------------------------------------
  Update on public.t1
-   Update on public.t1
-   Update on public.t11 t1_1
-   Update on public.t12 t1_2
-   Update on public.t111 t1_3
-   ->  Index Scan using t1_a_idx on public.t1
-         Output: (t1.a + 1), t1.ctid
-         Index Cond: ((t1.a > 5) AND (t1.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1.a) AND leakproof(t1.a))
-         SubPlan 1
-           ->  Append
-                 ->  Seq Scan on public.t12 t12_1
-                       Filter: (t12_1.a = t1.a)
-                 ->  Seq Scan on public.t111 t12_2
-                       Filter: (t12_2.a = t1.a)
-   ->  Index Scan using t11_a_idx on public.t11 t1_1
-         Output: (t1_1.a + 1), t1_1.ctid
-         Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
-   ->  Index Scan using t12_a_idx on public.t12 t1_2
-         Output: (t1_2.a + 1), t1_2.ctid
-         Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
-   ->  Index Scan using t111_a_idx on public.t111 t1_3
-         Output: (t1_3.a + 1), t1_3.ctid
-         Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
-         Filter: ((SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
-(27 rows)
+   Update on public.t1 t1_1
+   Update on public.t11 t1_2
+   Update on public.t12 t1_3
+   Update on public.t111 t1_4
+   ->  Result
+         Output: (t1.a + 1), t1.tableoid, t1.ctid
+         ->  Append
+               ->  Index Scan using t1_a_idx on public.t1 t1_1
+                     Output: t1_1.a, t1_1.tableoid, t1_1.ctid
+                     Index Cond: ((t1_1.a > 5) AND (t1_1.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_1.a) AND leakproof(t1_1.a))
+                     SubPlan 1
+                       ->  Append
+                             ->  Seq Scan on public.t12 t12_1
+                                   Filter: (t12_1.a = t1_1.a)
+                             ->  Seq Scan on public.t111 t12_2
+                                   Filter: (t12_2.a = t1_1.a)
+               ->  Index Scan using t11_a_idx on public.t11 t1_2
+                     Output: t1_2.a, t1_2.tableoid, t1_2.ctid
+                     Index Cond: ((t1_2.a > 5) AND (t1_2.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_2.a) AND leakproof(t1_2.a))
+               ->  Index Scan using t12_a_idx on public.t12 t1_3
+                     Output: t1_3.a, t1_3.tableoid, t1_3.ctid
+                     Index Cond: ((t1_3.a > 5) AND (t1_3.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_3.a) AND leakproof(t1_3.a))
+               ->  Index Scan using t111_a_idx on public.t111 t1_4
+                     Output: t1_4.a, t1_4.tableoid, t1_4.ctid
+                     Index Cond: ((t1_4.a > 5) AND (t1_4.a = 8))
+                     Filter: ((SubPlan 1) AND snoop(t1_4.a) AND leakproof(t1_4.a))
+(30 rows)

 UPDATE v1 SET a=a+1 WHERE snoop(a) AND leakproof(a) AND a = 8;
 NOTICE:  snooped value: 8
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index dece036069..dc34ac67b3 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -308,8 +308,8 @@ ALTER TABLE part_b_10_b_20 ATTACH PARTITION part_c_1_100 FOR VALUES FROM (1) TO

 -- The order of subplans should be in bound order
 EXPLAIN (costs off) UPDATE range_parted set c = c - 50 WHERE c > 97;
-                   QUERY PLAN
--------------------------------------------------
+                      QUERY PLAN
+-------------------------------------------------------
  Update on range_parted
    Update on part_a_1_a_10 range_parted_1
    Update on part_a_10_a_20 range_parted_2
@@ -318,21 +318,22 @@ EXPLAIN (costs off) UPDATE range_parted set c = c - 50 WHERE c > 97;
    Update on part_d_1_15 range_parted_5
    Update on part_d_15_20 range_parted_6
    Update on part_b_20_b_30 range_parted_7
-   ->  Seq Scan on part_a_1_a_10 range_parted_1
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_a_10_a_20 range_parted_2
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_b_1_b_10 range_parted_3
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_c_1_100 range_parted_4
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_d_1_15 range_parted_5
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_d_15_20 range_parted_6
-         Filter: (c > '97'::numeric)
-   ->  Seq Scan on part_b_20_b_30 range_parted_7
-         Filter: (c > '97'::numeric)
-(22 rows)
+   ->  Append
+         ->  Seq Scan on part_a_1_a_10 range_parted_1
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_a_10_a_20 range_parted_2
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_b_1_b_10 range_parted_3
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_c_1_100 range_parted_4
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_d_1_15 range_parted_5
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_d_15_20 range_parted_6
+               Filter: (c > '97'::numeric)
+         ->  Seq Scan on part_b_20_b_30 range_parted_7
+               Filter: (c > '97'::numeric)
+(23 rows)

 -- fail, row movement happens only within the partition subtree.
 UPDATE part_c_100_200 set c = c - 20, d = c WHERE c = 105;
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 9a6b716ddc..0affacc191 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -2906,47 +2906,35 @@ SELECT * FROM parent;
 EXPLAIN (VERBOSE, COSTS OFF)
 WITH wcte AS ( INSERT INTO int8_tbl VALUES ( 42, 47 ) RETURNING q2 )
 DELETE FROM a USING wcte WHERE aa = q2;
-                     QUERY PLAN
-----------------------------------------------------
+                         QUERY PLAN
+------------------------------------------------------------
  Delete on public.a
-   Delete on public.a
-   Delete on public.b a_1
-   Delete on public.c a_2
-   Delete on public.d a_3
+   Delete on public.a a_1
+   Delete on public.b a_2
+   Delete on public.c a_3
+   Delete on public.d a_4
    CTE wcte
      ->  Insert on public.int8_tbl
            Output: int8_tbl.q2
            ->  Result
                  Output: '42'::bigint, '47'::bigint
-   ->  Nested Loop
-         Output: a.ctid, wcte.*
-         Join Filter: (a.aa = wcte.q2)
-         ->  Seq Scan on public.a
-               Output: a.ctid, a.aa
-         ->  CTE Scan on wcte
+   ->  Hash Join
+         Output: wcte.*, a.tableoid, a.ctid
+         Hash Cond: (a.aa = wcte.q2)
+         ->  Append
+               ->  Seq Scan on public.a a_1
+                     Output: a_1.aa, a_1.tableoid, a_1.ctid
+               ->  Seq Scan on public.b a_2
+                     Output: a_2.aa, a_2.tableoid, a_2.ctid
+               ->  Seq Scan on public.c a_3
+                     Output: a_3.aa, a_3.tableoid, a_3.ctid
+               ->  Seq Scan on public.d a_4
+                     Output: a_4.aa, a_4.tableoid, a_4.ctid
+         ->  Hash
                Output: wcte.*, wcte.q2
-   ->  Nested Loop
-         Output: a_1.ctid, wcte.*
-         Join Filter: (a_1.aa = wcte.q2)
-         ->  Seq Scan on public.b a_1
-               Output: a_1.ctid, a_1.aa
-         ->  CTE Scan on wcte
-               Output: wcte.*, wcte.q2
-   ->  Nested Loop
-         Output: a_2.ctid, wcte.*
-         Join Filter: (a_2.aa = wcte.q2)
-         ->  Seq Scan on public.c a_2
-               Output: a_2.ctid, a_2.aa
-         ->  CTE Scan on wcte
-               Output: wcte.*, wcte.q2
-   ->  Nested Loop
-         Output: a_3.ctid, wcte.*
-         Join Filter: (a_3.aa = wcte.q2)
-         ->  Seq Scan on public.d a_3
-               Output: a_3.ctid, a_3.aa
-         ->  CTE Scan on wcte
-               Output: wcte.*, wcte.q2
-(38 rows)
+               ->  CTE Scan on wcte
+                     Output: wcte.*, wcte.q2
+(26 rows)

 -- error cases
 -- data-modifying WITH tries to use its own output

Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
I wrote:
> I've not made any attempt to do performance testing on this,
> but I think that's about the only thing standing between us
> and committing this thing.

I think the main gating condition for committing this is "does it
make things worse for simple non-partitioned updates?".  The need
for an extra tuple fetch per row certainly makes it seem like there
could be a slowdown there.  However, in some tests here with current
HEAD (54bb91c30), I concur with Amit's findings that there's barely
any slowdown in that case.  I re-did Heikki's worst-case example [1]
of updating both columns of a 2-column table.  I also tried variants
of that involving updating two columns of a 6-column table and of a
10-column table, figuring that those cases might be a bit more
representative of typical usage (see attached scripts).  What I got
was

Times in ms, for the median of 3 runs:

               -- jit on ---     -- jit off --
Table width    HEAD    patch     HEAD    patch

2 columns     12528    13329    12574    12922
6 columns     15861    15862    14775    15249
10 columns    18399    16985    16994    16907

So even with the worst case, it's not too bad, just a few percent
worse, and once you get to a reasonable number of columns the v13
patch is starting to win.

However, I then tried a partitioned equivalent of the 6-column case
(script also attached), and it looks like

6 columns     16551    19097    15637    18201

which is really noticeably worse, 16% or so.  I poked at it with
"perf" to see if there were any clear bottlenecks, and didn't find
a smoking gun.  As best I can tell, the extra overhead must come
from the fact that all the tuples are now passing through an Append
node that's not there in the old-style plan tree.  I find this
surprising, because a (non-parallelized) Append doesn't really *do*
much; it certainly doesn't add any data copying or the like.
Maybe it's not so much the Append as that the rules for what kind of
tuple slot can be used have changed somehow?  Andres would have a
clearer idea about that than I do.
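The percentages quoted above can be rechecked directly from the timings. Below is a small C sketch that computes the relative change of the patched build against HEAD for each row, including the partitioned 6-column case. The helper names (`pct_change`, `rows`, `print_changes`) are made up for illustration; the numbers are copied from the measurements in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of the patched time against HEAD, in percent.
 * Positive means the patch is slower, negative means it is faster. */
double pct_change(double head_ms, double patch_ms)
{
    assert(head_ms > 0.0);
    return (patch_ms - head_ms) / head_ms * 100.0;
}

/* Median-of-3 timings in ms, copied from the measurements above. */
struct timing_row {
    const char *label;
    double head_jit_on, patch_jit_on;
    double head_jit_off, patch_jit_off;
};

const struct timing_row rows[] = {
    {"2 columns",              12528, 13329, 12574, 12922},
    {"6 columns",              15861, 15862, 14775, 15249},
    {"10 columns",             18399, 16985, 16994, 16907},
    {"6 columns, partitioned", 16551, 19097, 15637, 18201},
};

/* Print one line per configuration, e.g. "+6.4%" for a 6.4% slowdown. */
void print_changes(void)
{
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        printf("%-24s jit on: %+5.1f%%   jit off: %+5.1f%%\n",
               rows[i].label,
               pct_change(rows[i].head_jit_on, rows[i].patch_jit_on),
               pct_change(rows[i].head_jit_off, rows[i].patch_jit_off));
}
```

Calling `print_changes()` from a `main` function shows the 2-column case at roughly +6% (jit on) and +3% (jit off), the 10-column jit-on case at roughly -8%, and the partitioned case at roughly +15% and +16%, consistent with the "16% or so" estimate.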

Anyway, I'm satisfied that this patch isn't likely to seriously
hurt non-partitioned cases.  There may be some micro-optimization
that could help simple partitioned cases, though.

This leaves us with a question whether to commit this patch now or
delay it till we have a better grip on why cases like this one are
slower.  I'm inclined to think that since there are a lot of clear
wins for users of partitioning, we shouldn't let the fact that there
are also some losses block committing.  But it's a judgment call.

            regards, tom lane

[1] https://www.postgresql.org/message-id/2e50d782-36f9-e723-0c4b-d133e63c6127%40iki.fi

drop table if exists tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text, f6 float8);
\timing on
insert into tab select g, g, g, g, g::text, g from generate_series(1, 10000000) g;
vacuum tab;
explain verbose update tab set b = b, f6 = f6;
update tab set b = b, f6 = f6;

drop table tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text, f6 float8);
\timing on
insert into tab select g, g, g, g, g::text, g from generate_series(1, 10000000) g;
vacuum tab;
explain update tab set b = b, a = a;
update tab set b = b, a = a;

drop table if exists tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text, f6 float8)
partition by range(a);
do $$
begin
for i in 0..9 loop
  execute 'create unlogged table tab'||i||' partition of tab for values from ('||i*1000000||') to ('||(i+1)*1000000||');';
end loop;
end$$;

\timing on
insert into tab select g, g, g, g, g::text, g from generate_series(1, 10000000-1) g;
vacuum tab;
explain verbose update tab set b = b, f6 = f6;
update tab set b = b, f6 = f6;

Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
I wrote:
> ... I also tried variants
> of that involving updating two columns of a 6-column table and of a
> 10-column table, figuring that those cases might be a bit more
> representative of typical usage (see attached scripts).

Argh, I managed to attach the wrong file for the 10-column test
case.  For the archives' sake, here's the right one.

            regards, tom lane

drop table if exists tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text,
  f6 float8, f7 int, f8 int, f9 int, f10 int);
\timing on
insert into tab select g, g, g, g, g::text,
  g, g, g, g, g
  from generate_series(1, 10000000) g;
vacuum tab;
explain verbose update tab set b = b, f6 = f6;
update tab set b = b, f6 = f6;

Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
I wrote:
> However, I then tried a partitioned equivalent of the 6-column case
> (script also attached), and it looks like
> 6 columns    16551    19097    15637    18201
> which is really noticeably worse, 16% or so.

... and on the third hand, that might just be some weird compiler-
and platform-specific artifact.

Using the exact same compiler (RHEL8's gcc 8.3.1) on a different
x86_64 machine, I measure the same case as about 7% slowdown not
16%.  That's still not great, but it calls the original measurement
into question, for sure.

Using Apple's clang 12.0.0 on an M1 mini, the patch actually clocks
in a couple percent *faster* than HEAD, for both the partitioned and
unpartitioned 6-column test cases.

So I'm not sure what to make of these results, but my level of concern
is less than it was earlier today.  I might've just gotten trapped by
the usual bugaboo of micro-benchmarking, ie putting too much stock in
only one test case.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, Mar 31, 2021 at 7:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > However, I then tried a partitioned equivalent of the 6-column case
> > (script also attached), and it looks like
> > 6 columns     16551   19097   15637   18201
> > which is really noticeably worse, 16% or so.
>
> ... and on the third hand, that might just be some weird compiler-
> and platform-specific artifact.
>
> Using the exact same compiler (RHEL8's gcc 8.3.1) on a different
> x86_64 machine, I measure the same case as about 7% slowdown not
> 16%.  That's still not great, but it calls the original measurement
> into question, for sure.
>
> Using Apple's clang 12.0.0 on an M1 mini, the patch actually clocks
> in a couple percent *faster* than HEAD, for both the partitioned and
> unpartitioned 6-column test cases.
>
> So I'm not sure what to make of these results, but my level of concern
> is less than it was earlier today.  I might've just gotten trapped by
> the usual bugaboo of micro-benchmarking, ie putting too much stock in
> only one test case.

FWIW, I ran the scripts you shared and see the following (median of 3
runs) times in ms for the UPDATE in each script:

heikki6.sql

master: 19139 (jit=off) 18404 (jit=on)
patched: 20202 (jit=off) 19290 (jit=on)

heikki10.sql

master: 21686 (jit=off) 21435 (jit=on)
patched: 20953 (jit=off) 20161 (jit=on)

Patch shows a win for 10 columns here.

part6.sql

master: 20321 (jit=off) 19580 (jit=on)
patched: 22661 (jit=off) 21636 (jit=on)

I wrote part10.sql and ran that too, with these results:

master: 22280 (jit=off) 21876 (jit=on)
patched: 23466 (jit=off) 22905 (jit=on)

The partitioned case slowdown is roughly 10% with 6 columns, 5% with
10.  I would agree that that's not too bad for a worst-case test case,
nor something we couldn't optimize.  I have yet to look closely at
where the problem lies though.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
I noticed something else interesting.  If you try an actually-useful
UPDATE, ie one that has to do some computation in the target list,
you can get a plan like this if it's a partitioned table:

EXPLAIN (verbose, costs off) UPDATE parent SET f2 = f2 + 1;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Update on public.parent
   Update on public.child1 parent_1
   Update on public.child2 parent_2
   Update on public.child3 parent_3
   ->  Append
         ->  Seq Scan on public.child1 parent_1
               Output: (parent_1.f2 + 1), parent_1.tableoid, parent_1.ctid
         ->  Seq Scan on public.child2 parent_2
               Output: (parent_2.f2 + 1), parent_2.tableoid, parent_2.ctid
         ->  Seq Scan on public.child3 parent_3
               Output: (parent_3.f2 + 1), parent_3.tableoid, parent_3.ctid

But when using traditional inheritance, it looks more like:

EXPLAIN (verbose, costs off) UPDATE parent SET f2 = f2 + 1;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Update on public.parent
   Update on public.parent parent_1
   Update on public.child1 parent_2
   Update on public.child2 parent_3
   Update on public.child3 parent_4
   ->  Result
         Output: (parent.f2 + 1), parent.tableoid, parent.ctid
         ->  Append
               ->  Seq Scan on public.parent parent_1
                     Output: parent_1.f2, parent_1.tableoid, parent_1.ctid
               ->  Seq Scan on public.child1 parent_2
                     Output: parent_2.f2, parent_2.tableoid, parent_2.ctid
               ->  Seq Scan on public.child2 parent_3
                     Output: parent_3.f2, parent_3.tableoid, parent_3.ctid
               ->  Seq Scan on public.child3 parent_4
                     Output: parent_4.f2, parent_4.tableoid, parent_4.ctid

That is, instead of shoving the "f2 + 1" computation down to the table
scans, it gets done in a separate Result node, implying yet another
extra node in the plan with resultant slowdown.  The reason for this
seems to be that apply_scanjoin_target_to_paths has special logic
to push the target down to members of a partitioned table, but it
doesn't do that for other sorts of appendrels.  That isn't new
with this patch, you can see the same behavior in SELECT.

Given the distinct whiff of second-class citizenship that traditional
inheritance has today, I'm not sure how excited people will be about
fixing this.  I've complained before that apply_scanjoin_target_to_paths
is brute-force and needs to be rewritten, but I don't really want to
undertake that task right now.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, Mar 31, 2021 at 11:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I noticed something else interesting.  If you try an actually-useful
> UPDATE, ie one that has to do some computation in the target list,
> you can get a plan like this if it's a partitioned table:
>
> EXPLAIN (verbose, costs off) UPDATE parent SET f2 = f2 + 1;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Update on public.parent
>    Update on public.child1 parent_1
>    Update on public.child2 parent_2
>    Update on public.child3 parent_3
>    ->  Append
>          ->  Seq Scan on public.child1 parent_1
>                Output: (parent_1.f2 + 1), parent_1.tableoid, parent_1.ctid
>          ->  Seq Scan on public.child2 parent_2
>                Output: (parent_2.f2 + 1), parent_2.tableoid, parent_2.ctid
>          ->  Seq Scan on public.child3 parent_3
>                Output: (parent_3.f2 + 1), parent_3.tableoid, parent_3.ctid
>
> But when using traditional inheritance, it looks more like:
>
> EXPLAIN (verbose, costs off) UPDATE parent SET f2 = f2 + 1;
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Update on public.parent
>    Update on public.parent parent_1
>    Update on public.child1 parent_2
>    Update on public.child2 parent_3
>    Update on public.child3 parent_4
>    ->  Result
>          Output: (parent.f2 + 1), parent.tableoid, parent.ctid
>          ->  Append
>                ->  Seq Scan on public.parent parent_1
>                      Output: parent_1.f2, parent_1.tableoid, parent_1.ctid
>                ->  Seq Scan on public.child1 parent_2
>                      Output: parent_2.f2, parent_2.tableoid, parent_2.ctid
>                ->  Seq Scan on public.child2 parent_3
>                      Output: parent_3.f2, parent_3.tableoid, parent_3.ctid
>                ->  Seq Scan on public.child3 parent_4
>                      Output: parent_4.f2, parent_4.tableoid, parent_4.ctid
>
> That is, instead of shoving the "f2 + 1" computation down to the table
> scans, it gets done in a separate Result node, implying yet another
> extra node in the plan with resultant slowdown.  The reason for this
> seems to be that apply_scanjoin_target_to_paths has special logic
> to push the target down to members of a partitioned table, but it
> doesn't do that for other sorts of appendrels.  That isn't new
> with this patch, you can see the same behavior in SELECT.

I've noticed this too when investigating why
find_modifytable_subplan() needed to deal with a Result node in some
cases.

> Given the distinct whiff of second-class citizenship that traditional
> inheritance has today, I'm not sure how excited people will be about
> fixing this.  I've complained before that apply_scanjoin_target_to_paths
> is brute-force and needs to be rewritten, but I don't really want to
> undertake that task right now.

I remember having *unsuccessfully* tried to make
apply_scanjoin_target_to_paths() do the targetlist pushdown for the
traditional inheritance cases as well.  I agree that rethinking the
whole apply_scanjoin_target_to_paths() approach might be a better use
of our time.  It has a looping-over-the-whole-partition-array
bottleneck for simple lookup queries that I have long wanted to
propose doing something about.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Wed, Mar 31, 2021 at 11:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... I've complained before that apply_scanjoin_target_to_paths
>> is brute-force and needs to be rewritten, but I don't really want to
>> undertake that task right now.

> I remember having *unsuccessfully* tried to make
> apply_scanjoin_target_to_paths() do the targetlist pushdown for the
> traditional inheritance cases as well.  I agree that rethinking the
> whole apply_scanjoin_target_to_paths() approach might be a better use
> of our time.  It has a looping-over-the-whole-partition-array
> bottleneck for simple lookup queries that I have long wanted to
> propose doing something about.

I was wondering if we could get anywhere by pushing more smarts
down to the level of create_projection_path itself, ie if we see
we're trying to apply a projection to an AppendPath then push it
underneath that automatically.  Then maybe some of the hackery
in apply_scanjoin_target_to_paths could go away.

There's already an attempt at that in apply_projection_to_path,
but it's not completely clean so there are callers that can't
use it.  Maybe a little more thought about how to do that
in a way that violates no invariants would yield dividends.
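As a rough illustration of the idea above (applying a projection to an Append by pushing it into each child instead of stacking a Result node on top), here is a toy C model. The node kinds, `ToyNode`, `make_node`, and `apply_projection` are all hypothetical; they do not correspond to PostgreSQL's `create_projection_path()` or `apply_projection_to_path()`, and the sketch ignores costing, per-child Var translation, and the invariants mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy plan nodes for illustration only; these are not PostgreSQL's
 * Path or Plan structs. */
typedef enum { NODE_SEQSCAN, NODE_SORT, NODE_APPEND, NODE_RESULT } NodeKind;

typedef struct ToyNode {
    NodeKind kind;
    const char *tlist;          /* output expressions, kept as display text */
    struct ToyNode **children;  /* Append inputs, or the one input of Sort/Result */
    int nchildren;
} ToyNode;

ToyNode *make_node(NodeKind kind, const char *tlist,
                   ToyNode **children, int nchildren)
{
    ToyNode *n = malloc(sizeof(ToyNode));
    if (n == NULL)
        abort();
    n->kind = kind;
    n->tlist = tlist;
    n->children = children;
    n->nchildren = nchildren;
    return n;
}

/* Make `node` produce `tlist`, pushing the computation beneath an Append
 * instead of stacking a Result on top of it.  Returns the new top node. */
ToyNode *apply_projection(ToyNode *node, const char *tlist)
{
    assert(node != NULL);
    switch (node->kind) {
    case NODE_APPEND:
        /* Append just concatenates its inputs, so evaluating the
         * expressions in every child yields the same rows.  A real planner
         * would also translate the parent's Vars into each child's Vars. */
        for (int i = 0; i < node->nchildren; i++)
            node->children[i] = apply_projection(node->children[i], tlist);
        node->tlist = tlist;
        return node;
    case NODE_SEQSCAN:
    case NODE_RESULT:
        /* Projection-capable nodes compute the expressions themselves. */
        node->tlist = tlist;
        return node;
    default: {
        /* Not projection-capable (here, Sort): fall back to a Result. */
        ToyNode **input = malloc(sizeof(ToyNode *));
        if (input == NULL)
            abort();
        input[0] = node;
        return make_node(NODE_RESULT, tlist, input, 1);
    }
    }
}
```

With an Append over two scans, `apply_projection(append, "f2 + 1")` returns the Append itself and leaves "f2 + 1" computed in each scan, which is the partitioned-table plan shape shown earlier; only a non-projection-capable input gets a Result on top.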

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Tue, Mar 30, 2021 at 1:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here's a v13 patchset that I feel pretty good about.

Thanks.  After staring at this for a day now, I do too.

> My original thought for replacing the "fake variable" design was to
> add another RTE holding the extra variables, and then have setrefs.c
> translate the placeholder variables to the real thing at the last
> possible moment.  I soon realized that instead of an actual RTE,
> it'd be better to invent a special varno value akin to INDEX_VAR
> (I called it ROWID_VAR, though I'm not wedded to that name).  Info
> about the associated variables is kept in a list of RowIdentityVarInfo
> structs, which are more suitable than a regular RTE would be.
>
> I got that and the translate-in-setrefs approach more or less working,
> but it was fairly messy, because the need to know about these special
> variables spilled into FDWs and a lot of other places; for example
> indxpath.c needed a special check for them when deciding if an
> index-only scan is possible.  What turns out to be a lot cleaner is
> to handle the translation in adjust_appendrel_attrs_mutator(), so that
> we have converted to real variables by the time we reach any
> relation-scan-level logic.
>
> I did end up having to break the API for FDW AddForeignUpdateTargets
> functions: they need to do things differently when adding junk columns,
> and they need different parameters.  This seems all to the good though,
> because the old API has been a backwards-compatibility hack for some
> time (e.g., in not passing the "root" pointer).

This all looks really neat.

I couldn't help but think that the RowIdentityVarInfo management code
looks a bit like SpecialJunkVarInfo stuff in my earliest patches, but
of course without all the fragility of assigning "fake" attribute
numbers to a "real" base relation(s).

> Some other random notes:
>
> * I was unimpressed with the idea of distinguishing different target
> relations by embedding integer constants in the plan.  In the first
> place, the implementation was extremely fragile --- there was
> absolutely NOTHING tying the counter you used to the subplans' eventual
> indexes in the ModifyTable lists.  Plus I don't have a lot of faith
> that setrefs.c will reliably do what you want in terms of bubbling the
> things up.  Maybe that could be made more robust, but the other problem
> is that the EXPLAIN output is just about unreadable; nobody will
> understand what "(0)" means.  So I went back to the idea of emitting
> tableoid, and installed a hashtable plus a one-entry lookup cache
> to make the run-time mapping as fast as I could.  I'm not necessarily
> saying that this is how it has to be indefinitely, but I think we
> need more work on planner and EXPLAIN infrastructure before we can
> get the idea of directly providing a list index to work nicely.

Okay.
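The approach quoted above, mapping each row's tableoid to its result relation through a hashtable plus a one-entry lookup cache, can be sketched in C as follows. This is a simplified, hypothetical stand-alone version: the `ResultRelMap` type and its functions are invented for illustration, `Oid` is redefined locally, and it is not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;   /* stand-in for PostgreSQL's 32-bit Oid type */

/* Hypothetical map from a child table's OID to the index of its result
 * relation, using open addressing plus a one-entry cache. */
typedef struct ResultRelMap {
    Oid    *keys;        /* 0 (InvalidOid) marks an empty slot */
    int    *values;      /* result-relation index for each key */
    size_t  capacity;    /* power of two, at least twice the entry count */
    Oid     last_oid;    /* one-entry cache: last OID that was found */
    int     last_index;
} ResultRelMap;

static size_t hash_oid(Oid oid, size_t capacity)
{
    /* Multiplicative hashing, masked to the power-of-two table size. */
    return (size_t) (oid * 2654435761u) & (capacity - 1);
}

/* Build the map; oids[i] becomes result relation i.  OIDs are assumed to
 * be distinct.  Returns NULL if allocation fails. */
ResultRelMap *resultrelmap_create(const Oid *oids, int n)
{
    size_t cap = 8;
    while (cap < (size_t) n * 2)
        cap <<= 1;

    ResultRelMap *map = malloc(sizeof(ResultRelMap));
    if (map == NULL)
        return NULL;
    map->keys = calloc(cap, sizeof(Oid));
    map->values = calloc(cap, sizeof(int));
    if (map->keys == NULL || map->values == NULL) {
        free(map->keys);
        free(map->values);
        free(map);
        return NULL;
    }
    map->capacity = cap;
    map->last_oid = 0;       /* nothing cached yet */
    map->last_index = -1;

    for (int i = 0; i < n; i++) {
        assert(oids[i] != 0);    /* InvalidOid is reserved for empty slots */
        size_t slot = hash_oid(oids[i], cap);
        while (map->keys[slot] != 0)             /* linear probing */
            slot = (slot + 1) & (cap - 1);
        map->keys[slot] = oids[i];
        map->values[slot] = i;
    }
    return map;
}

/* Return the result-relation index for tableoid, or -1 if it is unknown. */
int resultrelmap_lookup(ResultRelMap *map, Oid tableoid)
{
    /* A non-parallel Append emits all rows of one child before moving on,
     * so consecutive rows usually share a tableoid and hit this cache. */
    if (tableoid == map->last_oid)
        return map->last_index;

    size_t slot = hash_oid(tableoid, map->capacity);
    while (map->keys[slot] != 0) {
        if (map->keys[slot] == tableoid) {
            map->last_oid = tableoid;
            map->last_index = map->values[slot];
            return map->last_index;
        }
        slot = (slot + 1) & (map->capacity - 1);
    }
    return -1;
}

void resultrelmap_free(ResultRelMap *map)
{
    if (map == NULL)
        return;
    free(map->keys);
    free(map->values);
    free(map);
}
```

The cache check costs one comparison per row, so the hash probe is only paid when the Append switches to a new child.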

> * I didn't agree with your decision to remove the now-failing test
> cases from postgres_fdw.sql.  I think it's better to leave them there,
> especially in the cases where we were checking the plan as well as
> the execution.  Hopefully we'll be able to un-break those soon.

Okay.

> * I updated a lot of hereby-obsoleted comments, which makes the patch
> a bit longer than v12; but actually the code is a good bit smaller.
> There's a noticeable net code savings in src/backend/optimizer/,
> which there was not before.

Agreed.  (I had evidently missed a bunch of comments referring to the
old ways of how inherited updates are performed.)

> I've not made any attempt to do performance testing on this,
> but I think that's about the only thing standing between us
> and committing this thing.

I reran some of the performance tests I did earlier (I've attached the
modified test running script for reference):

pgbench -n -T60 -M{simple|prepared} -f nojoin.sql

nojoin.sql:

\set a random(1, 1000000)
update test_table t set b = :a where a = :a;

...and here are the tps figures:

-Msimple

nparts  10cols      20cols      40cols

master:
64      10112       9878        10920
128     9662        10691       10604
256     9642        9691        10626
1024    8589        9675        9521

patched:
64      13493       13463       13313
128     13305       13447       12705
256     13190       13161       12954
1024    11791       11408       11786

No variation across column counts, but the patched version improves
tps in every case, by roughly 18% to 38%.

-Mprepared (plan_cache_mode=force_generic_plan)

nparts  10cols      20cols      40cols

master:
64      2286        2285        2266
128     1163        1127        1091
256     531         519         544
1024    77          71          69

patched:
64      6522        6612        6275
128     3568        3625        3372
256     1847        1710        1823
1024    433         427         386

Again, no variation across column counts.  tps drops as the partition
count increases both before and after applying the patches, although
the patched version performs much better, which is mainly attributable
to UPDATE now being able to use run-time pruning (strictly, of the
Append under ModifyTable).  The drop as partition count increases can
be attributed to the fact that, with a generic plan, a number of steps
must still be done across all partitions even with the patches applied,
such as AcquireExecutorLocks(), ExecCheckRTPerms(), and the
per-result-rel initializations in ExecInitModifyTable().
As mentioned upthread, [1] can help with the last bit.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://commitfest.postgresql.org/32/2621/

Attachment

Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Tue, Mar 30, 2021 at 1:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a v13 patchset that I feel pretty good about.

> Thanks.  After staring at this for a day now, I do too.

Thanks for looking!  Pushed after some more docs-fiddling and a final
read-through.  I think the only code change from v13 is that I decided
to convert ExecGetJunkAttribute into a "static inline", since it's
just a thin wrapper around slot_getattr().  Doesn't really help
performance much, but it shouldn't hurt.

> ... The drop as partition count increases can
> be attributed to the fact that with a generic plan, there are a bunch
> of steps that must be done across all partitions, such as
> AcquireExecutorLocks(), ExecCheckRTPerms(), per-result-rel
> initializations in ExecInitModifyTable(), etc., even with the patch.
> As mentioned upthread, [1] can help with the last bit.

I'll try to find some time to look at that one.

I'd previously been thinking that we couldn't be lazy about applying
most of those steps at executor startup, but on second thought,
ExecCheckRTPerms should be a no-op anyway for child tables.  So
maybe it would be okay to not take a lock, much less do the other
stuff, until the particular child table is stored into.
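The deferral described here amounts to initializing a child result relation on first use rather than for every child at executor startup. Below is a minimal stand-alone C sketch of that control flow only. `ChildRel`, `init_child()`, and `get_child_for_store()` are hypothetical names, and `init_child()` merely counts calls where a real executor would take the lock and build per-relation state. Whether skipping the lock up front is actually safe is the open question in this thread; the sketch does not settle it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-child state; not PostgreSQL's ResultRelInfo. */
typedef struct ChildRel
{
    unsigned int relid;       /* stand-in for the child's OID */
    bool         initialized; /* has first-use setup been done? */
} ChildRel;

/* Counts how many times the (expensive) per-child setup ran. */
static int init_calls = 0;

/*
 * Placeholder for per-child setup: in a real executor this is roughly
 * where locking and building per-relation state would happen.
 */
static void
init_child(ChildRel *child)
{
    init_calls++;
    child->initialized = true;
}

/* Return the child to store a row into, initializing it on first use only. */
static ChildRel *
get_child_for_store(ChildRel *child)
{
    if (!child->initialized)
        init_child(child);
    return child;
}
```

With a generic plan over 1024 partitions where run-time pruning leaves a single child, this performs one setup instead of 1024.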

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Tue, Mar 30, 2021 at 12:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe that could be made more robust, but the other problem
> is that the EXPLAIN output is just about unreadable; nobody will
> understand what "(0)" means.

I think this was an idea that originally came from me, prompted by
what we already do for:

rhaas=# explain verbose select 1 except select 2;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 HashSetOp Except  (cost=0.00..0.06 rows=1 width=8)
   Output: (1), (0)
   ->  Append  (cost=0.00..0.05 rows=2 width=8)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..0.02 rows=1 width=8)
               Output: 1, 0
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
                     Output: 1
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=8)
               Output: 2, 1
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
                     Output: 2
(11 rows)

That is admittedly pretty magical, but it's a precedent. If you think
the relation OID to subplan index lookup is fast enough that it
doesn't matter, then I guess it's OK, but my opinion is that
the subplan index feels like the thing we really want, and if we're
passing anything else up the plan tree, that seems to be a decision
made out of embarrassment rather than conviction. I think the real
problem here is that the deparsing code isn't in on the secret. If in
the above example, or in this patch, it deparsed as (Subplan Index) at
the parent level, and 0, 1, 2, ... in the children, it wouldn't
confuse anyone, or at least not much more than EXPLAIN output does in
general.

Or even if we just output (Constant-Value) it wouldn't be that bad.
The whole convention of deparsing target lists by recursing into the
children, or one of them, in some ways obscures what's really going
on. I did a talk a few years ago in which I made those target lists
deparse as $OUTER.0, $OUTER.1, $INNER.0, etc. and I think people found
that pretty enlightening, because it's sort of non-obvious in what way
table foo is present when a target list 8 levels up in the join tree
claims to have a value for foo.x. Now, such notation can't really be
recommended in general, because it'd be too hard to understand what
was happening in a lot of cases, but the recursive stuff is clearly
not without its own attendant confusions.
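For comparison, the subplan-index alternative, in which each child subplan emits a constant identifying itself (much as the EXCEPT arms above emit 0 and 1 as a flag column), lets the parent route a row with a plain array index and no lookup at all. A minimal stand-alone C sketch; `Row`, `ResultRel`, and `route_by_index()` are hypothetical names. The assertion marks where the fragility objection applies: nothing guarantees that the emitted constant matches the child's final position in ModifyTable's lists, so the sketch simply assumes it does.

```c
#include <assert.h>

/* Hypothetical row produced by a child subplan. */
typedef struct Row
{
    int  subplan_index; /* per-child constant: 0, 1, 2, ... */
    long tupleid;       /* stand-in for the row's identity (ctid) */
} Row;

/* Hypothetical per-result-relation state held by the parent. */
typedef struct ResultRel
{
    const char *name;
    int         nupdated;
} ResultRel;

/*
 * Route a row to its result relation.  This is O(1) with no hashing,
 * but it is only correct if each child's constant equals its position
 * in the parent's array, which something else must guarantee.
 */
static ResultRel *
route_by_index(ResultRel *rels, int nrels, const Row *row)
{
    assert(row->subplan_index >= 0 && row->subplan_index < nrels);
    return &rels[row->subplan_index];
}
```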

Thanks to both of you for working on this! As I said before, this
seems like really important work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 30, 2021 at 12:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe that could be made more robust, but the other problem
>> is that the EXPLAIN output is just about unreadable; nobody will
>> understand what "(0)" means.

> I think this was an idea that originally came from me, prompted by
> what we already do for:

I agree that we have some existing behavior that's related to this, but
it's still messy, and I couldn't find any evidence that suggested that the
runtime lookup costs anything.  Typical subplans are going to deliver
long runs of tuples from the same target relation, so as long as we
maintain a one-element cache of the last lookup result, it's only about
one comparison per tuple most of the time.
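The run-time mapping described here, a hash table keyed on tableoid fronted by a one-entry cache, can be sketched in stand-alone C as follows. This is a simplified illustration, not PostgreSQL's code: `ResultRel`, `ResultRelMap`, `lookup_result_rel()`, the fixed `MAP_SIZE`, and the multiplicative hash are all hypothetical choices. Because subplans tend to return long runs of tuples from the same child, most calls return from the cache after a single comparison and never touch the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* hypothetical stand-in for an OID */

typedef struct ResultRel
{
    Oid relid;                  /* tableoid of this result relation */
    int index;                  /* position in the result-relation array */
} ResultRel;

#define MAP_SIZE 64             /* power of two; must exceed the relation count */

typedef struct ResultRelMap
{
    ResultRel *slots[MAP_SIZE]; /* open-addressing hash table */
    Oid        last_oid;        /* one-entry cache: last OID looked up */
    ResultRel *last_rel;        /* ... and the relation it mapped to */
    int        hash_probes;     /* number of lookups that missed the cache */
} ResultRelMap;

static size_t
oid_hash(Oid oid)
{
    return (size_t) (oid * 2654435761u) & (MAP_SIZE - 1);
}

static void
map_init(ResultRelMap *map, ResultRel *rels, int nrels)
{
    assert(nrels < MAP_SIZE);   /* leaves an empty slot, so probing terminates */
    for (size_t i = 0; i < MAP_SIZE; i++)
        map->slots[i] = NULL;
    map->last_oid = 0;
    map->last_rel = NULL;
    map->hash_probes = 0;
    for (int i = 0; i < nrels; i++)
    {
        size_t h = oid_hash(rels[i].relid);

        while (map->slots[h] != NULL)   /* linear probing on collision */
            h = (h + 1) & (MAP_SIZE - 1);
        map->slots[h] = &rels[i];
    }
}

static ResultRel *
lookup_result_rel(ResultRelMap *map, Oid tableoid)
{
    /* Fast path: same relation as the previous tuple. */
    if (map->last_rel != NULL && map->last_oid == tableoid)
        return map->last_rel;

    map->hash_probes++;
    for (size_t h = oid_hash(tableoid); map->slots[h] != NULL;
         h = (h + 1) & (MAP_SIZE - 1))
    {
        if (map->slots[h]->relid == tableoid)
        {
            map->last_oid = tableoid;   /* remember for the next tuple */
            map->last_rel = map->slots[h];
            return map->slots[h];
        }
    }
    return NULL;                /* not one of the result relations */
}
```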

> I think the real
> problem here is that the deparsing code isn't in on the secret.

Agreed; if we spent some more effort on that end of it, maybe we
could do something different here.  I'm not very sure what good
output would look like though.  A key advantage of tableoid is
that that's already a thing people know about.

            regards, tom lane



Re: making update/delete of inheritance trees scale better

From
Robert Haas
Date:
On Wed, Mar 31, 2021 at 1:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that we have some existing behavior that's related to this, but
> it's still messy, and I couldn't find any evidence that suggested that the
> runtime lookup costs anything.  Typical subplans are going to deliver
> long runs of tuples from the same target relation, so as long as we
> maintain a one-element cache of the last lookup result, it's only about
> one comparison per tuple most of the time.

OK, that's pretty fair.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Thu, Apr 1, 2021 at 12:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Tue, Mar 30, 2021 at 1:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Here's a v13 patchset that I feel pretty good about.
>
> > Thanks.  After staring at this for a day now, I do too.
>
> Thanks for looking!  Pushed after some more docs-fiddling and a final
> read-through.  I think the only code change from v13 is that I decided
> to convert ExecGetJunkAttribute into a "static inline", since it's
> just a thin wrapper around slot_getattr().  Doesn't really help
> performance much, but it shouldn't hurt.

Thanks a lot.

> > ... The drop as partition count increases can
> > be attributed to the fact that with a generic plan, there are a bunch
> > of steps that must be done across all partitions, such as
> > AcquireExecutorLocks(), ExecCheckRTPerms(), per-result-rel
> > initializations in ExecInitModifyTable(), etc., even with the patch.
> > As mentioned upthread, [1] can help with the last bit.
>
> I'll try to find some time to look at that one.
>
> I'd previously been thinking that we couldn't be lazy about applying
> most of those steps at executor startup, but on second thought,
> ExecCheckRTPerms should be a no-op anyway for child tables.

Yeah, David did say that in that thread:

https://www.postgresql.org/message-id/CAApHDvqPzsMcKLRpmNpUW97PmaQDTmD7b2BayEPS5AN4LY-0bA%40mail.gmail.com

>  So
> maybe it would be okay to not take a lock, much less do the other
> stuff, until the particular child table is stored into.

Note that the patch over there doesn't do anything about the
AcquireExecutorLocks() bottleneck, as there are some yet-unsolved race
conditions that were previously discussed here:

https://www.postgresql.org/message-id/flat/CAKJS1f_kfRQ3ZpjQyHC7=PK9vrhxiHBQFZ+hc0JCwwnRKkF3hg@mail.gmail.com

Anyway, I'll post the rebased version of the patch that we do have.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: making update/delete of inheritance trees scale better

From
David Rowley
Date:
On Thu, 1 Apr 2021 at 15:09, Amit Langote <amitlangote09@gmail.com> wrote:
> Note that the patch over there doesn't do anything about
> AcquireExecutorLocks() bottleneck, as there are some yet-unsolved race
> conditions that were previously discussed here:
>
> https://www.postgresql.org/message-id/flat/CAKJS1f_kfRQ3ZpjQyHC7=PK9vrhxiHBQFZ+hc0JCwwnRKkF3hg@mail.gmail.com

The only way I can think of so far to get around having to lock all
child partitions is pretty drastic and likely it's too late to change
anyway.  The idea is that once you attach an existing table as a
partition, you can no longer access it directly. We'd likely have
to invent a new relkind for partitions for that to work.  This would
mean that we shouldn't ever need to lock individual partitions as all
things which access them must do so via the parent. I imagined that we
might still be able to truncate partitions with an ALTER TABLE ...
TRUNCATE PARTITION ...; or something.   It feels a bit late for all
that now though, especially so with all the CONCURRENTLY work Alvaro
has done to make ATTACH/DETACH not take an AccessExclusiveLock.

Additionally, I imagine doing this would upset a lot of people who do
direct accesses to partitions.

Robert also mentioned some ideas in [1]. However, it seems that might
have a performance impact on locking in general.

I think some other DBMSes might not allow direct access to partitions.
Perhaps the locking issue is the reason why.

David

[1] https://www.postgresql.org/message-id/CA%2BTgmoYbtm1uuDne3rRp_uNA2RFiBwXX1ngj3RSLxOfc3oS7cQ%40mail.gmail.com



Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
On Wed, Mar 31, 2021 at 9:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I reran some of the performance tests I did earlier (I've attached the
> modified test running script for reference):

For the archives' sake: I noticed a mistake in my benchmarking script
and repeated the tests.  It turns out that all of the pgbench runs were
performed with 40-column tables, not 10, 20, and 40 as shown in the results.

> pgbench -n -T60 -M{simple|prepared} -f nojoin.sql
>
> nojoin.sql:
>
> \set a random(1, 1000000)
> update test_table t set b = :a where a = :a;
>
> ...and here are the tps figures:
>
> -Msimple
>
> nparts  10cols      20cols      40cols
>
> master:
> 64      10112       9878        10920
> 128     9662        10691       10604
> 256     9642        9691        10626
> 1024    8589        9675        9521
>
> patched:
> 64      13493       13463       13313
> 128     13305       13447       12705
> 256     13190       13161       12954
> 1024    11791       11408       11786
>
> No variation across various column counts, but the patched improves
> the tps for each case by quite a bit.

-Msimple

pre-86dc90056:
nparts  10cols      20cols      40cols

64      11345       10650       10327
128     11014       11005       10069
256     10759       10827       10133
1024    9518        10314       8418

post-86dc90056:
nparts  10cols      20cols      40cols

64      13829       13677       13207
128     13521       12843       12418
256     13071       13006       12926
1024    12351       12036       11739

My previous assertion that the tps does not vary across different column
counts seems to hold in this case, that is, -Msimple mode.

> -Mprepared (plan_cache_mode=force_generic_plan)
>
> master:
> 64      2286        2285        2266
> 128     1163        1127        1091
> 256     531         519         544
> 1024    77          71          69
>
> patched:
> 64      6522        6612        6275
> 128     3568        3625        3372
> 256     1847        1710        1823
> 1024    433         427         386
>
> Again, no variation across column counts.

-Mprepared

pre-86dc90056:
nparts  10cols      20cols      40cols

64      3059        2851        2154
128     1675        1366        1100
256     685         658         544
1024    126         85          76

post-86dc90056:
nparts  10cols      20cols      40cols

64      7665        6966        6444
128     4211        3968        3389
256     2205        2020        1783
1024    545         499         389

In the -Mprepared case, however, it does vary, both before and after
86dc90056.  For the post-86dc90056 case, I suspect that's because
ExecBuildUpdateProjection(), whose cost is O(number-of-columns), is
performed for *all* partitions in ExecInitModifyTable().  In the
-Msimple case, it is only ever done for one partition, so it
doesn't make much of a difference to ExecInitModifyTable() time.

>  tps drops as partition
> count increases both before and after applying the patches, although
> patched performs way better, which is mainly attributable to the
> ability of UPDATE to now utilize runtime pruning (actually of the
> Append under ModifyTable).  The drop as partition count increases can
> be attributed to the fact that with a generic plan, there are a bunch
> of steps that must be done across all partitions, such as
> AcquireExecutorLocks(), ExecCheckRTPerms(), per-result-rel
> initializations in ExecInitModifyTable(), etc., even with the patch.
> As mentioned upthread, [1] can help with the last bit.

Here are the -Mprepared numbers after applying that patch:

nparts  10cols      20cols      40cols

64      17185       17064       16625
128     12261       11648       11968
256     7662        7564        7439
1024    2252        2185        2101

With the patch, ExecBuildUpdateProjection() will be called only once
irrespective of the number of partitions, almost like the -Msimple
case, so the tps across column counts does not vary by much.
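To put the corrected -Mprepared figures in proportion, the small C program below computes speedup ratios from the 10-column numbers reported in this message (pre-86dc90056, post-86dc90056, and post-86dc90056 plus [1]); no data beyond those tables is used. The ratios show the gain growing with partition count: 86dc90056 goes from about 2.5x at 64 partitions to about 4.3x at 1024, and [1] adds roughly 2.2x to 4.1x on top of that.

```c
#include <assert.h>
#include <stdio.h>

#define NROWS 4

/* -Mprepared tps, 10cols column, copied from the tables in this message. */
static const int    nparts[NROWS]     = {64, 128, 256, 1024};
static const double pre[NROWS]        = {3059, 1675, 685, 126};     /* pre-86dc90056 */
static const double post[NROWS]       = {7665, 4211, 2205, 545};    /* post-86dc90056 */
static const double with_patch[NROWS] = {17185, 12261, 7662, 2252}; /* plus patch [1] */

/* Ratio of new throughput to old throughput. */
static double
speedup(double before, double after)
{
    return after / before;
}

/* Print, per partition count, the gain from 86dc90056 and then from [1]. */
static void
print_speedups(void)
{
    printf("nparts  86dc90056   +[1]\n");
    for (int i = 0; i < NROWS; i++)
        printf("%-6d  %8.2fx  %5.2fx\n", nparts[i],
               speedup(pre[i], post[i]), speedup(post[i], with_patch[i]));
}
```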

-- 
Amit Langote
EDB: http://www.enterprisedb.com



RE: making update/delete of inheritance trees scale better

From
"houzj.fnst@fujitsu.com"
Date:
Hi

After 86dc900, in "src/include/nodes/pathnodes.h",
I noticed that the comment above struct RowIdentityVarInfo uses the phrase "partitioned UPDATE".

But "inherited UPDATE" seems to be used everywhere else.
Would it be better to keep them consistent by using "inherited UPDATE"?

Best regards,
houzj


Re: making update/delete of inheritance trees scale better

From
Amit Langote
Date:
Hi,

On Mon, May 17, 2021 at 3:07 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Hi
>
> After 86dc900, in "src/include/nodes/pathnodes.h",
> I noticed that the comment above struct RowIdentityVarInfo uses the phrase "partitioned UPDATE".
>
> But "inherited UPDATE" seems to be used everywhere else.
> Would it be better to keep them consistent by using "inherited UPDATE"?

Yeah, I would not be opposed to fixing that.  Like this maybe (patch attached)?

- * In partitioned UPDATE/DELETE it's important for child partitions to share
+ * In an inherited UPDATE/DELETE it's important for child tables to share

While at it, I also noticed that the comment refers to
row_identity_vars without making clear which variable it means,
so I fixed that too.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

RE: making update/delete of inheritance trees scale better

From
"houzj.fnst@fujitsu.com"
Date:
> On Mon, May 17, 2021 at 3:07 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Hi
> >
> > After 86dc900, in "src/include/nodes/pathnodes.h",
> > I noticed that the comment above struct RowIdentityVarInfo uses the phrase "partitioned UPDATE".
> >
> > But "inherited UPDATE" seems to be used everywhere else.
> > Would it be better to keep them consistent by using "inherited UPDATE"?
> 
> Yeah, I would not be opposed to fixing that.  Like this maybe (patch attached)?

> - * In partitioned UPDATE/DELETE it's important for child partitions to share
> + * In an inherited UPDATE/DELETE it's important for child tables to share

Thanks for the change, it looks good to me.

Best regards,
houzj