Thread: Remembering bug #6123

Remembering bug #6123

From
"Kevin Grittner"
Date:
Going back through the patches we had to make to 9.0 to move to
PostgreSQL triggers, I noticed that I let the issues raised as bug
#6123 lie untouched during the 9.2 development cycle.  In my view,
the best suggestion for a solution was proposed by Florian here:
http://archives.postgresql.org/pgsql-hackers/2011-08/msg00388.php
As pointed out in a brief exchange after that post, there would be
ways to do the more exotic things that might be desired, while
preventing apparently straightforward code from doing surprising
things.  I'm not sure whether that discussion fully satisfies the
concerns raised by Robert, though.
Because I let this lapse, it only seems feasible to go forward with
a patch for 9.2 if there is consensus around Florian's proposal.  If
there is any dissent, I guess the thing to do is for me to gather
the issues and see about getting something into 9.3, once 9.2 work
has died down -- in five months or so.  Wisconsin Courts can
continue to deal with the issues using my more simple-minded patch,
but others still are getting surprised by it -- bug #6226 is
apparently another manifestation.
Comments?
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Going back through the patches we had to make to 9.0 to move to
> PostgreSQL triggers, I noticed that I let the issues raised as bug
> #6123 lie untouched during the 9.2 development cycle.  In my view,
> the best suggestion for a solution was proposed by Florian here:
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00388.php

Do you mean this:

    After every BEFORE trigger invocation, if the trigger returned
    non-NULL, check if latest row version is still the same as when
    the trigger started. If not, complain.

While that sounds relatively safe, if possibly performance-impacting,
it's not apparent to me how it fixes the problem you complained of.
The triggers you were using were modifying rows other than the one
being targeted by the triggering action, so a test like the above would
not notice that they'd done anything.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane  wrote:
> "Kevin Grittner"  writes:
>> Going back through the patches we had to make to 9.0 to move to
>> PostgreSQL triggers, I noticed that I let the issues raised as bug
>> #6123 lie untouched during the 9.2 development cycle. In my view,
>> the best suggestion for a solution was proposed by Florian here:
> 
>> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00388.php
> 
> Do you mean this:
> 
>     After every BEFORE trigger invocation, if the trigger returned
>     non-NULL, check if latest row version is still the same as when
>     the trigger started. If not, complain.
That is the concise statement of it, yes.
> While that sounds relatively safe, if possibly performance-
> impacting, it's not apparent to me how it fixes the problem you
> complained of.  The triggers you were using were modifying rows
> other than the one being targeted by the triggering action, so a
> test like the above would not notice that they'd done anything.
My initial use-case was that a BEFORE DELETE trigger was deleting
related "child" rows, and the BEFORE DELETE trigger at the child
level was updating counts on the original (parent) row.  The proposed
change would cause an error to be thrown when the parent level
returned a non-NULL value from its BEFORE DELETE trigger.  That would
prevent the silent corruption of the data, so it's a big step forward
in my view; but it's not the behavior we most want in our shop for
this particular case.  In the messages later in the thread, Florian
pointed out that this pattern would allow us to get the desired
behavior:
| BEFORE DELETE ON <parent>:
|   DELETE FROM <child> WHERE parent_id = OLD.id;
|   IF FOUND THEN
|     -- Removing children might have modified our row,
|     -- so returning non-NULL is not an option
|     DELETE FROM <parent> WHERE id = OLD.id;
|     RETURN NULL;
|   ELSE
|     -- No children removed, so our row should be unmodified
|     RETURN OLD;
|   END IF;
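
For concreteness, that pattern as an actual trigger function might
look like this (just a sketch, assuming hypothetical tables parent
and child, with child.parent_id referencing parent.id):

  CREATE FUNCTION parent_before_delete() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
    DELETE FROM child WHERE parent_id = OLD.id;
    IF FOUND THEN
      -- Removing children might have modified our row,
      -- so re-issue the delete and cancel this one.
      DELETE FROM parent WHERE id = OLD.id;
      RETURN NULL;
    ELSE
      -- No children removed, so our row should be unmodified.
      RETURN OLD;
    END IF;
  END;
  $$;

  CREATE TRIGGER parent_del BEFORE DELETE ON parent
    FOR EACH ROW EXECUTE PROCEDURE parent_before_delete();
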
The advantage of Florian's approach is that it changes the default
behavior to something very safe, while allowing arbitrarily complex
behavior through correspondingly more complex code.
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane  wrote:
>> While that sounds relatively safe, if possibly performance-
>> impacting, it's not apparent to me how it fixes the problem you
>> complained of.  The triggers you were using were modifying rows
>> other than the one being targeted by the triggering action, so a
>> test like the above would not notice that they'd done anything.
> My initial use-case was that a BEFORE DELETE trigger was deleting
> related "child" rows, and the BEFORE DELETE trigger at the child
> level was updating counts on the original (parent) row.  The proposed
> change would cause an error to be thrown when the parent level
> returned a non-NULL value from its BEFORE DELETE trigger.

Ah, I see.

I suggest that the current behavior was designed for the case of
independent concurrent updates, and you have not made a good argument
for changing that.  What does make sense to me, in light of these
examples, is to complain if a BEFORE trigger modifies the row "itself"
(including indirect updates).  IOW, at conclusion of trigger firing
(I see no need to do it for each trigger), check to see if the target
row has been outdated *by the current transaction*, and throw error if
so.

And, if you accept the statement of the fix like that, then actually
there is no performance hit because there is no need to introduce new
tests.  All we have to do is start treating HeapTupleSelfUpdated result
from heap_update or heap_delete as an error case instead of an
okay-do-nothing case.  There doesn't even need to be an explicit check
that this was caused by a trigger, because AFAICS there isn't any other
possibility.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I suggest that the current behavior was designed for the case of
> independent concurrent updates, and you have not made a good
> argument for changing that.  What does make sense to me, in light
> of these examples, is to complain if a BEFORE trigger modifies the
> row "itself" (including indirect updates).  IOW, at conclusion of
> trigger firing (I see no need to do it for each trigger), check to
> see if the target row has been outdated *by the current
> transaction*, and throw error if so.
>
> And, if you accept the statement of the fix like that, then
> actually there is no performance hit because there is no need to
> introduce new tests.  All we have to do is start treating
> HeapTupleSelfUpdated result from heap_update or heap_delete as an
> error case instead of an okay-do-nothing case.  There doesn't even
> need to be an explicit check that this was caused by a trigger,
> because AFAICS there isn't any other possibility.

I think that's pretty much what my previously posted patches did.
Here's a slightly modified one, based on Florian's feedback.  Is
this what you had in mind, or am I misunderstanding?

-Kevin


Attachment

Re: Remembering bug #6123

From
Florian Pflug
Date:
On Jan 12, 2012, at 00:32, Kevin Grittner wrote:
> Going back through the patches we had to make to 9.0 to move to
> PostgreSQL triggers, I noticed that I let the issues raised as bug
> #6123 lie untouched during the 9.2 development cycle.  In my view,
> the best suggestion for a solution was proposed by Florian here:
> 
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00388.php

I've recently encountered a related issue, this time not related to
triggers but to FOR UPDATE.

My code declared a cursor which iterated over pairs of parent and
child rows, and updated the parent with values from the child. I
declared the cursor "FOR UPDATE OF parent", and used "WHERE CURRENT OF"
to update the parent row while iterating over the cursor. The code
looked something like this

DECLARE c_parent_child CURSOR FOR
  SELECT ... FROM parent JOIN child ON ...
  FOR UPDATE OF parent;

BEGIN
  FOR v_row IN c_parent_child LOOP
    ...
    UPDATE parent SET ... WHERE CURRENT OF c_parent_child;
  END LOOP;

I figured that was all safe and sound, since the cursor's results
should be unaffected by the later UPDATEs - after all, its cmax is
smaller than any of the UPDATEs' command ids. What I didn't take into
account, however, is that "FOR UPDATE OF" will silently skip rows which
have been updated (by the same transaction) after the cursor's snapshot
was established.
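
A minimal way to see that behaviour (a sketch, with a hypothetical
one-row table t):

  BEGIN;
  CREATE TABLE t (id int PRIMARY KEY, n int);
  INSERT INTO t VALUES (1, 0);
  DECLARE c CURSOR FOR SELECT * FROM t FOR UPDATE;
  UPDATE t SET n = n + 1 WHERE id = 1; -- outdates the version c sees
  FETCH c;  -- returns no row: the self-updated tuple is skipped
  COMMIT;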

Thus, what happened was that things worked fine for parents with only
one child, but for parents with multiple children only the first child
got processed. Once I realized the source of the problem, the fix was
easy - using the parent table's primary key to do the update and
dropping the "FOR UPDATE OF" clause fixed the problem.

So, I guess my question is, if we add safeguards against these sorts of
bugs for triggers, should we also add them to FOR UPDATE? Historically,
we seem to have taken the stand that modifications of self-updated tuples
should be ignored. If we're going to reverse that decision (which I think
Kevin has argued for quite convincingly), it seems consistent to complain
about all modifications to self-updated tuples, not only to those involving
triggers.

OTOH, the more cases we complain about, the larger the chance that we
cause serious issues for people who upgrade to 9.2 (or 9.3, whatever).

best regards,
Florian Pflug



Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ...  All we have to do is start treating
>> HeapTupleSelfUpdated result from heap_update or heap_delete as an
>> error case instead of an okay-do-nothing case.  There doesn't even
>> need to be an explicit check that this was caused by a trigger,
>> because AFAICS there isn't any other possibility.
> I think that's pretty much what my previously posted patches did. 
> Here's a slightly modified one, based on Florian's feedback.  Is
> this what you had in mind, or am I misunderstanding?
Actually, I was just about to follow up with a comment that that was too
simplistic: it would break the current behavior of join updates/deletes
that join to the same target row more than once.  (There might be an
argument for restricting those, but this discussion isn't about that.)
So what we need to do is check whether the outdate was done by a later
CommandId than current.  I see that your patch is attempting to deal
with these issues by testing GetCurrentCommandId(false) !=
estate->es_output_cid, but that seems completely wrong to me, as what it
does is complain if *any* additional command has been executed in the
transaction, regardless of what changed the target tuple.  It ought to
be comparing the tuple's xmax to es_output_cid.  And the comment needs
to cover why it's worrying about that.

Also, what's the point of testing update_ctid?  I don't see that it
matters whether the outdate was a delete or an update.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So what we need to do is check whether the outdate was done by a
> later CommandId than current.  I see that your patch is attempting
> to deal with these issues by testing GetCurrentCommandId(false) !=
> estate->es_output_cid, but that seems completely wrong to me, as
> what it does is complain if *any* additional command has been
> executed in the transaction, regardless of what changed the target
> tuple.  It ought to be comparing the tuple's xmax to
> es_output_cid.  And the comment needs to cover why it's worrying
> about that.
OK.  I'll rework based on your comments.
> Also, what's the point of testing update_ctid?  I don't see that
> it matters whether the outdate was a delete or an update.
The update_ctid code was a carry-over from my old, slightly
different approach, which I failed to change as I should have.  I'll
fix that along with the other.
Thanks,
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> So, I guess my question is, if we add safeguards against these sorts of
> bugs for triggers, should we also add them to FOR UPDATE? Historically,
> we seem to have taken the stand that modifications of self-updated tuples
> should be ignored. If we're going to reverse that decision (which I think
> Kevin has argued for quite convincingly), it seems consistent to complain
> about all modifications to self-updated tuples, not only to those involving
> triggers.

I'm not very convinced, except for the specific case of updates caused
by cascaded triggers.
        regards, tom lane


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, what's the point of testing update_ctid?  I don't see that
>> it matters whether the outdate was a delete or an update.
> The update_ctid code was a carry-over from my old, slightly
> different approach, which I failed to change as I should have.  I'll
> fix that along with the other.

Actually, on reflection there might be a reason for checking
update_ctid, with a view to allowing "harmless" cases.  I see
these cases:

* UPDATE finds a trigger already updated the row: must throw error
since we can't apply the update.

* UPDATE finds a trigger already deleted the row: arguably, we could
let the deletion stand and ignore the update action.

* DELETE finds a trigger already updated the row: must throw error
since we can't apply the delete.

* DELETE finds a trigger already deleted the row: arguably, there's
no reason to complain.

Don't know if that was your reasoning as well.  But if it is, then again
the comment needs to cover that.
        regards, tom lane


Re: Remembering bug #6123

From
Tom Lane
Date:
I wrote:
>> ... It ought to be comparing the tuple's xmax to
>> es_output_cid.

Sigh, need to go find some caffeine.  Obviously, it needs to check the
tuple's cmax, not xmax.  And that means the patch will be a bit more
invasive than this, because heap_update and heap_delete don't return
that information at present.
        regards, tom lane


Re: Remembering bug #6123

From
Florian Pflug
Date:
On Jan 12, 2012, at 17:30, Tom Lane wrote:
> Actually, on reflection there might be a reason for checking
> update_ctid, with a view to allowing "harmless" cases.  I see
> these cases:
> 
> * UPDATE finds a trigger already updated the row: must throw error
> since we can't apply the update.
> 
> * UPDATE finds a trigger already deleted the row: arguably, we could
> let the deletion stand and ignore the update action.

I've argued against that in the past, and I still think it's a bad idea.
The BEFORE UPDATE trigger might have done some actions which aren't valid
now that the row has been deleted. If it actually *is* safe to let a
self-delete take precedence, the BEFORE UPDATE trigger ought to check whether
the row still exists, and return NULL if it doesn't.
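
In plpgsql, that check might look something like this (a sketch, for a
hypothetical table parent keyed by id):

  CREATE FUNCTION parent_before_update() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
    -- ... whatever side effects the trigger performs ...
    PERFORM 1 FROM parent WHERE id = OLD.id;
    IF NOT FOUND THEN
      RETURN NULL;  -- row already deleted by this transaction; skip
    END IF;
    RETURN NEW;
  END;
  $$;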

> * DELETE finds a trigger already updated the row: must throw error
> since we can't apply the delete.
> 
> * DELETE finds a trigger already deleted the row: arguably, there's
> no reason to complain.

I'm not convinced that is a good idea, either. If we do that, there will
essentially be two BEFORE DELETE trigger invocations for a single deleted
row, which seems wrong. And again - if a BEFORE DELETE trigger *does* deal
with this case nicely, it simply has to check whether the row still exists
before returning non-NULL.

Also, without these exceptions, the behaviour (post-patch) is simply
explained by

    Either don't cause recursive same-row updates from a BEFORE
    trigger, or have your trigger return NULL.

and except for the case of multiple BEFORE triggers on the same table
you can always assume that

    A BEFORE trigger's view of a tuple modification always reflects
    the actual modification that will eventually happen (or an error
    is thrown).

Adding a list of "buts" and "ifs" to these rules has a higher chance of
adding confusion and causing bugs than of actually helping people, IMHO.

Especially since (judging from the number of years the current behaviour
has stood undisputed) these cases seem to arise quite infrequently in
practice.

best regards,
Florian Pflug



Re: Remembering bug #6123

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Jan 12, 2012, at 17:30, Tom Lane wrote:
>> Actually, on reflection there might be a reason for checking
>> update_ctid, with a view to allowing "harmless" cases.

> I've argued against that in the past, and I still think it's a bad idea.

Well, the main argument for it in my mind is that we could avoid
breaking existing behavior in cases where it's not clearly essential
to do so.  Perhaps that's less compelling than keeping the behavior
simple, since it's true that the specific cases we could preserve the old
behavior in are pretty infrequent.  I'm not wedded to the idea, but
would like to hear a few other people's opinions.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Jan 12, 2012, at 17:30, Tom Lane wrote:
>>> Actually, on reflection there might be a reason for checking
>>> update_ctid, with a view to allowing "harmless" cases.
> 
>> I've argued against that in the past, and I still think it's a
>> bad idea.
> 
> Well, the main argument for it in my mind is that we could avoid
> breaking existing behavior in cases where it's not clearly
> essential to do so.  Perhaps that's less compelling than keeping
> the behavior simple, since it's true that the specific cases we
> could preserve the old behavior in are pretty infrequent.  I'm not
> wedded to the idea, but would like to hear a few other people's
> opinions.
FWIW, I started from the position that the "harmless" cases should
be quietly handled.  But Florian made what, to me, were persuasive
arguments against that.  A DELETE trigger might fire twice for one
delete, mucking up data integrity, for example.  His proposal seemed
to make my use case hard to handle, but then he pointed out how
triggers could be coded to allow that -- it's just that you would
need to go out of your way to do so, reducing the chance of falling
into bad behavior accidentally.
So count me as a convert to Florian's POV, even if I didn't succeed
in ripping out all the code from my former POV from the v3 patch.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> it needs to check the tuple's cmax [...]  And that means the patch
> will be a bit more invasive than this, because heap_update and
> heap_delete don't return that information at present.
I'm thinking that I could keep the test for:

  GetCurrentCommandId(false) != estate->es_output_cid

as a "first pass".  If that's true, I could use EvalPlanQualFetch()
to find the last version of the tuple, and generate the error if the
tuple's cmax != estate->es_output_cid.  I think, although I'm not
entirely sure, that EvalPlanQualFetch needs a little work to support
this usage.
Attached is a patch based on these thoughts.  Is it on the right
track?  I suspect I haven't got everything covered, but thought a
reality check was in order at this point.
It does pass regression tests, including the new ones I added.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> Attached is a patch based on these thoughts.

OK, really attached this time.

-Kevin


Attachment

Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> it needs to check the tuple's cmax [...]  And that means the patch
>> will be a bit more invasive than this, because heap_update and
>> heap_delete don't return that information at present.
> I'm thinking that I could keep the test for:
>   GetCurrentCommandId(false) != estate->es_output_cid
> as a "first pass".  If that's true, I could use EvalPlanQualFetch()
> to find the last version of the tuple, and generate the error if the
> tuple's cmax != estate->es_output_cid.  I think, although I'm not
> entirely sure, that EvalPlanQualFetch needs a little work to support
> this usage.
> Attached is a patch based on these thoughts.  Is it on the right
> track?  I suspect I haven't got everything covered, but thought a
> reality check was in order at this point.

You forgot to attach the patch, but the approach seems totally Rube
Goldberg to me anyway.  Why not just fix heap_update/heap_delete to
return the additional information?  It's not like we don't whack their
parameter lists around regularly.

Rather than having three output parameters to support the case, I'm
a bit inclined to merge them into a single-purpose struct type.
But that's mostly cosmetic.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You forgot to attach the patch, but the approach seems totally
> Rube Goldberg to me anyway.  Why not just fix heap_update/
> heap_delete to return the additional information?  It's not like
> we don't whack their parameter lists around regularly.
OK.
> Rather than having three output parameters to support the case,
> I'm a bit inclined to merge them into a single-purpose struct
> type.  But that's mostly cosmetic.
OK.
Working on v5.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> You forgot to attach the patch, but the approach seems totally
> Rube Goldberg to me anyway.  Why not just fix heap_update/
> heap_delete to return the additional information?  It's not like
> we don't whack their parameter lists around regularly.
>
> Rather than having three output parameters to support the case,
> I'm a bit inclined to merge them into a single-purpose struct
> type.  But that's mostly cosmetic.

OK, I got rid of the parrots and candles and added a structure to
hold the data returned only on failure.

Am I getting closer?

-Kevin


Attachment

Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> OK, I got rid of the parrots and candles and added a structure to
> hold the data returned only on failure.
> Am I getting closer?

Hmm, I would think you'd get assertion failures from calling
HeapTupleHeaderGetCmax when xmax isn't the current transaction.
(But I'm not sure that the regression tests really exercise such
cases ... did you try the isolation tests with this?)  I was thinking
we should probably define the cmax as being returned only in SelfUpdated
cases.

You failed to update assorted relevant comments, too.  But I can take
it from here.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Am I getting closer?
> 
> Hmm, I would think you'd get assertion failures from calling
> HeapTupleHeaderGetCmax when xmax isn't the current transaction.
> (But I'm not sure that the regression tests really exercise such
> cases ... did you try the isolation tests with this?)
I didn't go farther than a `make check`, but I'm doing a more
thorough set of tests now.
> I was thinking we should probably define the cmax as being
> returned only in SelfUpdated cases.
I thought about that, but didn't see how it could be other than
self-updated.  If you do, I guess I missed something.
> You failed to update assorted relevant comments, too.  But I can
> take it from here.
I can take another pass to polish it if you'd like.  I didn't expect
this to be the last version I posted; I was afraid I might still
have some restructuring to do.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>>  I was thinking we should probably define the cmax as being
>> returned only in SelfUpdated cases.
>  
> I thought about that, but didn't see how it could be other than
> self-updated.  If you do, I guess I missed something.
Oh, I see it now.  Oops.
I can fix that and the comments if you like.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hmm, I would think you'd get assertion failures from calling
> HeapTupleHeaderGetCmax when xmax isn't the current transaction.
> (But I'm not sure that the regression tests really exercise such
> cases ... did you try the isolation tests with this?)  I was
> thinking we should probably define the cmax as being returned only
> in SelfUpdated cases.

You were right.  One of the isolation tests failed an assertion;
things pass with the attached change, setting the cmax
conditionally.  Some comments updated.  I hope this is helpful.  I
can't take more time right now, because we're getting major snowfall
and I've got to leave now to get home safely.

-Kevin


Attachment

Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> You were right.  One of the isolation tests failed an assertion;
> things pass with the attached change, setting the cmax
> conditionally.  Some comments updated.  I hope this is helpful.

I worked over this patch a bit, mostly cosmetically; updated version
attached.  However, we're not there yet.  I have a couple of remaining
concerns:

1. I think the error message needs more thought.  I believe it's
possible to trigger this error not only with BEFORE triggers, but also
with a volatile function used in the query that reaches around and
updates row(s) of the target table.  Now, I don't have the slightest
qualms about breaking any apps that do anything so dirty, but should
we try to generalize the message text to cope with that?
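
(For instance, I'd expect something like this hypothetical sketch,
reusing the test table below, to reach the same error path:

  CREATE FUNCTION bump(i int) RETURNS int
  LANGUAGE plpgsql VOLATILE AS $$
  BEGIN
    UPDATE test SET data = data || '!' WHERE id = i;
    RETURN i;
  END;
  $$;

  -- the qual self-updates the very row being deleted
  DELETE FROM test WHERE id = bump(1);

though I haven't traced through it.)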

2. The HINT needs work too, as it seems pretty useless as it stands.
I'd have expected some short reference to the technique of re-executing
the update in the trigger and then returning NULL.  (BTW, does this
patch require any documentation changes, and if so where?)

3. I modified heap_lock_tuple to also return a HeapUpdateFailureData,
principally on the grounds that its API should be largely parallel to
heap_update, but having done that I can't help wondering whether its
callers are okay with skipping self-updated tuples.  I see that you
changed EvalPlanQualFetch, but I'm not entirely sure that that is right;
shouldn't it continue to ignore rows modified by the current command
itself?  And you did not touch the other two callers, which definitely
have got issues.  Here is an example, which is basically the reference
count case refactored into a single self-referencing table, so that we
can hit both parent and child rows in one operation:

create table test (
    id int primary key,
    parent int references test,
    data text,
    nchildren int not null default 0
);

create function test_ins_func()
  returns trigger language plpgsql as
$$
begin
  if new.parent is not null then
    update test set nchildren = nchildren + 1 where id = new.parent;
  end if;
  return new;
end;
$$;
create trigger test_ins_trig before insert on test
  for each row execute procedure test_ins_func();

create function test_del_func()
  returns trigger language plpgsql as
$$
begin
  if old.parent is not null then
    update test set nchildren = nchildren - 1 where id = old.parent;
  end if;
  return old;
end;
$$;
create trigger test_del_trig before delete on test
  for each row execute procedure test_del_func();

insert into test values (1, null, 'root');
insert into test values (2, 1, 'root child A');
insert into test values (3, 1, 'root child B');
insert into test values (4, 2, 'grandchild 1');
insert into test values (5, 3, 'grandchild 2');

update test set data = 'root!' where id = 1;

select * from test;

delete from test;

select * from test;

drop table test;
drop function test_ins_func();
drop function test_del_func();

When you run this, with or without the current patch, you get:

 id | parent |     data     | nchildren
----+--------+--------------+-----------
  2 |      1 | root child A |         1
  4 |      2 | grandchild 1 |         0
  3 |      1 | root child B |         1
  5 |      3 | grandchild 2 |         0
  1 |        | root!        |         2
(5 rows)

DELETE 4
 id | parent | data  | nchildren
----+--------+-------+-----------
  1 |        | root! |         0
(1 row)

the reason being that when the outer delete arrives at the last (root!)
row, GetTupleForTrigger fails because the tuple is already self-updated,
and we treat that as canceling the outer delete action.

I'm not sure what to do about this.  If we throw an error there, there
will be no way that the trigger can override the error because it will
never get to run.  Possibly we could plow ahead with the expectation
of throwing an error later if the trigger doesn't cancel the
update/delete, but is it safe to do so if we don't hold lock on the
tuple?  In any case that idea doesn't help with the remaining caller,
ExecLockRows.

            regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5f6ac2ec1fdaf7598084150d017d8c85bfeeebe5..eb94060371a41913c63b2187fb4b66012ebfaed3 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** simple_heap_insert(Relation relation, He
*** 2317,2343 ****
   *
   *    relation - table to be modified (caller must hold suitable lock)
   *    tid - TID of tuple to be deleted
-  *    ctid - output parameter, used only for failure case (see below)
-  *    update_xmax - output parameter, used only for failure case (see below)
   *    cid - delete command ID (used for visibility test, and stored into
   *        cmax if successful)
   *    crosscheck - if not InvalidSnapshot, also check tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we did delete it.  Failure return codes are
   * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
   * (the last only possible if wait == false).
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as tid, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   */
  HTSU_Result
  heap_delete(Relation relation, ItemPointer tid,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
--- 2317,2342 ----
   *
   *    relation - table to be modified (caller must hold suitable lock)
   *    tid - TID of tuple to be deleted
   *    cid - delete command ID (used for visibility test, and stored into
   *        cmax if successful)
   *    crosscheck - if not InvalidSnapshot, also check tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
+  *    hufd - output parameter, filled in failure cases (see below)
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we did delete it.  Failure return codes are
   * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
   * (the last only possible if wait == false).
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   */
  HTSU_Result
  heap_delete(Relation relation, ItemPointer tid,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
*************** l1:
*** 2498,2505 ****
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
!         *ctid = tp.t_data->t_ctid;
!         *update_xmax = HeapTupleHeaderGetXmax(tp.t_data);
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(tp.t_self), ExclusiveLock);
--- 2497,2508 ----
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
!         hufd->ctid = tp.t_data->t_ctid;
!         hufd->xmax = HeapTupleHeaderGetXmax(tp.t_data);
!         if (result == HeapTupleSelfUpdated)
!             hufd->cmax = HeapTupleHeaderGetCmax(tp.t_data);
!         else
!             hufd->cmax = 0;        /* for lack of an InvalidCommandId value */
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(tp.t_self), ExclusiveLock);
*************** void
*** 2631,2643 ****
  simple_heap_delete(Relation relation, ItemPointer tid)
  {
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;

      result = heap_delete(relation, tid,
-                          &update_ctid, &update_xmax,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */ );
      switch (result)
      {
          case HeapTupleSelfUpdated:
--- 2634,2645 ----
  simple_heap_delete(Relation relation, ItemPointer tid)
  {
      HTSU_Result result;
!     HeapUpdateFailureData hufd;

      result = heap_delete(relation, tid,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */,
!                          &hufd);
      switch (result)
      {
          case HeapTupleSelfUpdated:
*************** simple_heap_delete(Relation relation, It
*** 2668,2679 ****
   *    relation - table to be modified (caller must hold suitable lock)
   *    otid - TID of old tuple to be replaced
   *    newtup - newly constructed tuple data to store
-  *    ctid - output parameter, used only for failure case (see below)
-  *    update_xmax - output parameter, used only for failure case (see below)
   *    cid - update command ID (used for visibility test, and stored into
   *        cmax/cmin if successful)
   *    crosscheck - if not InvalidSnapshot, also check old tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we *did* update it.  Failure return codes are
--- 2670,2680 ----
   *    relation - table to be modified (caller must hold suitable lock)
   *    otid - TID of old tuple to be replaced
   *    newtup - newly constructed tuple data to store
   *    cid - update command ID (used for visibility test, and stored into
   *        cmax/cmin if successful)
   *    crosscheck - if not InvalidSnapshot, also check old tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
+  *    hufd - output parameter, filled in failure cases (see below)
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we *did* update it.  Failure return codes are
*************** simple_heap_delete(Relation relation, It
*** 2686,2700 ****
   * update was done.  However, any TOAST changes in the new tuple's
   * data are not reflected into *newtup.
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as otid, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   */
  HTSU_Result
  heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
--- 2687,2701 ----
   * update was done.  However, any TOAST changes in the new tuple's
   * data are not reflected into *newtup.
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   */
  HTSU_Result
  heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
*************** l2:
*** 2873,2880 ****
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
!         *ctid = oldtup.t_data->t_ctid;
!         *update_xmax = HeapTupleHeaderGetXmax(oldtup.t_data);
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(oldtup.t_self), ExclusiveLock);
--- 2874,2885 ----
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
!         hufd->ctid = oldtup.t_data->t_ctid;
!         hufd->xmax = HeapTupleHeaderGetXmax(oldtup.t_data);
!         if (result == HeapTupleSelfUpdated)
!             hufd->cmax = HeapTupleHeaderGetCmax(oldtup.t_data);
!         else
!             hufd->cmax = 0;        /* for lack of an InvalidCommandId value */
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(oldtup.t_self), ExclusiveLock);
*************** void
*** 3344,3356 ****
  simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
  {
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;

      result = heap_update(relation, otid, tup,
-                          &update_ctid, &update_xmax,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */ );
      switch (result)
      {
          case HeapTupleSelfUpdated:
--- 3349,3360 ----
  simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
  {
      HTSU_Result result;
!     HeapUpdateFailureData hufd;

      result = heap_update(relation, otid, tup,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */,
!                          &hufd);
      switch (result)
      {
          case HeapTupleSelfUpdated:
*************** simple_heap_update(Relation relation, It
*** 3388,3405 ****
   * Output parameters:
   *    *tuple: all fields filled in
   *    *buffer: set to buffer holding tuple (pinned but not locked at exit)
!  *    *ctid: set to tuple's t_ctid, but only in failure cases
!  *    *update_xmax: set to tuple's xmax, but only in failure cases
   *
   * Function result may be:
   *    HeapTupleMayBeUpdated: lock was successfully acquired
   *    HeapTupleSelfUpdated: lock failed because tuple updated by self
   *    HeapTupleUpdated: lock failed because tuple updated by other xact
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as t_self, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   *
   *
   * NOTES: because the shared-memory lock table is of finite size, but users
--- 3392,3408 ----
   * Output parameters:
   *    *tuple: all fields filled in
   *    *buffer: set to buffer holding tuple (pinned but not locked at exit)
!  *    *hufd: filled in failure cases (see below)
   *
   * Function result may be:
   *    HeapTupleMayBeUpdated: lock was successfully acquired
   *    HeapTupleSelfUpdated: lock failed because tuple updated by self
   *    HeapTupleUpdated: lock failed because tuple updated by other xact
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   *
   *
   * NOTES: because the shared-memory lock table is of finite size, but users
*************** simple_heap_update(Relation relation, It
*** 3435,3443 ****
   * conflict for a tuple, we don't incur any extra overhead.
   */
  HTSU_Result
! heap_lock_tuple(Relation relation, HeapTuple tuple, Buffer *buffer,
!                 ItemPointer ctid, TransactionId *update_xmax,
!                 CommandId cid, LockTupleMode mode, bool nowait)
  {
      HTSU_Result result;
      ItemPointer tid = &(tuple->t_self);
--- 3438,3446 ----
   * conflict for a tuple, we don't incur any extra overhead.
   */
  HTSU_Result
! heap_lock_tuple(Relation relation, HeapTuple tuple,
!                 CommandId cid, LockTupleMode mode, bool nowait,
!                 Buffer *buffer, HeapUpdateFailureData *hufd)
  {
      HTSU_Result result;
      ItemPointer tid = &(tuple->t_self);
*************** l3:
*** 3622,3629 ****
      {
          Assert(result == HeapTupleSelfUpdated || result == HeapTupleUpdated);
          Assert(!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
!         *ctid = tuple->t_data->t_ctid;
!         *update_xmax = HeapTupleHeaderGetXmax(tuple->t_data);
          LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
          if (have_tuple_lock)
              UnlockTuple(relation, tid, tuple_lock_type);
--- 3625,3636 ----
      {
          Assert(result == HeapTupleSelfUpdated || result == HeapTupleUpdated);
          Assert(!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
!         hufd->ctid = tuple->t_data->t_ctid;
!         hufd->xmax = HeapTupleHeaderGetXmax(tuple->t_data);
!         if (result == HeapTupleSelfUpdated)
!             hufd->cmax = HeapTupleHeaderGetCmax(tuple->t_data);
!         else
!             hufd->cmax = 0;        /* for lack of an InvalidCommandId value */
          LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
          if (have_tuple_lock)
              UnlockTuple(relation, tid, tuple_lock_type);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f5e12e5681ddfbcc041bf636140d5179d0e45152..083095db0a93365cd4a253990cafbae77fa7aaac 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** GetTupleForTrigger(EState *estate,
*** 2565,2572 ****
      if (newSlot != NULL)
      {
          HTSU_Result test;
!         ItemPointerData update_ctid;
!         TransactionId update_xmax;

          *newSlot = NULL;

--- 2565,2571 ----
      if (newSlot != NULL)
      {
          HTSU_Result test;
!         HeapUpdateFailureData hufd;

          *newSlot = NULL;

*************** GetTupleForTrigger(EState *estate,
*** 2578,2587 ****
           */
  ltrmark:;
          tuple.t_self = *tid;
!         test = heap_lock_tuple(relation, &tuple, &buffer,
!                                &update_ctid, &update_xmax,
                                 estate->es_output_cid,
!                                LockTupleExclusive, false);
          switch (test)
          {
              case HeapTupleSelfUpdated:
--- 2577,2586 ----
           */
  ltrmark:;
          tuple.t_self = *tid;
!         test = heap_lock_tuple(relation, &tuple,
                                 estate->es_output_cid,
!                                LockTupleExclusive, false /* wait */,
!                                &buffer, &hufd);
          switch (test)
          {
              case HeapTupleSelfUpdated:
*************** ltrmark:;
*** 2598,2604 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(&update_ctid, &tuple.t_self))
                  {
                      /* it was updated, so look at the updated version */
                      TupleTableSlot *epqslot;
--- 2597,2603 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
                  {
                      /* it was updated, so look at the updated version */
                      TupleTableSlot *epqslot;
*************** ltrmark:;
*** 2607,2617 ****
                                             epqstate,
                                             relation,
                                             relinfo->ri_RangeTableIndex,
!                                            &update_ctid,
!                                            update_xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tid = update_ctid;
                          *newSlot = epqslot;

                          /*
--- 2606,2616 ----
                                             epqstate,
                                             relation,
                                             relinfo->ri_RangeTableIndex,
!                                            &hufd.ctid,
!                                            hufd.xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tid = hufd.ctid;
                          *newSlot = epqslot;

                          /*
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 569d0ba7ade1e0cd64c5566f2ecbbd38773f2b25..0b9e1e48901424e942c6d971952de3bd8d9d0374 100644
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
*************** EvalPlanQualFetch(EState *estate, Relati
*** 1857,1864 ****
          if (heap_fetch(relation, &SnapshotDirty, &tuple, &buffer, true, NULL))
          {
              HTSU_Result test;
!             ItemPointerData update_ctid;
!             TransactionId update_xmax;

              /*
               * If xmin isn't what we're expecting, the slot must have been
--- 1857,1863 ----
          if (heap_fetch(relation, &SnapshotDirty, &tuple, &buffer, true, NULL))
          {
              HTSU_Result test;
!             HeapUpdateFailureData hufd;

              /*
               * If xmin isn't what we're expecting, the slot must have been
*************** EvalPlanQualFetch(EState *estate, Relati
*** 1911,1928 ****
              /*
               * This is a live tuple, so now try to lock it.
               */
!             test = heap_lock_tuple(relation, &tuple, &buffer,
!                                    &update_ctid, &update_xmax,
                                     estate->es_output_cid,
!                                    lockmode, false);
              /* We now have two pins on the buffer, get rid of one */
              ReleaseBuffer(buffer);

              switch (test)
              {
                  case HeapTupleSelfUpdated:
-                     /* treat it as deleted; do not process */
                      ReleaseBuffer(buffer);
                      return NULL;

                  case HeapTupleMayBeUpdated:
--- 1910,1935 ----
              /*
               * This is a live tuple, so now try to lock it.
               */
!             test = heap_lock_tuple(relation, &tuple,
                                     estate->es_output_cid,
!                                    lockmode, false /* wait */,
!                                    &buffer, &hufd);
              /* We now have two pins on the buffer, get rid of one */
              ReleaseBuffer(buffer);

              switch (test)
              {
                  case HeapTupleSelfUpdated:
                      ReleaseBuffer(buffer);
+                     if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
+                     {
+                         /* it was updated, so look at the updated version */
+                         tuple.t_self = hufd.ctid;
+                         /* updated row should have xmin matching this xmax */
+                         priorXmax = hufd.xmax;
+                         continue;
+                     }
+                     /* tuple was deleted, so give up */
                      return NULL;

                  case HeapTupleMayBeUpdated:
*************** EvalPlanQualFetch(EState *estate, Relati
*** 1935,1946 ****
                          ereport(ERROR,
                                  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                                   errmsg("could not serialize access due to concurrent update")));
!                     if (!ItemPointerEquals(&update_ctid, &tuple.t_self))
                      {
                          /* it was updated, so look at the updated version */
!                         tuple.t_self = update_ctid;
                          /* updated row should have xmin matching this xmax */
!                         priorXmax = update_xmax;
                          continue;
                      }
                      /* tuple was deleted, so give up */
--- 1942,1953 ----
                          ereport(ERROR,
                                  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                                   errmsg("could not serialize access due to concurrent update")));
!                     if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
                      {
                          /* it was updated, so look at the updated version */
!                         tuple.t_self = hufd.ctid;
                          /* updated row should have xmin matching this xmax */
!                         priorXmax = hufd.xmax;
                          continue;
                      }
                      /* tuple was deleted, so give up */
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index d2a33cb35682e23168d509cffe5f5f179f25249b..9893d767a12d584486adf91a6a0572e63cabc288 100644
*** a/src/backend/executor/nodeLockRows.c
--- b/src/backend/executor/nodeLockRows.c
*************** lnext:
*** 70,77 ****
          bool        isNull;
          HeapTupleData tuple;
          Buffer        buffer;
!         ItemPointerData update_ctid;
!         TransactionId update_xmax;
          LockTupleMode lockmode;
          HTSU_Result test;
          HeapTuple    copyTuple;
--- 70,76 ----
          bool        isNull;
          HeapTupleData tuple;
          Buffer        buffer;
!         HeapUpdateFailureData hufd;
          LockTupleMode lockmode;
          HTSU_Result test;
          HeapTuple    copyTuple;
*************** lnext:
*** 116,125 ****
          else
              lockmode = LockTupleShared;

!         test = heap_lock_tuple(erm->relation, &tuple, &buffer,
!                                &update_ctid, &update_xmax,
                                 estate->es_output_cid,
!                                lockmode, erm->noWait);
          ReleaseBuffer(buffer);
          switch (test)
          {
--- 115,124 ----
          else
              lockmode = LockTupleShared;

!         test = heap_lock_tuple(erm->relation, &tuple,
                                 estate->es_output_cid,
!                                lockmode, erm->noWait,
!                                &buffer, &hufd);
          ReleaseBuffer(buffer);
          switch (test)
          {
*************** lnext:
*** 136,142 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (ItemPointerEquals(&update_ctid,
                                        &tuple.t_self))
                  {
                      /* Tuple was deleted, so don't return it */
--- 135,141 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (ItemPointerEquals(&hufd.ctid,
                                        &tuple.t_self))
                  {
                      /* Tuple was deleted, so don't return it */
*************** lnext:
*** 145,151 ****

                  /* updated, so fetch and lock the updated version */
                  copyTuple = EvalPlanQualFetch(estate, erm->relation, lockmode,
!                                               &update_ctid, update_xmax);

                  if (copyTuple == NULL)
                  {
--- 144,150 ----

                  /* updated, so fetch and lock the updated version */
                  copyTuple = EvalPlanQualFetch(estate, erm->relation, lockmode,
!                                               &hufd.ctid, hufd.xmax);

                  if (copyTuple == NULL)
                  {
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 37b70b88d53f6f8589c2fbb3036e5f1dc27f95ad..30ff02a45697320aeffc75155bcdbfeb733747db 100644
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*************** ExecDelete(ItemPointer tupleid,
*** 294,301 ****
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;

      /*
       * get information on the (current) result relation
--- 294,300 ----
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     HeapUpdateFailureData hufd;

      /*
       * get information on the (current) result relation
*************** ExecDelete(ItemPointer tupleid,
*** 347,360 ****
           */
  ldelete:;
          result = heap_delete(resultRelationDesc, tupleid,
-                              &update_ctid, &update_xmax,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */ );
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /* already deleted by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
--- 346,389 ----
           */
  ldelete:;
          result = heap_delete(resultRelationDesc, tupleid,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */,
!                              &hufd);
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /*
!                  * The target tuple was already updated or deleted by the
!                  * current command, or by a later command in the current
!                  * transaction.  The former case is possible in a join DELETE
!                  * where multiple tuples join to the same target tuple.
!                  * This is somewhat questionable, but Postgres has always
!                  * allowed it: we just ignore additional deletion attempts.
!                  *
!                  * The latter case arises if the tuple is modified by a
!                  * command in a BEFORE trigger, or perhaps by a command in a
!                  * volatile function used in the query.  In such situations we
!                  * should not ignore the deletion, but it is equally unsafe to
!                  * proceed.  We don't want to discard the original DELETE
!                  * while keeping the triggered actions based on its deletion;
!                  * and it would be no better to allow the original DELETE
!                  * while discarding updates that it triggered.  The row update
!                  * carries some information that might be important according
!                  * to business rules; so throwing an error is the only safe
!                  * course.
!                  *
!                  * If a trigger actually intends this type of interaction,
!                  * it can re-execute the DELETE and then return NULL to
!                  * cancel the outer delete.
!                  */
!                 if (hufd.cmax != estate->es_output_cid)
!                     ereport(ERROR,
!                             (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
!                              errmsg("cannot update or delete a row from its BEFORE DELETE trigger"),
!                              errhint("Consider moving code to an AFTER DELETE trigger.")));
!
!                 /* Else, already deleted by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
*************** ldelete:;
*** 365,371 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &update_ctid))
                  {
                      TupleTableSlot *epqslot;

--- 394,400 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &hufd.ctid))
                  {
                      TupleTableSlot *epqslot;

*************** ldelete:;
*** 373,383 ****
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &update_ctid,
!                                            update_xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = update_ctid;
                          goto ldelete;
                      }
                  }
--- 402,412 ----
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &hufd.ctid,
!                                            hufd.xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = hufd.ctid;
                          goto ldelete;
                      }
                  }
*************** ExecUpdate(ItemPointer tupleid,
*** 481,488 ****
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;
      List       *recheckIndexes = NIL;

      /*
--- 510,516 ----
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     HeapUpdateFailureData hufd;
      List       *recheckIndexes = NIL;

      /*
*************** lreplace:;
*** 563,576 ****
           * mode transactions.
           */
          result = heap_update(resultRelationDesc, tupleid, tuple,
-                              &update_ctid, &update_xmax,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */ );
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /* already deleted by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
--- 591,633 ----
           * mode transactions.
           */
          result = heap_update(resultRelationDesc, tupleid, tuple,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */,
!                              &hufd);
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /*
!                  * The target tuple was already updated or deleted by the
!                  * current command, or by a later command in the current
!                  * transaction.  The former case is possible in a join UPDATE
!                  * where multiple tuples join to the same target tuple.
!                  * This is pretty questionable, but Postgres has always
!                  * allowed it: we just execute the first update action and
!                  * ignore additional update attempts.
!                  *
!                  * The latter case arises if the tuple is modified by a
!                  * command in a BEFORE trigger, or perhaps by a command in a
!                  * volatile function used in the query.  In such situations we
!                  * should not ignore the update, but it is equally unsafe to
!                  * proceed.  We don't want to discard the original UPDATE
!                  * while keeping the triggered actions based on it; and we
!                  * have no principled way to merge this update with the
!                  * previous ones.  So throwing an error is the only safe
!                  * course.
!                  *
!                  * If a trigger actually intends this type of interaction,
!                  * it can re-execute the UPDATE (assuming it can figure out
!                  * how) and then return NULL to cancel the outer update.
!                  */
!                 if (hufd.cmax != estate->es_output_cid)
!                     ereport(ERROR,
!                             (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
!                              errmsg("cannot update or delete a row from its BEFORE UPDATE trigger"),
!                              errhint("Consider moving code to an AFTER UPDATE trigger.")));
!
!                 /* Else, already updated by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
*************** lreplace:;
*** 581,587 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &update_ctid))
                  {
                      TupleTableSlot *epqslot;

--- 638,644 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &hufd.ctid))
                  {
                      TupleTableSlot *epqslot;

*************** lreplace:;
*** 589,599 ****
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &update_ctid,
!                                            update_xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = update_ctid;
                          slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
                          tuple = ExecMaterializeSlot(slot);
                          goto lreplace;
--- 646,656 ----
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &hufd.ctid,
!                                            hufd.xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = hufd.ctid;
                          slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
                          tuple = ExecMaterializeSlot(slot);
                          goto lreplace;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fa38803b249ffd2407d77f6efd4e0203975f2349..25f3b988e0e3de7d1d24d6402d22bae8f67f9945 100644
*** a/src/include/access/heapam.h
--- b/src/include/access/heapam.h
*************** typedef enum
*** 35,40 ****
--- 35,63 ----
      LockTupleExclusive
  } LockTupleMode;

+ /*
+  * When heap_update, heap_delete, or heap_lock_tuple fails because the target
+  * tuple is already outdated, it fills in this struct to provide information
+  * to the caller about what happened.
+  * ctid is the target's ctid link: it is the same as the target's TID if the
+  * target was deleted, or the location of the replacement tuple if the target
+  * was updated.
+  * xmax is the outdating transaction's XID.  If the caller wants to visit the
+  * replacement tuple, it must check that this matches before believing the
+  * replacement is really a match.
+  * cmax is the outdating command's CID, but only when the failure code is
+  * HeapTupleSelfUpdated (i.e., something in the current transaction outdated
+  * the tuple); otherwise cmax is zero.  (We make this restriction because
+  * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
+  * transactions.)
+  */
+ typedef struct HeapUpdateFailureData
+ {
+     ItemPointerData        ctid;
+     TransactionId        xmax;
+     CommandId            cmax;
+ } HeapUpdateFailureData;
+

  /* ----------------
   *        function prototypes for heap access method
*************** extern Oid heap_insert(Relation relation
*** 100,115 ****
  extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                    CommandId cid, int options, BulkInsertState bistate);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait);
  extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
              HeapTuple newtup,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait);
  extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
!                 Buffer *buffer, ItemPointer ctid,
!                 TransactionId *update_xmax, CommandId cid,
!                 LockTupleMode mode, bool nowait);
  extern void heap_inplace_update(Relation relation, HeapTuple tuple);
  extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
                    Buffer buf);
--- 123,137 ----
  extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                    CommandId cid, int options, BulkInsertState bistate);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd);
  extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
              HeapTuple newtup,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd);
  extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
!                 CommandId cid, LockTupleMode mode, bool nowait,
!                 Buffer *buffer, HeapUpdateFailureData *hufd);
  extern void heap_inplace_update(Relation relation, HeapTuple tuple);
  extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
                    Buffer buf);
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index b4d391974d21e9e94e75124c1cd0e022527f68e9..2392598061ff648c36fc863d908e7ff38ff2b727 100644
*** a/src/test/regress/expected/triggers.out
--- b/src/test/regress/expected/triggers.out
*************** NOTICE:  drop cascades to 2 other object
*** 1443,1445 ****
--- 1443,1575 ----
  DETAIL:  drop cascades to view city_view
  drop cascades to view european_city_view
  DROP TABLE country_table;
+ --
+ -- Test updates to rows during firing of BEFORE ROW triggers.
+ -- As of 9.2, such cases should be rejected (see bug #6123).
+ --
+ create temp table parent (
+     aid int not null primary key,
+     val1 text,
+     val2 text,
+     val3 text,
+     val4 text,
+     bcnt int not null default 0);
+ NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "parent_pkey" for table "parent"
+ create temp table child (
+     bid int not null primary key,
+     aid int not null,
+     val1 text);
+ NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "child_pkey" for table "child"
+ create function parent_upd_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if old.val1 <> new.val1 then
+     new.val2 = new.val1;
+     delete from child where child.aid = new.aid and child.val1 = new.val1;
+   end if;
+   return new;
+ end;
+ $$;
+ create trigger parent_upd_trig before update on parent
+   for each row execute procedure parent_upd_func();
+ create function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger parent_del_trig before delete on parent
+   for each row execute procedure parent_del_func();
+ create function child_ins_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt + 1 where aid = new.aid;
+   return new;
+ end;
+ $$;
+ create trigger child_ins_trig after insert on child
+   for each row execute procedure child_ins_func();
+ create function child_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt - 1 where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger child_del_trig after delete on child
+   for each row execute procedure child_del_func();
+ insert into parent values (1, 'a', 'a', 'a', 'a', 0);
+ insert into child values (10, 1, 'b');
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+    1 | a    | a    | a    | a    |    1
+ (1 row)
+
+  bid | aid | val1
+ -----+-----+------
+   10 |   1 | b
+ (1 row)
+
+ update parent set val1 = 'b' where aid = 1; -- should fail
+ ERROR:  cannot update or delete a row from its BEFORE UPDATE trigger
+ HINT:  Consider moving code to an AFTER UPDATE trigger.
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+    1 | a    | a    | a    | a    |    1
+ (1 row)
+
+  bid | aid | val1
+ -----+-----+------
+   10 |   1 | b
+ (1 row)
+
+ delete from parent where aid = 1; -- should fail
+ ERROR:  cannot update or delete a row from its BEFORE DELETE trigger
+ HINT:  Consider moving code to an AFTER DELETE trigger.
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+    1 | a    | a    | a    | a    |    1
+ (1 row)
+
+  bid | aid | val1
+ -----+-----+------
+   10 |   1 | b
+ (1 row)
+
+ -- replace the trigger function with one that restarts the deletion after
+ -- having modified a child
+ create or replace function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   if found then
+     delete from parent where aid = old.aid;
+     return null; -- cancel outer deletion
+   end if;
+   return old;
+ end;
+ $$;
+ delete from parent where aid = 1;
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+ (0 rows)
+
+  bid | aid | val1
+ -----+-----+------
+ (0 rows)
+
+ drop table parent, child;
+ drop function parent_upd_func();
+ drop function parent_del_func();
+ drop function child_ins_func();
+ drop function child_del_func();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index e52131dbba2d7a45c5f9365f24721de8a07e3253..1848868eca8e4d852ca614ad0336e1168a028539 100644
*** a/src/test/regress/sql/triggers.sql
--- b/src/test/regress/sql/triggers.sql
*************** SELECT * FROM city_view;
*** 961,963 ****
--- 961,1062 ----

  DROP TABLE city_table CASCADE;
  DROP TABLE country_table;
+
+ --
+ -- Test updates to rows during firing of BEFORE ROW triggers.
+ -- As of 9.2, such cases should be rejected (see bug #6123).
+ --
+
+ create temp table parent (
+     aid int not null primary key,
+     val1 text,
+     val2 text,
+     val3 text,
+     val4 text,
+     bcnt int not null default 0);
+ create temp table child (
+     bid int not null primary key,
+     aid int not null,
+     val1 text);
+
+ create function parent_upd_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if old.val1 <> new.val1 then
+     new.val2 = new.val1;
+     delete from child where child.aid = new.aid and child.val1 = new.val1;
+   end if;
+   return new;
+ end;
+ $$;
+ create trigger parent_upd_trig before update on parent
+   for each row execute procedure parent_upd_func();
+
+ create function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger parent_del_trig before delete on parent
+   for each row execute procedure parent_del_func();
+
+ create function child_ins_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt + 1 where aid = new.aid;
+   return new;
+ end;
+ $$;
+ create trigger child_ins_trig after insert on child
+   for each row execute procedure child_ins_func();
+
+ create function child_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt - 1 where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger child_del_trig after delete on child
+   for each row execute procedure child_del_func();
+
+ insert into parent values (1, 'a', 'a', 'a', 'a', 0);
+ insert into child values (10, 1, 'b');
+ select * from parent; select * from child;
+
+ update parent set val1 = 'b' where aid = 1; -- should fail
+ select * from parent; select * from child;
+
+ delete from parent where aid = 1; -- should fail
+ select * from parent; select * from child;
+
+ -- replace the trigger function with one that restarts the deletion after
+ -- having modified a child
+ create or replace function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   if found then
+     delete from parent where aid = old.aid;
+     return null; -- cancel outer deletion
+   end if;
+   return old;
+ end;
+ $$;
+
+ delete from parent where aid = 1;
+ select * from parent; select * from child;
+
+ drop table parent, child;
+
+ drop function parent_upd_func();
+ drop function parent_del_func();
+ drop function child_ins_func();
+ drop function child_del_func();

Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 3. I modified heap_lock_tuple to also return a
>> HeapUpdateFailureData, principally on the grounds that its API
>> should be largely parallel to heap_update, but having done that I
>> can't help wondering whether its callers are okay with skipping
>> self-updated tuples.  I see that you changed EvalPlanQualFetch,
>> but I'm not entirely sure that that is right; shouldn't it
>> continue to ignore rows modified by the current command itself?
> I made that change when working on the approach where "safe"
> conflicts (like a DELETE from within the BEFORE DELETE trigger) were
> quietly ignored.  Without that change, it didn't see the end of the
> ctid chain, and didn't behave correctly.  I've been wondering if it
> is still needed.  I had been planning to create a test case to try
> to show that it was.  Maybe an UPDATE with a join to force multiple
> row updates from the "primary" statement, followed by a triggered
> UPDATE to the row.  If that doesn't need the EvalPlanQualFetch
> change, I don't know what would.

The EvalPlanQual code isn't really exercised by any existing regression
tests, because it is only triggered when two sessions concurrently
update the same row, something that we avoid for reproducibility's sake
in the regression tests.  I think we could now test it using the
isolation test scaffolding, and maybe it would be a good idea to add
some tests there.  I did verify that whether that part of your patch
is included or not makes no difference to any existing test.
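For concreteness, a minimal two-session scenario that reaches
EvalPlanQualFetch under READ COMMITTED might look like this (a plain
SQL sketch with a made-up table, not an actual isolation spec):

CREATE TABLE accounts (id int PRIMARY KEY, balance int);
INSERT INTO accounts VALUES (1, 100);

-- session 1:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- session 2 (blocks on session 1's row lock):
UPDATE accounts SET balance = balance + 10 WHERE id = 1;

-- session 1:
COMMIT;
-- session 2 now unblocks; EvalPlanQual re-fetches the row version
-- committed by session 1 and re-checks the quals before updating it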
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I worked over this patch a bit, mostly cosmetically; updated
> version attached.
Thanks!
> However, we're not there yet.  I have a couple of remaining
> concerns:
> 
> 1. I think the error message needs more thought.  I believe it's
> possible to trigger this error not only with BEFORE triggers, but
> also with a volatile function used in the query that reaches
> around and updates row(s) of the target table.  Now, I don't have
> the slightest qualms about breaking any apps that do anything so
> dirty, but should we try to generalize the message text to cope
> with that?
I hadn't thought of that, but it does seem possible.  Maybe after
dealing with the other points, I'll work on a test case to show that
behavior.
I'm also fine with generating an error for such dirty tricks, and I
agree that if that's indeed possible we should make the message
general enough to cover that case.  Nothing comes to mind at the
moment, but I'll think on it.
> 2. The HINT needs work too, as it seems pretty useless as it
> stands.  I'd have expected some short reference to the technique
> of re-executing the update in the trigger and then returning NULL.
In the previous (rather long) thread on the topic, it seemed that
in most cases where people hit this, the problem was easily solved
by moving the offending code to an AFTER trigger.  The technique of
re-issuing the command didn't turn up until much later.  I would bet
it will be the right thing to do 20% of the time when people get
such an error.  I don't want to leave the 80% solution out of the
hint, but if you don't think it's too verbose, I guess it would be a
good thing to mention the 20% solution, too.
> (BTW, does this patch require any documentation changes, and if so
> where?)
I've been wondering that.  Perhaps a paragraph or two with an
example on this page?:
http://www.postgresql.org/docs/devel/static/trigger-definition.html
> 3. I modified heap_lock_tuple to also return a
> HeapUpdateFailureData, principally on the grounds that its API
> should be largely parallel to heap_update, but having done that I
> can't help wondering whether its callers are okay with skipping
> self-updated tuples.  I see that you changed EvalPlanQualFetch,
> but I'm not entirely sure that that is right; shouldn't it
> continue to ignore rows modified by the current command itself?
I made that change when working on the approach where "safe"
conflicts (like a DELETE from within the BEFORE DELETE trigger) were
quietly ignored.  Without that change, it didn't see the end of the
ctid chain, and didn't behave correctly.  I've been wondering if it
is still needed.  I had been planning to create a test case to try
to show that it was.  Maybe an UPDATE with a join to force multiple
row updates from the "primary" statement, followed by a triggered
UPDATE to the row.  If that doesn't need the EvalPlanQualFetch
change, I don't know what would.
> And you did not touch the other two callers, which definitely
> have got issues.  Here is an example
> [example]
> I'm not sure what to do about this.  If we throw an error there,
> there will be no way that the trigger can override the error
> because it will never get to run.  Possibly we could plow ahead
> with the expectation of throwing an error later if the trigger
> doesn't cancel the update/delete, but is it safe to do so if we
> don't hold lock on the tuple?  In any case that idea doesn't help
> with the remaining caller, ExecLockRows.
It will take me some time to work though the example and review the
relevant code.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
[rearranging so results directly follow statements]
> select * from test;
>  id | parent |     data     | nchildren 
> ----+--------+--------------+-----------
>   2 |      1 | root child A |         1
>   4 |      2 | grandchild 1 |         0
>   3 |      1 | root child B |         1
>   5 |      3 | grandchild 2 |         0
>   1 |        | root!        |         2
> (5 rows)
> delete from test;
> DELETE 4
> select * from test;
>  id | parent | data  | nchildren 
> ----+--------+-------+-----------
>   1 |        | root! |         0
> (1 row)
And other minor updates to the data column can result in totally
different sets of rows left over after a DELETE from the table with
no WHERE clause.  It makes me pretty queasy whenever the semantics
of a statement can depend on the order of tuples in the heap.  It's
pretty hard to view this particular case as anything other than a
bug.
> the reason being that when the outer delete arrives at the last
> (root!) row, GetTupleForTrigger fails because the tuple is already
> self-updated, and we treat that as canceling the outer delete
> action.
> 
> I'm not sure what to do about this.  If we throw an error there,
> there will be no way that the trigger can override the error
> because it will never get to run.  Possibly we could plow ahead
> with the expectation of throwing an error later if the trigger
> doesn't cancel the update/delete, but is it safe to do so if we
> don't hold lock on the tuple?  In any case that idea doesn't help
> with the remaining caller, ExecLockRows.
I'm still trying to sort through what could be done at the source
code level, but from a user level I would much rather see an error
than such surprising and unpredictable behavior.
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not sure what to do about this.  If we throw an error there,
>> there will be no way that the trigger can override the error
>> because it will never get to run.  Possibly we could plow ahead
>> with the expectation of throwing an error later if the trigger
>> doesn't cancel the update/delete, but is it safe to do so if we
>> don't hold lock on the tuple?  In any case that idea doesn't help
>> with the remaining caller, ExecLockRows.
> I'm still trying to sort through what could be done at the source
> code level, but from a user level I would much rather see an error
> than such surprising and unpredictable behavior.

I don't object to throwing an error by default.  What I'm wondering is
what would have to be done to make such updates work safely.  AFAICT,
throwing an error in GetTupleForTrigger would preclude any chance of
coding a trigger to make this work, which IMO greatly weakens the
argument that this whole approach is acceptable.

In this particular example, I think it would work just as well to do the
reference-count updates in AFTER triggers, and maybe the short answer
is to tell people they have to do it like that instead of in BEFORE
triggers.  However, I wonder what use-case led you to file bug #6123 to
begin with.
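For instance, the reference-count maintenance for the example table
could be done along these lines (a rough, untested sketch; the
function and trigger names are invented):

create function test_count_func() returns trigger
  language plpgsql as
$$
begin
  if tg_op = 'INSERT' then
    update test set nchildren = nchildren + 1 where id = new.parent;
    return new;
  end if;
  update test set nchildren = nchildren - 1 where id = old.parent;
  return old;
end;
$$;
create trigger test_count_ins after insert on test
  for each row execute procedure test_count_func();
create trigger test_count_del after delete on test
  for each row execute procedure test_count_func();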
        regards, tom lane


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> I'm also fine with generating an error for such dirty tricks, and I
> agree that if that's indeed possible we should make the message
> general enough to cover that case.  Nothing comes to mind at the
> moment, but I'll think on it.

What do you think of

ERROR: tuple to be updated was already modified by an operation triggered by the UPDATE command
HINT: Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.

(s/update/delete/ for the DELETE case of course)

The phrase "triggered by" seems slippery enough to cover cases such as a
volatile function executed by the UPDATE.  The HINT doesn't cover that
case of course, but we have a ground rule that HINTs can be wrong.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What do you think of
> 
> ERROR: tuple to be updated was already modified by an operation
> triggered by the UPDATE command
> HINT: Consider using an AFTER trigger instead of a BEFORE trigger
> to propagate changes to other rows.
> 
> (s/update/delete/ for the DELETE case of course)
> 
> The phrase "triggered by" seems slippery enough to cover cases
> such as a volatile function executed by the UPDATE.  The HINT
> doesn't cover that case of course, but we have a ground rule that
> HINTs can be wrong.
Looks good to me.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In this particular example, I think it would work just as well to
> do the reference-count updates in AFTER triggers, and maybe the
> short answer is to tell people they have to do it like that
> instead of in BEFORE triggers.
I think that is quite often the right answer.
> However, I wonder what use-case led you to file bug #6123 to begin
> with.
In our Circuit Court software, we have about 1600 trigger functions
on about 400 tables, and this summer we converted them from our Java
middle tier framework to native PostgreSQL triggers.  Since we had
been writing them in our interpretation of ANSI SQL trigger code,
parsing that, and using the parse tree to build a Java class for
each trigger, we were able to generate the PostgreSQL trigger
functions and CREATE TRIGGER statement mechanically (from the parse
tree), with pretty good success.  In testing, though, our business
analysts noticed a number of situations where an attempt to delete a
row actually deleted related rows which should have gone away with
the row they were directly trying to delete, but the target row was
still there.  A few days of investigation, including stepping
through query execution in gdb, turned up this issue.
Having identified (at least one flavor of) the problem, we grepped
the source code for the BEFORE triggers for UPDATE and DELETE
statements, and were able to fix a number of them by moving code to
AFTER triggers or setting values into NEW fields rather than running
an UPDATE.  So far, so good.
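For the latter kind of fix, the pattern is just to assign to the NEW
record inside the BEFORE trigger instead of issuing an UPDATE that
targets the same row -- a minimal sketch, along the lines of the
parent table in the regression test above:

create or replace function parent_upd_func()
  returns trigger language plpgsql as
$$
begin
  if old.val1 <> new.val1 then
    new.val2 = new.val1;  -- set the field in NEW directly
  end if;
  return new;             -- no separate UPDATE of parent needed
end;
$$;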
But there were a number of situations where the DELETE of a row
needed to cause related rows in other tables to be deleted, and for
one reason or another a foreign key with ON DELETE CASCADE was not
an option.  At the same time, triggers on some of those related
tables needed to update summary or redundant data in other tables
for performance reasons.  Because a number of tables could be
involved, and some of the triggers (at the "lower" levels) could be
AFTER triggers and still contribute to the problem, (1) we had no
reliable way to ensure we would find all of the cases of this on all
code paths, and (2) due to referential integrity and other trigger-
based validations, it would be hard to restructure such that the
DELETE of the "child" rows was not done on the BEFORE DELETE trigger
of the "parent".
The patch we've been using in production throws errors if the row
for a BEFORE UPDATE trigger is updated by another statement.  (Well,
OK, you showed me that it really is throwing an error if the row is
updated and there has been another statement executed, but as long
as it is *more* strict than we actually need, we won't corrupt data
-- and the current rule hasn't been hard for us to live with.)  It
allows the DELETE to proceed if the tuple is updated from within the
BEFORE DELETE trigger.  We would need to tweak some triggers to move
to the approach embodied in the recent patch drafts, but the IF
FOUND logic suggested by Florian looks like it will cover all of our
use cases, and they should be fairly easy to find with grep.
Hopefully this answers your question.  I went looking for details on
particular failures here, but haven't had any luck so far.  I can
try again if more detail like that would help.
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, I wonder what use-case led you to file bug #6123 to begin
>> with.

> [ details ]
> Hopefully this answers your question.  I went looking for details on
> particular failures here, but haven't had any luck so far.  I can
> try again if more detail like that would help.

Well, the bottom line that's concerning me here is whether throwing
errors is going to push anyone's application into an unfixable corner.
I'm somewhat encouraged that your Circuit Courts software can adapt
to it, since that's certainly one of the larger and more complex
applications out there.  Or at least I would be if you had actually
verified that the CC code was okay with the recently-proposed patch
versions.  Do you have any thorough tests you can run against whatever
we end up with?
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane  wrote:
> Well, the bottom line that's concerning me here is whether throwing
> errors is going to push anyone's application into an unfixable
> corner.  I'm somewhat encouraged that your Circuit Courts software
> can adapt to it, since that's certainly one of the larger and more
> complex applications out there. Or at least I would be if you had
> actually verified that the CC code was okay with the recently-
> proposed patch versions. Do you have any thorough tests you can run
> against whatever we end up with?
In spite of several attempts over the years to come up with automated
tests of our applications at a level that would show these issues,
we're still dependent on business analysts to run through a standard
list of tests for each release, plus tests designed to exercise code
modified for the release under test.  For the release where we went
to PostgreSQL triggers, that included running lists against the
statistics tables to see which trigger functions had not yet been
exercised in testing, until we had everything covered.
To test the new version of this patch, we would need to pick an
application release, and use the patch through the development,
testing, and staging cycles.  We would need to look for all triggers
needing adjustment, and make the necessary changes.  We would need to
figure out which triggers were important to cover, and ensure that
testing covered all of them.
Given the discussions with my new manager this past week, I'm pretty
sure we can work this into a release that would complete testing and
hit pilot deployment in something like three months, give or take a
little.  I can't actually make any promises on that until I talk to
her next week.
From working through all the BEFORE triggers with UPDATE or DELETE
statements this summer, I'm pretty confident that the ones which
remain can be handled by reissuing the DELETE (we didn't keep any of
the UPDATEs with this pattern) and returning NULL.  The most
complicated and troublesome trigger code has to do with purging old
data.  I suspect that if we include testing of all purge processes in
that release cycle, we'll shake out just about any issues we have.
-Kevin


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane  wrote:
>> Well, the bottom line that's concerning me here is whether
>> throwing errors is going to push anyone's application into an
>> unfixable corner.  I'm somewhat encouraged that your Circuit
>> Courts software can adapt to it, since that's certainly one of
>> the larger and more complex applications out there. Or at least I
>> would be if you had actually verified that the CC code was okay
>> with the recently-proposed patch versions. Do you have any
>> thorough tests you can run against whatever we end up with?
> To test the new version of this patch, we would need to pick an
> application release, and use the patch through the development,
> testing, and staging cycles.  We would need to look for all
> triggers needing adjustment, and make the necessary changes.  We
> would need to figure out which triggers were important to cover,
> and ensure that testing covered all of them.
> 
> Given the discussions with my new manager this past week, I'm
> pretty sure we can work this into a release that would complete
> testing and hit pilot deployment in something like three months,
> give or take a little.  I can't actually make any promises on that
> until I talk to her next week.
After a couple meetings, I have approval to get this into an
application release currently in development.  Assuming that your
patch from the 13th is good for doing the testing, I think I can
post test results in about three weeks.  I'll also work on a
follow-on patch to add couple paragraphs and an example of the issue
to the docs by then.
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> After a couple meetings, I have approval to get this into an
> application release currently in development.  Assuming that your
> patch from the 13th is good for doing the testing, I think I can
> post test results in about three weeks.  I'll also work on a
> follow-on patch to add a couple of paragraphs and an example of the issue
> to the docs by then.

Well, the issues about wording of the error message are obviously just
cosmetic, but I think we'd better do whatever we intend to do to the
callers of heap_lock_tuple before putting the patch through testing.
I'm inclined to go ahead and make them throw errors, just to see if
that has any effects we don't like.

I'm up to my elbows in planner guts at the moment, but will try to
fix up the patch this weekend if you want.
        regards, tom lane


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm up to my elbows in planner guts at the moment, but will try to
> fix up the patch this weekend if you want.
They have scheduled testers to check on this issue next week, so it
would be great to get as close as we can on the stuff that matters.
-Kevin


Re: Remembering bug #6123

From
Tom Lane
Date:
OK, here's an updated version of the patch.  I changed the error message
texts as per discussion (except I decided to use one message string for
all the cases rather than saddle translators with several variants).
Also, I put in an error in GetTupleForTrigger, which fixes the
self-reference case I illustrated before (now added to the regression
test).  However, I found out that changing the other two callers of
heap_lock_tuple would open an entirely new can of worms, so for now
they still have the historical behavior of ignoring self-updated tuples.

The problem with changing ExecLockRows or EvalPlanQualFetch can be
illustrated by the regression test case it breaks, which basically is

BEGIN;
DECLARE c1 CURSOR FOR SELECT * FROM table FOR UPDATE;
UPDATE table SET ...;
FETCH ALL FROM c1;
COMMIT;

When the FETCH comes to a row that's been updated by the UPDATE, it sees
that row as HeapTupleSelfUpdated with a cmax greater than es_output_cid
(which is the CID assigned to the DECLARE).  So if we make these callers
throw an error for the case, coding like the above will fail, which
seems to me to be pretty darn hard to justify.  It is not a corner case
that could be caused only by questionable use of trigger side effects.
So that seems to leave us with two choices: (1) ignore the row, or
(2) attempt to lock the latest version instead of the visible version.
(1) is our historical behavior but seems arguably wrong.  I tried to
make the patch do (2) but it crashed and burned because heap_lock_tuple
spits up if asked to lock an "invisible" row.  We could possibly finesse
that by having EvalPlanQualFetch sometimes pass a CID later than
es_output_cid to heap_lock_tuple, but it seems ticklish.  More, I think
it would also take some adjustments to the behavior of
HeapTupleSatisfiesDirty, else we'll not see such tuples in the first
place.  So this looks messy, and also rather orthogonal to the current
goals of the patch.

Also, I'm not sure that your testing would exercise such cases at all,
as you have to be using SELECT FOR UPDATE and/or READ COMMITTED mode to
get to any of the relevant code.  I gather your software mostly relies
on SERIALIZABLE mode to avoid such issues.  So I stopped with this.

            regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 99a431a95ff0662c82de5dfbd253f00777ca2c7c..25cbbd3ef283022d4fa7634dc79537efe9dfb735 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** simple_heap_insert(Relation relation, He
*** 2317,2343 ****
   *
   *    relation - table to be modified (caller must hold suitable lock)
   *    tid - TID of tuple to be deleted
-  *    ctid - output parameter, used only for failure case (see below)
-  *    update_xmax - output parameter, used only for failure case (see below)
   *    cid - delete command ID (used for visibility test, and stored into
   *        cmax if successful)
   *    crosscheck - if not InvalidSnapshot, also check tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we did delete it.  Failure return codes are
   * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
   * (the last only possible if wait == false).
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as tid, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   */
  HTSU_Result
  heap_delete(Relation relation, ItemPointer tid,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
--- 2317,2342 ----
   *
   *    relation - table to be modified (caller must hold suitable lock)
   *    tid - TID of tuple to be deleted
   *    cid - delete command ID (used for visibility test, and stored into
   *        cmax if successful)
   *    crosscheck - if not InvalidSnapshot, also check tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
+  *    hufd - output parameter, filled in failure cases (see below)
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we did delete it.  Failure return codes are
   * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated
   * (the last only possible if wait == false).
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   */
  HTSU_Result
  heap_delete(Relation relation, ItemPointer tid,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
*************** l1:
*** 2498,2505 ****
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
!         *ctid = tp.t_data->t_ctid;
!         *update_xmax = HeapTupleHeaderGetXmax(tp.t_data);
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(tp.t_self), ExclusiveLock);
--- 2497,2508 ----
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
!         hufd->ctid = tp.t_data->t_ctid;
!         hufd->xmax = HeapTupleHeaderGetXmax(tp.t_data);
!         if (result == HeapTupleSelfUpdated)
!             hufd->cmax = HeapTupleHeaderGetCmax(tp.t_data);
!         else
!             hufd->cmax = 0;        /* for lack of an InvalidCommandId value */
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(tp.t_self), ExclusiveLock);
*************** void
*** 2631,2643 ****
  simple_heap_delete(Relation relation, ItemPointer tid)
  {
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;

      result = heap_delete(relation, tid,
-                          &update_ctid, &update_xmax,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */ );
      switch (result)
      {
          case HeapTupleSelfUpdated:
--- 2634,2645 ----
  simple_heap_delete(Relation relation, ItemPointer tid)
  {
      HTSU_Result result;
!     HeapUpdateFailureData hufd;

      result = heap_delete(relation, tid,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */,
!                          &hufd);
      switch (result)
      {
          case HeapTupleSelfUpdated:
*************** simple_heap_delete(Relation relation, It
*** 2668,2679 ****
   *    relation - table to be modified (caller must hold suitable lock)
   *    otid - TID of old tuple to be replaced
   *    newtup - newly constructed tuple data to store
-  *    ctid - output parameter, used only for failure case (see below)
-  *    update_xmax - output parameter, used only for failure case (see below)
   *    cid - update command ID (used for visibility test, and stored into
   *        cmax/cmin if successful)
   *    crosscheck - if not InvalidSnapshot, also check old tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we *did* update it.  Failure return codes are
--- 2670,2680 ----
   *    relation - table to be modified (caller must hold suitable lock)
   *    otid - TID of old tuple to be replaced
   *    newtup - newly constructed tuple data to store
   *    cid - update command ID (used for visibility test, and stored into
   *        cmax/cmin if successful)
   *    crosscheck - if not InvalidSnapshot, also check old tuple against this
   *    wait - true if should wait for any conflicting update to commit/abort
+  *    hufd - output parameter, filled in failure cases (see below)
   *
   * Normal, successful return value is HeapTupleMayBeUpdated, which
   * actually means we *did* update it.  Failure return codes are
*************** simple_heap_delete(Relation relation, It
*** 2686,2700 ****
   * update was done.  However, any TOAST changes in the new tuple's
   * data are not reflected into *newtup.
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as otid, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   */
  HTSU_Result
  heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
--- 2687,2701 ----
   * update was done.  However, any TOAST changes in the new tuple's
   * data are not reflected into *newtup.
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   */
  HTSU_Result
  heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd)
  {
      HTSU_Result result;
      TransactionId xid = GetCurrentTransactionId();
*************** l2:
*** 2873,2880 ****
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
!         *ctid = oldtup.t_data->t_ctid;
!         *update_xmax = HeapTupleHeaderGetXmax(oldtup.t_data);
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(oldtup.t_self), ExclusiveLock);
--- 2874,2885 ----
                 result == HeapTupleUpdated ||
                 result == HeapTupleBeingUpdated);
          Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
!         hufd->ctid = oldtup.t_data->t_ctid;
!         hufd->xmax = HeapTupleHeaderGetXmax(oldtup.t_data);
!         if (result == HeapTupleSelfUpdated)
!             hufd->cmax = HeapTupleHeaderGetCmax(oldtup.t_data);
!         else
!             hufd->cmax = 0;        /* for lack of an InvalidCommandId value */
          UnlockReleaseBuffer(buffer);
          if (have_tuple_lock)
              UnlockTuple(relation, &(oldtup.t_self), ExclusiveLock);
*************** void
*** 3344,3356 ****
  simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
  {
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;

      result = heap_update(relation, otid, tup,
-                          &update_ctid, &update_xmax,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */ );
      switch (result)
      {
          case HeapTupleSelfUpdated:
--- 3349,3360 ----
  simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup)
  {
      HTSU_Result result;
!     HeapUpdateFailureData hufd;

      result = heap_update(relation, otid, tup,
                           GetCurrentCommandId(true), InvalidSnapshot,
!                          true /* wait for commit */,
!                          &hufd);
      switch (result)
      {
          case HeapTupleSelfUpdated:
*************** simple_heap_update(Relation relation, It
*** 3388,3405 ****
   * Output parameters:
   *    *tuple: all fields filled in
   *    *buffer: set to buffer holding tuple (pinned but not locked at exit)
!  *    *ctid: set to tuple's t_ctid, but only in failure cases
!  *    *update_xmax: set to tuple's xmax, but only in failure cases
   *
   * Function result may be:
   *    HeapTupleMayBeUpdated: lock was successfully acquired
   *    HeapTupleSelfUpdated: lock failed because tuple updated by self
   *    HeapTupleUpdated: lock failed because tuple updated by other xact
   *
!  * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.
!  * If t_ctid is the same as t_self, the tuple was deleted; if different, the
!  * tuple was updated, and t_ctid is the location of the replacement tuple.
!  * (t_xmax is needed to verify that the replacement tuple matches.)
   *
   *
   * NOTES: because the shared-memory lock table is of finite size, but users
--- 3392,3408 ----
   * Output parameters:
   *    *tuple: all fields filled in
   *    *buffer: set to buffer holding tuple (pinned but not locked at exit)
!  *    *hufd: filled in failure cases (see below)
   *
   * Function result may be:
   *    HeapTupleMayBeUpdated: lock was successfully acquired
   *    HeapTupleSelfUpdated: lock failed because tuple updated by self
   *    HeapTupleUpdated: lock failed because tuple updated by other xact
   *
!  * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
!  * t_xmax, and t_cmax (the last only for HeapTupleSelfUpdated, since we
!  * cannot obtain cmax from a combocid generated by another transaction).
!  * See comments for struct HeapUpdateFailureData for additional info.
   *
   *
   * NOTES: because the shared-memory lock table is of finite size, but users
*************** simple_heap_update(Relation relation, It
*** 3435,3443 ****
   * conflict for a tuple, we don't incur any extra overhead.
   */
  HTSU_Result
! heap_lock_tuple(Relation relation, HeapTuple tuple, Buffer *buffer,
!                 ItemPointer ctid, TransactionId *update_xmax,
!                 CommandId cid, LockTupleMode mode, bool nowait)
  {
      HTSU_Result result;
      ItemPointer tid = &(tuple->t_self);
--- 3438,3446 ----
   * conflict for a tuple, we don't incur any extra overhead.
   */
  HTSU_Result
! heap_lock_tuple(Relation relation, HeapTuple tuple,
!                 CommandId cid, LockTupleMode mode, bool nowait,
!                 Buffer *buffer, HeapUpdateFailureData *hufd)
  {
      HTSU_Result result;
      ItemPointer tid = &(tuple->t_self);
*************** l3:
*** 3622,3629 ****
      {
          Assert(result == HeapTupleSelfUpdated || result == HeapTupleUpdated);
          Assert(!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
!         *ctid = tuple->t_data->t_ctid;
!         *update_xmax = HeapTupleHeaderGetXmax(tuple->t_data);
          LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
          if (have_tuple_lock)
              UnlockTuple(relation, tid, tuple_lock_type);
--- 3625,3636 ----
      {
          Assert(result == HeapTupleSelfUpdated || result == HeapTupleUpdated);
          Assert(!(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
!         hufd->ctid = tuple->t_data->t_ctid;
!         hufd->xmax = HeapTupleHeaderGetXmax(tuple->t_data);
!         if (result == HeapTupleSelfUpdated)
!             hufd->cmax = HeapTupleHeaderGetCmax(tuple->t_data);
!         else
!             hufd->cmax = 0;        /* for lack of an InvalidCommandId value */
          LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
          if (have_tuple_lock)
              UnlockTuple(relation, tid, tuple_lock_type);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f5e12e5681ddfbcc041bf636140d5179d0e45152..f349bedae7721a47dd47cfedb824152a6208c2b9 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** GetTupleForTrigger(EState *estate,
*** 2565,2572 ****
      if (newSlot != NULL)
      {
          HTSU_Result test;
!         ItemPointerData update_ctid;
!         TransactionId update_xmax;

          *newSlot = NULL;

--- 2565,2571 ----
      if (newSlot != NULL)
      {
          HTSU_Result test;
!         HeapUpdateFailureData hufd;

          *newSlot = NULL;

*************** GetTupleForTrigger(EState *estate,
*** 2578,2590 ****
           */
  ltrmark:;
          tuple.t_self = *tid;
!         test = heap_lock_tuple(relation, &tuple, &buffer,
!                                &update_ctid, &update_xmax,
                                 estate->es_output_cid,
!                                LockTupleExclusive, false);
          switch (test)
          {
              case HeapTupleSelfUpdated:
                  /* treat it as deleted; do not process */
                  ReleaseBuffer(buffer);
                  return NULL;
--- 2577,2603 ----
           */
  ltrmark:;
          tuple.t_self = *tid;
!         test = heap_lock_tuple(relation, &tuple,
                                 estate->es_output_cid,
!                                LockTupleExclusive, false /* wait */,
!                                &buffer, &hufd);
          switch (test)
          {
              case HeapTupleSelfUpdated:
+                 /*
+                  * The target tuple was already updated or deleted by the
+                  * current command, or by a later command in the current
+                  * transaction.  We ignore the tuple in the former case, and
+                  * throw error in the latter case, for the same reasons
+                  * enumerated in ExecUpdate and ExecDelete in
+                  * nodeModifyTable.c.
+                  */
+                 if (hufd.cmax != estate->es_output_cid)
+                     ereport(ERROR,
+                             (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+                              errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
+                              errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
+
                  /* treat it as deleted; do not process */
                  ReleaseBuffer(buffer);
                  return NULL;
*************** ltrmark:;
*** 2598,2604 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(&update_ctid, &tuple.t_self))
                  {
                      /* it was updated, so look at the updated version */
                      TupleTableSlot *epqslot;
--- 2611,2617 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
                  {
                      /* it was updated, so look at the updated version */
                      TupleTableSlot *epqslot;
*************** ltrmark:;
*** 2607,2617 ****
                                             epqstate,
                                             relation,
                                             relinfo->ri_RangeTableIndex,
!                                            &update_ctid,
!                                            update_xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tid = update_ctid;
                          *newSlot = epqslot;

                          /*
--- 2620,2630 ----
                                             epqstate,
                                             relation,
                                             relinfo->ri_RangeTableIndex,
!                                            &hufd.ctid,
!                                            hufd.xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tid = hufd.ctid;
                          *newSlot = epqslot;

                          /*
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 422f737e82dcb1202d3f9e927fe87f06f6b33c7b..f71784ccb51c9d0c58668f5d803f6dc00c1e5c92 100644
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
*************** EvalPlanQualFetch(EState *estate, Relati
*** 1857,1864 ****
          if (heap_fetch(relation, &SnapshotDirty, &tuple, &buffer, true, NULL))
          {
              HTSU_Result test;
!             ItemPointerData update_ctid;
!             TransactionId update_xmax;

              /*
               * If xmin isn't what we're expecting, the slot must have been
--- 1857,1863 ----
          if (heap_fetch(relation, &SnapshotDirty, &tuple, &buffer, true, NULL))
          {
              HTSU_Result test;
!             HeapUpdateFailureData hufd;

              /*
               * If xmin isn't what we're expecting, the slot must have been
*************** EvalPlanQualFetch(EState *estate, Relati
*** 1893,1905 ****
              /*
               * If tuple was inserted by our own transaction, we have to check
               * cmin against es_output_cid: cmin >= current CID means our
!              * command cannot see the tuple, so we should ignore it.  Without
!              * this we are open to the "Halloween problem" of indefinitely
!              * re-updating the same tuple. (We need not check cmax because
!              * HeapTupleSatisfiesDirty will consider a tuple deleted by our
!              * transaction dead, regardless of cmax.)  We just checked that
!              * priorXmax == xmin, so we can test that variable instead of
!              * doing HeapTupleHeaderGetXmin again.
               */
              if (TransactionIdIsCurrentTransactionId(priorXmax) &&
                  HeapTupleHeaderGetCmin(tuple.t_data) >= estate->es_output_cid)
--- 1892,1904 ----
              /*
               * If tuple was inserted by our own transaction, we have to check
               * cmin against es_output_cid: cmin >= current CID means our
!              * command cannot see the tuple, so we should ignore it.
!              * Otherwise heap_lock_tuple() will throw an error, and so would
!              * any later attempt to update or delete the tuple.  (We need not
!              * check cmax because HeapTupleSatisfiesDirty will consider a
!              * tuple deleted by our transaction dead, regardless of cmax.)
!              * We just checked that priorXmax == xmin, so we can test that
!              * variable instead of doing HeapTupleHeaderGetXmin again.
               */
              if (TransactionIdIsCurrentTransactionId(priorXmax) &&
                  HeapTupleHeaderGetCmin(tuple.t_data) >= estate->es_output_cid)
*************** EvalPlanQualFetch(EState *estate, Relati
*** 1911,1927 ****
              /*
               * This is a live tuple, so now try to lock it.
               */
!             test = heap_lock_tuple(relation, &tuple, &buffer,
!                                    &update_ctid, &update_xmax,
                                     estate->es_output_cid,
!                                    lockmode, false);
              /* We now have two pins on the buffer, get rid of one */
              ReleaseBuffer(buffer);

              switch (test)
              {
                  case HeapTupleSelfUpdated:
!                     /* treat it as deleted; do not process */
                      ReleaseBuffer(buffer);
                      return NULL;

--- 1910,1938 ----
              /*
               * This is a live tuple, so now try to lock it.
               */
!             test = heap_lock_tuple(relation, &tuple,
                                     estate->es_output_cid,
!                                    lockmode, false /* wait */,
!                                    &buffer, &hufd);
              /* We now have two pins on the buffer, get rid of one */
              ReleaseBuffer(buffer);

              switch (test)
              {
                  case HeapTupleSelfUpdated:
!                     /*
!                      * The target tuple was already updated or deleted by the
!                      * current command, or by a later command in the current
!                      * transaction.  We *must* ignore the tuple in the former
!                      * case, so as to avoid the "Halloween problem" of
!                      * repeated update attempts.  In the latter case it might
!                      * be sensible to fetch the updated tuple instead, but
!                      * doing so would require changing heap_lock_tuple as well
!                      * as heap_update and heap_delete to not complain about
!                      * updating "invisible" tuples, which seems pretty scary.
!                      * So for now, treat the tuple as deleted and do not
!                      * process.
!                      */
                      ReleaseBuffer(buffer);
                      return NULL;

*************** EvalPlanQualFetch(EState *estate, Relati
*** 1935,1946 ****
                          ereport(ERROR,
                                  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                                   errmsg("could not serialize access due to concurrent update")));
!                     if (!ItemPointerEquals(&update_ctid, &tuple.t_self))
                      {
                          /* it was updated, so look at the updated version */
!                         tuple.t_self = update_ctid;
                          /* updated row should have xmin matching this xmax */
!                         priorXmax = update_xmax;
                          continue;
                      }
                      /* tuple was deleted, so give up */
--- 1946,1957 ----
                          ereport(ERROR,
                                  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                                   errmsg("could not serialize access due to concurrent update")));
!                     if (!ItemPointerEquals(&hufd.ctid, &tuple.t_self))
                      {
                          /* it was updated, so look at the updated version */
!                         tuple.t_self = hufd.ctid;
                          /* updated row should have xmin matching this xmax */
!                         priorXmax = hufd.xmax;
                          continue;
                      }
                      /* tuple was deleted, so give up */
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index d2a33cb35682e23168d509cffe5f5f179f25249b..04ce18b0f5737ef9a55a1d88f9fb4b58a07df9ac 100644
*** a/src/backend/executor/nodeLockRows.c
--- b/src/backend/executor/nodeLockRows.c
*************** lnext:
*** 70,77 ****
          bool        isNull;
          HeapTupleData tuple;
          Buffer        buffer;
!         ItemPointerData update_ctid;
!         TransactionId update_xmax;
          LockTupleMode lockmode;
          HTSU_Result test;
          HeapTuple    copyTuple;
--- 70,76 ----
          bool        isNull;
          HeapTupleData tuple;
          Buffer        buffer;
!         HeapUpdateFailureData hufd;
          LockTupleMode lockmode;
          HTSU_Result test;
          HeapTuple    copyTuple;
*************** lnext:
*** 116,130 ****
          else
              lockmode = LockTupleShared;

!         test = heap_lock_tuple(erm->relation, &tuple, &buffer,
!                                &update_ctid, &update_xmax,
                                 estate->es_output_cid,
!                                lockmode, erm->noWait);
          ReleaseBuffer(buffer);
          switch (test)
          {
              case HeapTupleSelfUpdated:
!                 /* treat it as deleted; do not process */
                  goto lnext;

              case HeapTupleMayBeUpdated:
--- 115,140 ----
          else
              lockmode = LockTupleShared;

!         test = heap_lock_tuple(erm->relation, &tuple,
                                 estate->es_output_cid,
!                                lockmode, erm->noWait,
!                                &buffer, &hufd);
          ReleaseBuffer(buffer);
          switch (test)
          {
              case HeapTupleSelfUpdated:
!                 /*
!                  * The target tuple was already updated or deleted by the
!                  * current command, or by a later command in the current
!                  * transaction.  We *must* ignore the tuple in the former
!                  * case, so as to avoid the "Halloween problem" of repeated
!                  * update attempts.  In the latter case it might be sensible
!                  * to fetch the updated tuple instead, but doing so would
!                  * require changing heap_lock_tuple as well as heap_update and
!                  * heap_delete to not complain about updating "invisible"
!                  * tuples, which seems pretty scary.  So for now, treat the
!                  * tuple as deleted and do not process.
!                  */
                  goto lnext;

              case HeapTupleMayBeUpdated:
*************** lnext:
*** 136,143 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (ItemPointerEquals(&update_ctid,
!                                       &tuple.t_self))
                  {
                      /* Tuple was deleted, so don't return it */
                      goto lnext;
--- 146,152 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (ItemPointerEquals(&hufd.ctid, &tuple.t_self))
                  {
                      /* Tuple was deleted, so don't return it */
                      goto lnext;
*************** lnext:
*** 145,151 ****

                  /* updated, so fetch and lock the updated version */
                  copyTuple = EvalPlanQualFetch(estate, erm->relation, lockmode,
!                                               &update_ctid, update_xmax);

                  if (copyTuple == NULL)
                  {
--- 154,160 ----

                  /* updated, so fetch and lock the updated version */
                  copyTuple = EvalPlanQualFetch(estate, erm->relation, lockmode,
!                                               &hufd.ctid, hufd.xmax);

                  if (copyTuple == NULL)
                  {
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 37b70b88d53f6f8589c2fbb3036e5f1dc27f95ad..cf51941f88748fbd80f7e586484ce6e82bcf592b 100644
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*************** ExecDelete(ItemPointer tupleid,
*** 294,301 ****
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;

      /*
       * get information on the (current) result relation
--- 294,300 ----
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     HeapUpdateFailureData hufd;

      /*
       * get information on the (current) result relation
*************** ExecDelete(ItemPointer tupleid,
*** 347,360 ****
           */
  ldelete:;
          result = heap_delete(resultRelationDesc, tupleid,
-                              &update_ctid, &update_xmax,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */ );
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /* already deleted by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
--- 346,389 ----
           */
  ldelete:;
          result = heap_delete(resultRelationDesc, tupleid,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */,
!                              &hufd);
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /*
!                  * The target tuple was already updated or deleted by the
!                  * current command, or by a later command in the current
!                  * transaction.  The former case is possible in a join DELETE
!                  * where multiple tuples join to the same target tuple.
!                  * This is somewhat questionable, but Postgres has always
!                  * allowed it: we just ignore additional deletion attempts.
!                  *
!                  * The latter case arises if the tuple is modified by a
!                  * command in a BEFORE trigger, or perhaps by a command in a
!                  * volatile function used in the query.  In such situations we
!                  * should not ignore the deletion, but it is equally unsafe to
!                  * proceed.  We don't want to discard the original DELETE
!                  * while keeping the triggered actions based on its deletion;
!                  * and it would be no better to allow the original DELETE
!                  * while discarding updates that it triggered.  The row update
!                  * carries some information that might be important according
!                  * to business rules; so throwing an error is the only safe
!                  * course.
!                  *
!                  * If a trigger actually intends this type of interaction,
!                  * it can re-execute the DELETE and then return NULL to
!                  * cancel the outer delete.
!                  */
!                 if (hufd.cmax != estate->es_output_cid)
!                     ereport(ERROR,
!                             (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
!                              errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
!                              errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
!
!                 /* Else, already deleted by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
*************** ldelete:;
*** 365,371 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &update_ctid))
                  {
                      TupleTableSlot *epqslot;

--- 394,400 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &hufd.ctid))
                  {
                      TupleTableSlot *epqslot;

*************** ldelete:;
*** 373,383 ****
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &update_ctid,
!                                            update_xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = update_ctid;
                          goto ldelete;
                      }
                  }
--- 402,412 ----
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &hufd.ctid,
!                                            hufd.xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = hufd.ctid;
                          goto ldelete;
                      }
                  }
*************** ExecUpdate(ItemPointer tupleid,
*** 481,488 ****
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     ItemPointerData update_ctid;
!     TransactionId update_xmax;
      List       *recheckIndexes = NIL;

      /*
--- 510,516 ----
      ResultRelInfo *resultRelInfo;
      Relation    resultRelationDesc;
      HTSU_Result result;
!     HeapUpdateFailureData hufd;
      List       *recheckIndexes = NIL;

      /*
*************** lreplace:;
*** 563,576 ****
           * mode transactions.
           */
          result = heap_update(resultRelationDesc, tupleid, tuple,
-                              &update_ctid, &update_xmax,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */ );
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /* already deleted by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
--- 591,633 ----
           * mode transactions.
           */
          result = heap_update(resultRelationDesc, tupleid, tuple,
                               estate->es_output_cid,
                               estate->es_crosscheck_snapshot,
!                              true /* wait for commit */,
!                              &hufd);
          switch (result)
          {
              case HeapTupleSelfUpdated:
!                 /*
!                  * The target tuple was already updated or deleted by the
!                  * current command, or by a later command in the current
!                  * transaction.  The former case is possible in a join UPDATE
!                  * where multiple tuples join to the same target tuple.
!                  * This is pretty questionable, but Postgres has always
!                  * allowed it: we just execute the first update action and
!                  * ignore additional update attempts.
!                  *
!                  * The latter case arises if the tuple is modified by a
!                  * command in a BEFORE trigger, or perhaps by a command in a
!                  * volatile function used in the query.  In such situations we
!                  * should not ignore the update, but it is equally unsafe to
!                  * proceed.  We don't want to discard the original UPDATE
!                  * while keeping the triggered actions based on it; and we
!                  * have no principled way to merge this update with the
!                  * previous ones.  So throwing an error is the only safe
!                  * course.
!                  *
!                  * If a trigger actually intends this type of interaction,
!                  * it can re-execute the UPDATE (assuming it can figure out
!                  * how) and then return NULL to cancel the outer update.
!                  */
!                 if (hufd.cmax != estate->es_output_cid)
!                     ereport(ERROR,
!                             (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
!                              errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
!                              errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
!
!                 /* Else, already updated by self; nothing to do */
                  return NULL;

              case HeapTupleMayBeUpdated:
*************** lreplace:;
*** 581,587 ****
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &update_ctid))
                  {
                      TupleTableSlot *epqslot;

--- 638,644 ----
                      ereport(ERROR,
                              (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                               errmsg("could not serialize access due to concurrent update")));
!                 if (!ItemPointerEquals(tupleid, &hufd.ctid))
                  {
                      TupleTableSlot *epqslot;

*************** lreplace:;
*** 589,599 ****
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &update_ctid,
!                                            update_xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = update_ctid;
                          slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
                          tuple = ExecMaterializeSlot(slot);
                          goto lreplace;
--- 646,656 ----
                                             epqstate,
                                             resultRelationDesc,
                                             resultRelInfo->ri_RangeTableIndex,
!                                            &hufd.ctid,
!                                            hufd.xmax);
                      if (!TupIsNull(epqslot))
                      {
!                         *tupleid = hufd.ctid;
                          slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
                          tuple = ExecMaterializeSlot(slot);
                          goto lreplace;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fa38803b249ffd2407d77f6efd4e0203975f2349..80d60403481c99d9fab3b4fd157f708269f43a1f 100644
*** a/src/include/access/heapam.h
--- b/src/include/access/heapam.h
*************** typedef enum
*** 35,40 ****
--- 35,63 ----
      LockTupleExclusive
  } LockTupleMode;

+ /*
+  * When heap_update, heap_delete, or heap_lock_tuple fail because the target
+  * tuple is already outdated, they fill in this struct to provide information
+  * to the caller about what happened.
+  * ctid is the target's ctid link: it is the same as the target's TID if the
+  * target was deleted, or the location of the replacement tuple if the target
+  * was updated.
+  * xmax is the outdating transaction's XID.  If the caller wants to visit the
+  * replacement tuple, it must check that this matches before believing the
+  * replacement is really a match.
+  * cmax is the outdating command's CID, but only when the failure code is
+  * HeapTupleSelfUpdated (i.e., something in the current transaction outdated
+  * the tuple); otherwise cmax is zero.  (We make this restriction because
+  * HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
+  * transactions.)
+  */
+ typedef struct HeapUpdateFailureData
+ {
+     ItemPointerData        ctid;
+     TransactionId        xmax;
+     CommandId            cmax;
+ } HeapUpdateFailureData;
+

  /* ----------------
   *        function prototypes for heap access method
*************** extern Oid heap_insert(Relation relation
*** 100,115 ****
  extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                    CommandId cid, int options, BulkInsertState bistate);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait);
  extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
              HeapTuple newtup,
!             ItemPointer ctid, TransactionId *update_xmax,
!             CommandId cid, Snapshot crosscheck, bool wait);
  extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
!                 Buffer *buffer, ItemPointer ctid,
!                 TransactionId *update_xmax, CommandId cid,
!                 LockTupleMode mode, bool nowait);
  extern void heap_inplace_update(Relation relation, HeapTuple tuple);
  extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
                    Buffer buf);
--- 123,137 ----
  extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                    CommandId cid, int options, BulkInsertState bistate);
  extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd);
  extern HTSU_Result heap_update(Relation relation, ItemPointer otid,
              HeapTuple newtup,
!             CommandId cid, Snapshot crosscheck, bool wait,
!             HeapUpdateFailureData *hufd);
  extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
!                 CommandId cid, LockTupleMode mode, bool nowait,
!                 Buffer *buffer, HeapUpdateFailureData *hufd);
  extern void heap_inplace_update(Relation relation, HeapTuple tuple);
  extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
                    Buffer buf);
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index b4d391974d21e9e94e75124c1cd0e022527f68e9..2e18c9275c0984a0ac290b830b407b1080b13f01 100644
*** a/src/test/regress/expected/triggers.out
--- b/src/test/regress/expected/triggers.out
*************** NOTICE:  drop cascades to 2 other object
*** 1443,1445 ****
--- 1443,1642 ----
  DETAIL:  drop cascades to view city_view
  drop cascades to view european_city_view
  DROP TABLE country_table;
+ --
+ -- Test updates to rows during firing of BEFORE ROW triggers.
+ -- As of 9.2, such cases should be rejected (see bug #6123).
+ --
+ create temp table parent (
+     aid int not null primary key,
+     val1 text,
+     val2 text,
+     val3 text,
+     val4 text,
+     bcnt int not null default 0);
+ NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "parent_pkey" for table "parent"
+ create temp table child (
+     bid int not null primary key,
+     aid int not null,
+     val1 text);
+ NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "child_pkey" for table "child"
+ create function parent_upd_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if old.val1 <> new.val1 then
+     new.val2 = new.val1;
+     delete from child where child.aid = new.aid and child.val1 = new.val1;
+   end if;
+   return new;
+ end;
+ $$;
+ create trigger parent_upd_trig before update on parent
+   for each row execute procedure parent_upd_func();
+ create function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger parent_del_trig before delete on parent
+   for each row execute procedure parent_del_func();
+ create function child_ins_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt + 1 where aid = new.aid;
+   return new;
+ end;
+ $$;
+ create trigger child_ins_trig after insert on child
+   for each row execute procedure child_ins_func();
+ create function child_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt - 1 where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger child_del_trig after delete on child
+   for each row execute procedure child_del_func();
+ insert into parent values (1, 'a', 'a', 'a', 'a', 0);
+ insert into child values (10, 1, 'b');
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+    1 | a    | a    | a    | a    |    1
+ (1 row)
+
+  bid | aid | val1
+ -----+-----+------
+   10 |   1 | b
+ (1 row)
+
+ update parent set val1 = 'b' where aid = 1; -- should fail
+ ERROR:  tuple to be updated was already modified by an operation triggered by the current command
+ HINT:  Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+    1 | a    | a    | a    | a    |    1
+ (1 row)
+
+  bid | aid | val1
+ -----+-----+------
+   10 |   1 | b
+ (1 row)
+
+ delete from parent where aid = 1; -- should fail
+ ERROR:  tuple to be updated was already modified by an operation triggered by the current command
+ HINT:  Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+    1 | a    | a    | a    | a    |    1
+ (1 row)
+
+  bid | aid | val1
+ -----+-----+------
+   10 |   1 | b
+ (1 row)
+
+ -- replace the trigger function with one that restarts the deletion after
+ -- having modified a child
+ create or replace function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   if found then
+     delete from parent where aid = old.aid;
+     return null; -- cancel outer deletion
+   end if;
+   return old;
+ end;
+ $$;
+ delete from parent where aid = 1;
+ select * from parent; select * from child;
+  aid | val1 | val2 | val3 | val4 | bcnt
+ -----+------+------+------+------+------
+ (0 rows)
+
+  bid | aid | val1
+ -----+-----+------
+ (0 rows)
+
+ drop table parent, child;
+ drop function parent_upd_func();
+ drop function parent_del_func();
+ drop function child_ins_func();
+ drop function child_del_func();
+ -- similar case, but with a self-referencing FK so that parent and child
+ -- rows can be affected by a single operation
+ create temp table self_ref_trigger (
+     id int primary key,
+     parent int references self_ref_trigger,
+     data text,
+     nchildren int not null default 0
+ );
+ NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "self_ref_trigger_pkey" for table "self_ref_trigger"
+ create function self_ref_trigger_ins_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if new.parent is not null then
+     update self_ref_trigger set nchildren = nchildren + 1
+       where id = new.parent;
+   end if;
+   return new;
+ end;
+ $$;
+ create trigger self_ref_trigger_ins_trig before insert on self_ref_trigger
+   for each row execute procedure self_ref_trigger_ins_func();
+ create function self_ref_trigger_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if old.parent is not null then
+     update self_ref_trigger set nchildren = nchildren - 1
+       where id = old.parent;
+   end if;
+   return old;
+ end;
+ $$;
+ create trigger self_ref_trigger_del_trig before delete on self_ref_trigger
+   for each row execute procedure self_ref_trigger_del_func();
+ insert into self_ref_trigger values (1, null, 'root');
+ insert into self_ref_trigger values (2, 1, 'root child A');
+ insert into self_ref_trigger values (3, 1, 'root child B');
+ insert into self_ref_trigger values (4, 2, 'grandchild 1');
+ insert into self_ref_trigger values (5, 3, 'grandchild 2');
+ update self_ref_trigger set data = 'root!' where id = 1;
+ select * from self_ref_trigger;
+  id | parent |     data     | nchildren
+ ----+--------+--------------+-----------
+   2 |      1 | root child A |         1
+   4 |      2 | grandchild 1 |         0
+   3 |      1 | root child B |         1
+   5 |      3 | grandchild 2 |         0
+   1 |        | root!        |         2
+ (5 rows)
+
+ delete from self_ref_trigger;
+ ERROR:  tuple to be updated was already modified by an operation triggered by the current command
+ HINT:  Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.
+ select * from self_ref_trigger;
+  id | parent |     data     | nchildren
+ ----+--------+--------------+-----------
+   2 |      1 | root child A |         1
+   4 |      2 | grandchild 1 |         0
+   3 |      1 | root child B |         1
+   5 |      3 | grandchild 2 |         0
+   1 |        | root!        |         2
+ (5 rows)
+
+ drop table self_ref_trigger;
+ drop function self_ref_trigger_ins_func();
+ drop function self_ref_trigger_del_func();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index e52131dbba2d7a45c5f9365f24721de8a07e3253..f282e800e34c4a7e1982619d8b5bfbe8a5f58794 100644
*** a/src/test/regress/sql/triggers.sql
--- b/src/test/regress/sql/triggers.sql
*************** SELECT * FROM city_view;
*** 961,963 ****
--- 961,1118 ----

  DROP TABLE city_table CASCADE;
  DROP TABLE country_table;
+
+ --
+ -- Test updates to rows during firing of BEFORE ROW triggers.
+ -- As of 9.2, such cases should be rejected (see bug #6123).
+ --
+
+ create temp table parent (
+     aid int not null primary key,
+     val1 text,
+     val2 text,
+     val3 text,
+     val4 text,
+     bcnt int not null default 0);
+ create temp table child (
+     bid int not null primary key,
+     aid int not null,
+     val1 text);
+
+ create function parent_upd_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if old.val1 <> new.val1 then
+     new.val2 = new.val1;
+     delete from child where child.aid = new.aid and child.val1 = new.val1;
+   end if;
+   return new;
+ end;
+ $$;
+ create trigger parent_upd_trig before update on parent
+   for each row execute procedure parent_upd_func();
+
+ create function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger parent_del_trig before delete on parent
+   for each row execute procedure parent_del_func();
+
+ create function child_ins_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt + 1 where aid = new.aid;
+   return new;
+ end;
+ $$;
+ create trigger child_ins_trig after insert on child
+   for each row execute procedure child_ins_func();
+
+ create function child_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   update parent set bcnt = bcnt - 1 where aid = old.aid;
+   return old;
+ end;
+ $$;
+ create trigger child_del_trig after delete on child
+   for each row execute procedure child_del_func();
+
+ insert into parent values (1, 'a', 'a', 'a', 'a', 0);
+ insert into child values (10, 1, 'b');
+ select * from parent; select * from child;
+
+ update parent set val1 = 'b' where aid = 1; -- should fail
+ select * from parent; select * from child;
+
+ delete from parent where aid = 1; -- should fail
+ select * from parent; select * from child;
+
+ -- replace the trigger function with one that restarts the deletion after
+ -- having modified a child
+ create or replace function parent_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   delete from child where aid = old.aid;
+   if found then
+     delete from parent where aid = old.aid;
+     return null; -- cancel outer deletion
+   end if;
+   return old;
+ end;
+ $$;
+
+ delete from parent where aid = 1;
+ select * from parent; select * from child;
+
+ drop table parent, child;
+
+ drop function parent_upd_func();
+ drop function parent_del_func();
+ drop function child_ins_func();
+ drop function child_del_func();
+
+ -- similar case, but with a self-referencing FK so that parent and child
+ -- rows can be affected by a single operation
+
+ create temp table self_ref_trigger (
+     id int primary key,
+     parent int references self_ref_trigger,
+     data text,
+     nchildren int not null default 0
+ );
+
+ create function self_ref_trigger_ins_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if new.parent is not null then
+     update self_ref_trigger set nchildren = nchildren + 1
+       where id = new.parent;
+   end if;
+   return new;
+ end;
+ $$;
+ create trigger self_ref_trigger_ins_trig before insert on self_ref_trigger
+   for each row execute procedure self_ref_trigger_ins_func();
+
+ create function self_ref_trigger_del_func()
+   returns trigger language plpgsql as
+ $$
+ begin
+   if old.parent is not null then
+     update self_ref_trigger set nchildren = nchildren - 1
+       where id = old.parent;
+   end if;
+   return old;
+ end;
+ $$;
+ create trigger self_ref_trigger_del_trig before delete on self_ref_trigger
+   for each row execute procedure self_ref_trigger_del_func();
+
+ insert into self_ref_trigger values (1, null, 'root');
+ insert into self_ref_trigger values (2, 1, 'root child A');
+ insert into self_ref_trigger values (3, 1, 'root child B');
+ insert into self_ref_trigger values (4, 2, 'grandchild 1');
+ insert into self_ref_trigger values (5, 3, 'grandchild 2');
+
+ update self_ref_trigger set data = 'root!' where id = 1;
+
+ select * from self_ref_trigger;
+
+ delete from self_ref_trigger;
+
+ select * from self_ref_trigger;
+
+ drop table self_ref_trigger;
+ drop function self_ref_trigger_ins_func();
+ drop function self_ref_trigger_del_func();
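
For reference, the new regression tests exercise the "re-execute and
cancel" idiom only for DELETE.  A minimal sketch of the analogous idiom
for UPDATE -- hypothetical, not part of the patch, reusing the
parent/child tables from the tests -- would look about like this; the
"if found" test is what keeps the trigger from recursing forever:

create or replace function parent_upd_func()
  returns trigger language plpgsql as
$$
begin
  if old.val1 <> new.val1 then
    delete from child where child.aid = new.aid and child.val1 = new.val1;
    if found then
      -- re-execute the update ourselves, then cancel the outer one;
      -- on re-entry the delete above finds no rows, so we fall through
      update parent set val1 = new.val1, val2 = new.val1
        where aid = new.aid;
      return null;  -- cancel outer update
    end if;
    new.val2 = new.val1;
  end if;
  return new;
end;
$$;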

Re: Remembering bug #6123

From
Simon Riggs
Date:
On Thu, Jan 12, 2012 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Also, what's the point of testing update_ctid?  I don't see that
>>> it matters whether the outdate was a delete or an update.
>
>> The update_ctid code was a carry-over from my old, slightly
>> different approach, which I failed to change as I should have.  I'll
>> fix that along with the other.
>
> Actually, on reflection there might be a reason for checking
> update_ctid, with a view to allowing "harmless" cases.  I see
> these cases:
>
> * UPDATE finds a trigger already updated the row: must throw error
> since we can't apply the update.
>
> * UPDATE finds a trigger already deleted the row: arguably, we could
> let the deletion stand and ignore the update action.
>
> * DELETE finds a trigger already updated the row: must throw error
> since we can't apply the delete.
>
> * DELETE finds a trigger already deleted the row: arguably, there's
> no reason to complain.
>
> Don't know if that was your reasoning as well.  But if it is, then again
> the comment needs to cover that.


Just been reading this thread a little.

ISTM that seeing a SelfUpdated row on a cursor when we didn't use
WHERE CURRENT OF really ought to raise some kind of exception
condition like a WARNING or a NOTICE. That situation seems most likely
to be a bug or at least an unintended consequence.

As Tom points out, the multiple flavours of weirdness that can result
even if we do fix this are not things we should allow to happen
silently. I think we should do the best job we can without throwing an
error, but we must make some attempt to let the developer know what we
did on their behalf so they can investigate.
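
For concreteness, the delete-meets-delete case in Tom's list can be
produced with something like this hypothetical pair of tables (names
invented here, not taken from the patch).  As the patch stands this
still raises the new error, even though the target row was merely
deleted rather than updated -- which is exactly the case where letting
the deletion stand silently is arguable:

create temp table p (aid int primary key);
create temp table c (bid int primary key, aid int not null);
create function p_del_func() returns trigger language plpgsql as
$$
begin
  delete from c where aid = old.aid;  -- cascade to children
  return old;
end;
$$;
create trigger p_del_trig before delete on p
  for each row execute procedure p_del_func();
create function c_del_func() returns trigger language plpgsql as
$$
begin
  delete from p where aid = old.aid;  -- deletes the outer DELETE's target
  return old;
end;
$$;
create trigger c_del_trig after delete on c
  for each row execute procedure c_del_func();
insert into p values (1);
insert into c values (10, 1);
delete from p where aid = 1;  -- fails under the patch: the target row
                              -- was already deleted by a later command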

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Remembering bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

http://archives.postgresql.org/pgsql-hackers/2012-01/msg01241.php
> OK, here's an updated version of the patch.
I was on vacation after PGCon and just got back to the office
yesterday, so I have only now been able to check on the status of our
testing of this, which Tom asked me about at PGCon.
He has this listed as an open bug, with testing of his fix by my
organization as the hold-up.
There was some testing of this in January while I was on another
vacation; some triggers were found which worked as intended with the
patch I had hacked together, but which got errors with the stricter
(and safer) patch created by Tom.
some triggers which needed to be changed, and what should be
considered safe coding techniques, but I was then assigned to
several urgent issues and lost track of this one.  I have arranged
for testing over the next few days.  I will post again with results
when I have them.
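For the record, the sort of change involved looks roughly like the
following hypothetical rewrite of the triggers from the patch's
regression tests: move the cascade into an AFTER trigger, per the hint
text, so that the child-count update finds the parent row already gone
instead of modifying a row whose BEFORE trigger is still in flight.

create or replace function parent_del_func()
  returns trigger language plpgsql as
$$
begin
  delete from child where aid = old.aid;
  return old;  -- return value is ignored for AFTER ROW triggers
end;
$$;
drop trigger parent_del_trig on parent;
create trigger parent_del_trig after delete on parent
  for each row execute procedure parent_del_func();

With this shape, child_del_trig's "update parent set bcnt = bcnt - 1"
simply matches zero rows and no error is raised.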
-Kevin