Thread: WIP fix proposal for bug #6123

WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
We're nearing completion of testing the migration of a lot of code
which used our custom Java framework into PostgreSQL functions and
triggers.  Yesterday our testers ran into surprising behavior
related to delete triggers.  A test case is presented on the -bugs
list, but basically it amounts to this:

(1)  We have some detail which is summarized by triggers into
related higher-level tables for performance reasons.

(2)  On delete, some of the higher level tables should cascade the
delete down to the lower levels.

(3)  Sometimes the same tables are involved in both.

This is complicated by the foreign key situation -- due to
conversion of less-than-perfect data and the fact that there is a
legal concept of the elected Clerk of Court in each county being the
"custodian of the data" we have orphaned detail in some tables which
we don't have authority to delete or create bogus parent rows for.
(It would actually be a felony for us to do so, I think.)  Until 9.2
we won't be able to create foreign keys for these relationships, but
in each county we've created foreign keys for all relationships
without orphans.  So, this is one reason we can't count on foreign
key declarations, with the ON DELETE CASCADE option, yet we don't
want to drop the foreign keys where they exist, as they help prevent
further degradation of the data integrity.  So the DELETE from the
children should occur in the BEFORE trigger to avoid upsetting FK
logic.  Otherwise we could move the cascading deletes to the AFTER
DELETE trigger, where this odd behavior doesn't occur.

So basically, the goal of this patch is to ensure that a BEFORE
DELETE trigger doesn't silently fail to delete a row because that
row was updated during the BEFORE DELETE trigger firing (in our case
by secondary trigger firing).

If that description was too hard to follow, let me know and I'll try
again.  :-/
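
In case a concrete sketch helps, the shape of it is roughly this
(table and trigger names are invented for illustration; this is not
our actual schema):

  create table cases (case_id int primary key,
                      status text not null,
                      event_count int not null default 0);
  create table case_events (event_id int primary key,
                            case_id int not null,
                            event_type text not null);

  -- Summarize detail into the parent for performance (point 1 above).
  create or replace function case_events_summary() returns trigger as $$
  begin
    update cases
       set event_count = event_count - 1,
           status = 'recalculated'
     where case_id = old.case_id;
    return old;
  end;
  $$ language plpgsql;

  create trigger case_events_summary
    after delete on case_events
    for each row execute procedure case_events_summary();

  -- Cascade the delete down in a BEFORE trigger (point 2 above), so the
  -- foreign keys that do exist never see orphans.
  create or replace function cases_cascade() returns trigger as $$
  begin
    delete from case_events where case_id = old.case_id;
    return old;
  end;
  $$ language plpgsql;

  create trigger cases_cascade
    before delete on cases
    for each row execute procedure cases_cascade();

  insert into cases values (1, 'open', 1);
  insert into case_events values (10, 1, 'filing');

  delete from cases where case_id = 1;

Here (on 9.0) that last statement reports DELETE 0: the case_events
row is gone, but the cases row is still there, because the summary
trigger updated it while the BEFORE DELETE trigger was firing.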

[Summarizing discussion on the -bugs list,] Tom didn't feel that
there was a need to support application code which does what I
describe above, and he felt that fixing it would open a can of
worms, with logical quandaries about correct behavior. Basically,
the changes I made were within switch statements where if the row
was found to be HeapTupleUpdated in READ COMMITTED, it would follow
the ctid chain; I used similar logic for HeapTupleSelfUpdated
regardless of transaction isolation level.  The reason for not
re-firing delete triggers here is the same as the reason delete
triggers are not fired in the existing case -- it's just one delete.

No claims are made for completeness of this patch -- it's a "proof of
concept" on which I hope to get comments. Before this patch would be
production ready I would need to check for similar needs on UPDATE,
and would need to check to make sure there is no resource leakage.
It passes `make check-world`, `make installcheck-world` with a
database set for serializable transaction isolation, and the
isolation tests.  And of course, it doesn't show the behavior which
we found so astonishing -- we no longer see an attempted delete of a
parent succeed in deleting the children but leave the parent sitting
there.

A patch against 9.0 based on this approach may find its way into
production here in about two weeks if there are no
contra-indications, so any review would be very much appreciated.

-Kevin

Attachment

Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Wed, Jul 20, 2011 at 2:58 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> So basically, the goal of this patch is to ensure that a BEFORE
> DELETE trigger doesn't silently fail to delete a row because that
> row was updated during the BEFORE DELETE trigger firing (in our case
> by secondary trigger firing).

I've run across this problem before while writing application code and
have found it surprising, unfortunate, and difficult to work around.
It's not so bad when it only bites you once, but as things get more
complicated and you have more and more triggers floating around, it
gets pretty darn horrible.  One of the things I've done to mitigate
the impact of this is to write as many triggers as possible as AFTER
triggers, but that's sometimes difficult to accomplish.

Your scenario is a BEFORE DELETE trigger that does an UPDATE on the
same row, but I think this problem also occurs if you have a BEFORE
UPDATE trigger that does an UPDATE on the same row.  I believe the
second update gets silently ignored.
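
Something like this, I think (an untested sketch; names made up, and
the guard condition is only there to keep the trigger from recursing):

  create table widgets (id int primary key, qty int, last_touched timestamptz);

  create or replace function widgets_touch() returns trigger as $$
  begin
    if new.qty is distinct from old.qty then
      -- update the row being updated, instead of just assigning to NEW
      update widgets set last_touched = now() where id = new.id;
    end if;
    return new;
  end;
  $$ language plpgsql;

  create trigger widgets_touch
    before update on widgets
    for each row execute procedure widgets_touch();

  insert into widgets values (1, 0, null);
  update widgets set qty = qty + 1 where id = 1;
  -- one of the two changes is quietly lost here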

I'm unfortunately unqualified to speak to the correctness of your
patch, but I concur with your view that the current behavior is lousy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I think this problem also occurs if you have a BEFORE
> UPDATE trigger that does an UPDATE on the same row.
I'm pretty sure you're right, and I do intend to cover that in the
final patch.
-Kevin


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jul 20, 2011 at 2:58 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> So basically, the goal of this patch is to ensure that a BEFORE
>> DELETE trigger doesn't silently fail to delete a row because that
>> row was updated during the BEFORE DELETE trigger firing (in our
>> case by secondary trigger firing).
> 
> I've run across this problem before while writing application code
> and have found it surprising, unfortunate, and difficult to work
> around. It's not so bad when it only bites you once, but as things
> get more complicated and you have more and more triggers floating
> around, it gets pretty darn horrible.  One of the things I've done
> to mitigate the impact of this is to write as many triggers as
> possible as AFTER triggers
Yeah, this is not an issue in AFTER triggers, so moving any updating
to those is a solution.  In most cases that's where you want
triggered modifications to take place anyway.  The cascading delete
situation is the most obvious exception, although there certainly
could be others.
> but that's sometimes difficult to accomplish.
Yeah, sometimes.
> Your scenario is a BEFORE DELETE trigger that does an UPDATE on
> the same row, but I think this problem also occurs if you have a
> BEFORE UPDATE trigger that does an UPDATE on the same row.  I
> believe the second update gets silently ignored.
My testing shows that the primary update gets ignored, while all the
triggered effects of that update are persisted.  Yuck.  :-(  It
certainly seems possible to turn that around, but that hardly seems
better.  In asking application programmers here what they would
*expect* to happen, they all seem to think that it is surprising
that the BEFORE trigger functions *return a record*, rather than a
boolean to say whether to proceed with the operation.  They feel it
would be less confusing if a value set into NEW was effective if the
operation does take effect, and the boolean controls whether or not
that happens.  They rather expect that if an update happens from the
same transaction while a before trigger is running, that the NEW
record will reflect the change.
I recognize how hard it would be to create that expected behavior,
and how unlikely it is that the community would accept such a change
at this point.  But current behavior is to silently do something
really dumb, so I think some change should be considered -- even if
that change is to throw an error where we now allow nonsense.
INSERT is not a problem -- if a BEFORE INSERT trigger inserts a row
with a conflicting primary key (or other unique index key), the
operation will be rolled back.  That's fine.
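For example (a contrived sketch; the guard column is only there to keep
the trigger from recursing):
  create table slots (id int primary key, via_trigger boolean default false);
  create or replace function slots_grab() returns trigger as $$
  begin
    if not new.via_trigger then
      insert into slots values (new.id, true);  -- takes the same key
    end if;
    return new;
  end;
  $$ language plpgsql;
  create trigger slots_grab before insert on slots
    for each row execute procedure slots_grab();
  insert into slots (id) values (1);
  -- fails with a duplicate-key error, and the triggered insert is
  -- rolled back along with the rest of the statement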
I think DELETE can be cleanly fixed with a patch similar to what I
posted earlier in the thread.  I found one more value that looks
like it should be set, and it could use some comments, but I believe
that we can get DELETE behavior which is every bit as sensible as
INSERT behavior with a very small change.
The worms do come crawling out of the can on BEFORE UPDATE triggers,
though.  When faced with an UPDATE which hasn't yet been applied,
and other UPDATEs triggering from within the BEFORE UPDATE trigger
which touch the same row, it doesn't seem like you can honor both
the original UPDATE which was requested *and* the triggered UPDATEs.
Of course, if you discard the original UPDATE, you can't very well
do anything with the record returned from the BEFORE UPDATE trigger
for that update.  Since it seems equally evil to allow the update
while ignoring some of the work caused by its trigger functions as
to show the work of the triggered updates while suppressing the
original update, I think the right thing is to throw an error if the
old row for a BEFORE UPDATE is updated by the same transaction and
the trigger function ultimately returns a non-NULL value.
Thoughts?
-Kevin


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> I believe that we can get DELETE behavior which is every bit as
> sensible as INSERT behavior with a very small change.

> I think the right thing is to throw an error if the old row for a
> BEFORE UPDATE is updated by the same transaction and the trigger
> function ultimately returns a non-NULL value.

And to make this a bit less hand-wavy, a rough patch attached.  I
expect the error message could use some word-smithing, and it could
use comments; but it seemed like something concrete might speed
things along.

-Kevin


Attachment

Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Fri, Jul 22, 2011 at 5:01 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>> Your scenario is a BEFORE DELETE trigger that does an UPDATE on
>> the same row, but I think this problem also occurs if you have a
>> BEFORE UPDATE trigger that does an UPDATE on the same row.  I
>> believe the second update gets silently ignored.
>
> My testing shows that the primary update gets ignored, while all the
> triggered effects of that update are persisted.  Yuck.  :-(

That was my recollection...

> It
> certainly seems possible to turn that around, but that hardly seems
> better.

Agreed.

> In asking application programmers here what they would
> *expect* to happen, they all seem to think that it is surprising
> that the BEFORE trigger functions *return a record*, rather than a
> boolean to say whether to proceed with the operation.  They feel it
> would be less confusing if a value set into NEW was effective if the
> operation does take effect, and the boolean controls whether or not
> that happens.  They rather expect that if an update happens from the
> same transaction while a before trigger is running, that the NEW
> record will reflect the change.

I think this is mostly a matter of what you get familiar with, and, as
you say, not worth breaking compatibility for.

> I recognize how hard it would be to create that expected behavior,
> and how unlikely it is that the community would accept such a change
> at this point.  But current behavior is to silently do something
> really dumb, so I think some change should be considered -- even if
> that change is to throw an error where we now allow nonsense.
>
> INSERT is not a problem -- if a BEFORE INSERT trigger inserts a row
> with a conflicting primary key (or other unique index key), the
> operation will be rolled back.  That's fine.
>
> I think DELETE can be cleanly fixed with a patch similar to what I
> posted earlier in the thread.  I found one more value that looks
> like it should be set, and it could use some comments, but I believe
> that we can get DELETE behavior which is every bit as sensible as
> INSERT behavior with a very small change.
>
> The worms do come crawling out of the can on BEFORE UPDATE triggers,
> though.  When faced with an UPDATE which hasn't yet been applied,
> and other UPDATEs triggering from within the BEFORE UPDATE trigger
> which touch the same row, it doesn't seem like you can honor both
> the original UPDATE which was requested *and* the triggered UPDATEs.
> Of course, if you discard the original UPDATE, you can't very well
> do anything with the record returned from the BEFORE UPDATE trigger
> for that update.  Since it seems equally evil to allow the update
> while ignoring some of the work caused by its trigger functions as
> to show the work of the triggered updates while suppressing the
> original update, I think the right thing is to throw an error if the
> old row for a BEFORE UPDATE is updated by the same transaction and
> the trigger function ultimately returns a non-NULL value.
>
> Thoughts?

Well, it seems to me that if the trigger update and the main update
were executed as separate commands (with no triggers involved) it
would often be the case that they'd dovetail nicely.  When this has
come up for me, it's usually been the case that the sets of fields
being updated are completely non-overlapping.  So ideally what I'd
like to happen is to have EPQ, or something like it, test whether the
newest version of the row still satisfies the UPDATE criteria.  If so,
it applies the update to the new row version; if not, it either
discards the main UPDATE or throws an error.  There's still some room
here for surprising results, but I think they would be surprising
results arising out of having done something intrinsically
complicated, rather than surprising results arising out of an odd
implementation artifact.
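
To make "dovetail nicely" concrete, here's the sort of thing I have in
mind (invented names, no triggers involved):

  create table orders (order_id int primary key,
                       status text,
                       shipped_at timestamptz);
  insert into orders values (7, 'packed', null);

  update orders set shipped_at = now() where order_id = 7;
  update orders set status = 'shipped' where order_id = 7;
  -- Run as two ordinary statements, both changes stick.  The trigger
  -- case is the same pair of updates, except the second one runs while
  -- the first is still in flight -- and today one of them is thrown away.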

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> Well, it seems to me that if the trigger update and the main
> update were executed as separate commands (with no triggers
> involved) it would often be the case that they'd dovetail nicely. 
> When this has come up for me, it's usually been the case that the
> sets of fields being updated are completely non-overlapping.
Agreed that this is typically the case -- that's why the application
programmers here expected NEW to be effectively a dynamic
representation of the WIP state of the row.  A lot of things would
"just work" that way.  Of course, they're blissfully unaware of what
a huge revamp of the guts of PostgreSQL that would be.
> So ideally what I'd like to happen is to have EPQ, or something
> like it, test whether the newest version of the row still
> satisfies the UPDATE criteria.  If so, it applies the update to
> the new row version; if not, it either discards the main UPDATE or
> throws an error.  There's still some room here for surprising
> results, but I think they would be surprising results arising out
> of having done something intrinsically complicated, rather than
> surprising results arising out of an odd implementation artifact.
So, you're advocating a "logical merge" of the results with
something exceptional done on a conflicting update of the same
columns?  That would effectively get you to the same end result as a
"live" NEW tuple, but without such a radical revamp of the guts of
things.  Still, not trivial to do properly, and I would argue for
throwing an error rather than silently doing something surprising on
conflict.
This issue has already forced the rearrangement of our release
schedule here, so I'm going to do the simple fix of just throwing an
error on update from the BEFORE UPDATE trigger (of the row for which
the trigger is firing).  That fix is very simple and seems very safe
to me, and should allow us to deploy without further schedule
slippage; then I'll see if I can code up what you're suggesting.  I
had a new patch I was about to post with new error language, a
different SQLSTATE, comments, and regression test changes; but
unless someone wants to see that I won't clutter the list with it
until I've had a chance to see if I can manage to handle it the way
you're requesting.
There's no doubt that it would be better the way you're suggesting;
but it looks to me like about five times as many lines of code,
harder to be sure it's right, and probably forcing me to learn a few
new subsystems of PostgreSQL internals to accomplish.
Thanks for the feedback.
-Kevin


Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Mon, Jul 25, 2011 at 12:26 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> There's no doubt that it would be better the way you're suggesting;
> but it looks to me like about five times as many lines of code,
> harder to be sure it's right, and probably forcing me to learn a few
> new subsystems of PostgreSQL internals to accomplish.

Sorry, I didn't mean to make homework for you.  Nor am I sure that the
solution will pass muster all around even if I think it's the best thing
since sliced bread.  I was just throwing it out there as what I would
like to have happen in an ideal world...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jul 25, 2011 at 12:26 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> There's no doubt that it would be better the way you're
>> suggesting; but it looks to me like about five times as many
>> lines of code, harder to be sure it's right, and probably forcing
>> me to learn a few new subsystems of PostgreSQL internals to
>> accomplish.
> 
> Sorry, I didn't mean to make homework for you.  Nor am I sure that
> the solution will pass muster all around even if I think it's the
> best thing since sliced bread.  I was just throwing it out there
> as what I would like to have happen in an ideal world...
Well, if it can be done, it will be better and less likely to break
existing code, so it's at least worth looking at.  I don't object to
broadening my horizons.  ;-)  Sorry if it sounded like a complaint;
my intention was to communicate that I'm going to be looking at it,
but I've got a few more urgent tasks to deal with first to get our
application release out the door.
By the way, my current patch does break two existing UPDATE
statements in the regression test misc.sql file:
| -- This non-func update stuff needs to be examined
| -- more closely.          - jolly (2/22/96)
| --
| UPDATE tmp
|    SET stringu1 = reverse_name(onek.stringu1)
|    FROM onek
|    WHERE onek.stringu1 = 'JBAAAA' and
|       onek.stringu1 = tmp.stringu1;
| 
| UPDATE tmp
|    SET stringu1 = reverse_name(onek2.stringu1)
|    FROM onek2
|    WHERE onek2.stringu1 = 'JCAAAA' and
|       onek2.stringu1 = tmp.stringu1;
Perhaps it's time....
-Kevin


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> By the way, my current patch does break two existing UPDATE
> statements in the regression test misc.sql file:

Fixed in the attached version of the patch.

I consider the trigger behavior addressed by this patch to be a bug,
and serious enough to merit inclusion of a fix in the 9.1 release,
even at this late stage.  I don't think it should be back-patched,
because it changes behavior that there is some remote chance that
somebody somewhere understands and is intentionally using.  While I
agree that Robert's suggested approach (or something functionally
equivalent) would be the ideal long-term solution, I think that it
would be too large of a change to consider for 9.1.  I am suggesting
a minimal "defensive" change for 9.1, with possible development of a
fancier approach in a later release.

To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE EACH
ROW, which directly or indirectly causes update of the OLD row in
the trigger, can cause the triggering operation to be silently
ignored even though the trigger returns a non-NULL record, and even
though all triggered modifications are persisted.  The only clue
that an incomplete and incongruous set of operations has been
performed is that the UPDATE or DELETE count from the statement is
reduced by the number of rows on which this occurred: if an UPDATE
would have affected 5 rows, but one of them just fired triggers *as
though* it had been updated but was actually left untouched by the
original UPDATE statement, you would get "UPDATE 4" rather than
"UPDATE 5".

This patch causes DELETE to behave as most people would expect, and
throws an ERROR on the conflicting UPDATE case.

So far, beyond my confirmation that it passes all PostgreSQL
regression tests and works as expected in a few ad hoc tests I've
done, there have been six full-time days of business analysts here
testing our applications with a version of 9.0.4 patched this way,
confirming that the application works as expected.  They have a
standard set of testing scripts they use for every release, and
programmers have added tests specifically targeted at this area.
Plus the analysts are trying to exercise as many paths to data
modification as possible, to stress the triggers, including
interfaces with other agencies.  They are scheduled to do a minimum
of 20 more full-time days of testing in the next two weeks, so if
someone wants to suggest changes or alternatives to this particular
patch, there would still be time to get over 100 hours of
professional testing against it -- if it comes through soon enough.

-Kevin


Attachment

Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug1, 2011, at 20:02 , Kevin Grittner wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> I consider the trigger behavior addressed by this patch to be a bug,
> and serious enough to merit inclusion of a fix in the 9.1 release,
> even at this late stage.

I'm sorry but I disagree, on multiple grounds.

First, I'm not sure this is even a bug. To me, it seems that postgres
currently resolves an inherently ambiguous situation in one possible
way, while you expect it to pick another. It might be that the behaviour
that you suggest is more in line with people's expectations (More on
that below), but that still doesn't make the existing behaviour wrong.

Secondly, even if one considers the current behaviour to be wrong,
that still doesn't prove that your proposed behaviour is any better
(More on that below too). There might very well be situations in which
the current behaviour is preferable over the behaviour you suggest,
and maybe these situations turn out to be much more common than anyone
anticipates right now. But if we apply the patch now, the chance of that
being discovered before 9.1 is released is virtually zero. And after
the release there's no going back, so we might end up changing the
behaviour to the worse.

And lastly, I believe that your ultimate goal of guaranteeing that
a row is actually deleted (updated) after a BEFORE DELETE (BEFORE UPDATE)
trigger has fired is actually insurmountable anyway, at least for
isolation level READ COMMITTED. We don't seem to lock rows before firing
BEFORE DELETE or BEFORE UPDATE triggers, so a row may very well be
updated by a concurrent transaction between the firing of these triggers
and the actual DELETE or UPDATE. In READ COMMITTED mode, we'll then
re-check if the original WHERE condition still applies (using the
EvalPlanQual machinery), and only go forward with the modification
if it does.

Having said all that, I don't doubt that the current behaviour is
a source of pain for you, and I do believe that we ought to improve things -
but not by replacing one unsatisfactory behaviour with another. The
root of the issue seems to be that you're trying to do things in a
BEFORE trigger that really belong in an AFTER trigger. If you could
explain why using an AFTER trigger doesn't work for you, maybe we could
improve them to be suitable for your purposes (and please forgive me
if you already did that).

> To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE EACH
> ROW, which directly or indirectly causes update of the OLD row in
> the trigger, can cause the triggering operation to be silently
> ignored even though the trigger returns a non-NULL record, and even
> though all triggered modifications are persisted.  The only clue
> that an incomplete and incongruous set of operations has been
> performed is that the UPDATE or DELETE count from the statement is
> reduced by the number of rows on which this occurred: if an UPDATE
> would have affected 5 rows, but one of them just fired triggers *as
> though* it had been updated but was actually left untouched by the
> original UPDATE statement, you would get "UPDATE 4" rather than
> "UPDATE 5".
> 
> This patch causes DELETE to behave as most people would expect, and
> throws an ERROR on the conflicting UPDATE case.

I'm prepared to argue that whether or not people expect the behaviour
that patch implements depends entirely on how you phrase the question.
If you ask "do you expect a row to be actually deleted after BEFORE
DELETE triggers have run", you'll undoubtedly hear "yes". But if you
ask "do you expect a row to be deleted even though it didn't match
the DELETE's WHERE condition" I bet you'll hear a very firm "no". And
equally firm do I expect to be the "no" if you ask "do you expect the
row that is actually deleted and the contents of the variable OLD in
the delete trigger to differ".

But the behaviour indicated in the second and third question *is* what
happens with your patch applied. Here's an example:

  create table t (id int);

  create or replace function on_t_delete() returns trigger as $$
  begin
    raise notice 'deleting row %', old.id;
    update t set id = -id where id = old.id;
    return old;
  end;
  $$ language plpgsql volatile;

  create trigger t_delete
    before delete on t
    for each row execute procedure on_t_delete();

  insert into t (id) values (1);
  insert into t (id) values (2);

  delete from t where id > 0;

Without your patch, the DELETE doesn't remove any rows, because
they're updated in on_t_delete(). With your patch applied, the
rows are removed - even though, due to the UPDATE in on_t_delete(),
they *don't* match the DELETE's WHERE condition (id > 0) at the time
they're deleted.

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> On Aug1, 2011, at 20:02 , Kevin Grittner wrote:
>> I consider the trigger behavior addressed by this patch to be a
>> bug, and serious enough to merit inclusion of a fix in the 9.1
>> release, even at this late stage.
> 
> I'm sorry but I disagree, on multiple grounds.
Thanks for offering your thoughts!
> First, I'm not sure this is even a bug. To me, it seems that
> postgres currently resolves an inherently ambiguous situation in
> one possible way, while you expect it to pick another. It might be
> that the behaviour that you suggest is more in line with people's
> expectations (More on that below), but that still doesn't make the
> existing behaviour wrong.
That point would be a hard sell for me.  To silently turn a BEFORE
trigger into an INSTEAD OF trigger invites data corruption, in the
sense that business rules that you thought were unconditionally
enforced in triggers can be broken with no obvious hint that it
happened.
> Secondly, even if one considers the current behaviour to be wrong,
> that still doesn't prove that your proposed behaviour is any
> better
Granted.  I tried to phrase my post to not preclude other solutions.
> There might very well be situations in which the current behaviour
> is preferable over the behaviour you suggest, and maybe these
> situations turn out to be much more common than anyone anticipates
> right now.
My imagination is not up to the task of putting any plausible
situations forward where current behavior is a good thing.  Of
course I realize that's not proof of anything, but it leaves me
skeptical.
> But if we apply the patch now, the chance of that being discovered
> before 9.1 is released is virtually zero. And after the release
> there's no going back, so we might end up changing the behaviour
> to the worse.
Perhaps throwing an error on the DELETE case as well as the UPDATE
case would address this concern.  That would have saved weeks of
staff time here when we started seeing the mysterious behavior which
currently exists.  The only way we finally got to the bottom of it
was to step through execution in gdb; those users not equipped to do
so might have bigger problems than we did.
We would probably still be patching the version we use in our shop
to actually *do* the delete if no error is thrown and no BEFORE
trigger returns NULL, but we would have been able to get there a lot
faster.
> And lastly, I believe that your ultimate goal of guaranteeing that
> a row is actually deleted (updated) after a BEFORE DELETE (BEFORE
> UPDATE) trigger has fired is actually insurmountable anyway, at
> least for isolation level READ COMMITTED.
Hmm.  That's a very potent point.  I had not considered READ
COMMITTED because we have never, ever used it under PostgreSQL. 
Obviously, any solution has to work for those who *do* use it, too;
and the behavior between different isolation levels can't be too
convoluted.  So, as you say, the *ultimate* goal may be
unattainable.
> Having said all that, I don't doubt that the current behaviour is
> a source of pain for you,
It has thrown off our production development and deployment schedule
by weeks, and management has considered shelving the whole concept
of using PostgreSQL triggers because of it.  I've convinced them
that this is a surmountable obstacle, and am trying to get things
back on track.
> and I do believe that we ought to improve things - but not by
> replacing one unsatisfactory behaviour with another.
I hope you're not suggesting *that's* my goal!  ;-)
> The root of the issue seems to be that you're trying to do things
> in a BEFORE trigger that really belong in an AFTER trigger.
I'll take that as a *collective* pronoun, since I haven't personally
written any of the thousands of triggers.  There are 20 programmers
doing that.  Part of the problem is that they sometimes put things
in a BEFORE trigger that belong in an AFTER trigger.  I caught one
place they were doing an UPDATE of the target row in a BEFORE
trigger rather than setting the values in the NEW record.  My patch
makes the former throw an error, as I believe it should, rather than
silently leaving the data in a bad state.
> If you could explain why using an AFTER trigger doesn't work for
> you, maybe we could improve them to be suitable for your purposes
> (and please forgive me if you already did that).
I did to some extent, but really perhaps the biggest issue (which I
don't think I've really covered earlier in the thread) is to not
silently corrupt the data.  I would settle for throwing an error on
the DELETE case as well as the UPDATE case, because my data would
then be protected, and programmers would be forced to work around
the issue in a way that PostgreSQL can handle correctly.
Robert said that he has mainly run into this on BEFORE UPDATE
triggers, so perhaps he can elaborate on the usage patterns that run
into it there.
In my case it is mainly an issue in deletes (and the special case of
a "purge" process where deletes are not normally allowed) doing a
depth-first search for dependent records, and deleting from the
bottom up.  In some cases there is redundant information in higher
level tables based on primary data in lower level tables -- in the
form of counts, sums, or status codes.  If you delete a case, and
that "cascades" to a event history table which has a trigger which
updates case status according to certain business rules, the delete
of the case is canceled because of the delete of a status-changing
event.  That's very bad.  If we at least threw an error instead, we
could identify the problem reliably, and code around it somehow. 
That might be by moving the delete of dependent records to a point
after the parent record has already been deleted, but that runs the
risk of triggering other validation failures based on business rules
in the triggers, because the triggers would then be seeing a state
we try very hard to prevent.
>> To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE
>> EACH ROW, which directly or indirectly causes update of the OLD
>> row in the trigger, can cause the triggering operation to be
>> silently ignored even though the trigger returns a non-NULL
>> record, and even though all triggered modifications are
>> persisted.  The only clue that an incomplete and incongruous set
>> of operations has been performed is that the UPDATE or DELETE
>> count from the statement is reduced by the number of rows on
>> which this occurred: if an UPDATE would have affected 5 rows, but
>> one of them just fired triggers *as though* it had been updated
>> but was actually left untouched by the original UPDATE statement,
>> you would get "UPDATE 4" rather than "UPDATE 5".
>> 
>> This patch causes DELETE to behave as most people would expect,
>> and throws an ERROR on the conflicting UPDATE case.
> 
> I'm prepared to argue that whether or not people expect the
> behaviour that patch implements depends entirely on how you phrase
> the question. If you ask "do you expect a row to be actually
> deleted after BEFORE DELETE triggers have run", you'll undoubtedly
> hear "yes". But if you ask "do you expect a row to be deleted even
> though it didn't match the DELETE's WHERE condition" I bet you'll
> hear a very firm "no".
OK.  I guess I've neglected that possibility based on my transaction
isolation bias and the fact that the types of changes being made in
the cascade of deletes wouldn't tend to create such a situation. 
But stepping back, I see your point there.
> And equally firm do I expect to be the "no" if you ask "do you
> expect the row that is actually deleted and the contents of the
> variable OLD in the delete trigger to differ".
I'm less certain about that, especially if you phrase it in terms of
denormalization optimizations -- the redundant data in the form of
counts, sums, status codes, or sometimes just copied data from child
records stored in parents to support better performance on common
queries.  That's the main issue for me.
> But the behaviour indicated in the second and third question *is*
> what happens with your patch applied. Here's an example
> 
>   create table t (id int);
>   create or replace function on_t_delete() returns trigger as $$
>   begin
>     raise notice 'deleting row %', old.id;
>     update t set id = -id where id = old.id;
>     return old;
>   end;
>   $$ language plpgsql volatile;
>   create trigger t_delete
>     before delete on t
>     for each row execute procedure on_t_delete();
> 
>   insert into t (id) values (1);
>   insert into t (id) values (2);
> 
>   delete from t where id > 0;
> 
> Without your patch, the DELETE doesn't remove any rows, because
> they're updated in on_t_delete(). With your patch applied, the
> rows are removed - even though, due to the UPDATE in
> on_t_delete(), they *don't* match the DELETE's WHERE condition (id
> > 0) at the time they're deleted.
That's not the sort of situation we're seeing here, but I get your
point.
Would you feel comfortable with a patch which threw an error on the
DELETE case, as it does on the UPDATE case?
-Kevin


Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug2, 2011, at 17:03 , Kevin Grittner wrote:
> Florian Pflug <fgp@phlo.org> wrote:
>> First, I'm not sure this is even a bug. To me, it seems that
>> postgres currently resolves an inherently ambiguous situation in
>> one possible way, while you expect it to pick another. It might be
>> that the behaviour that you suggest is more in line with people's
>> expectations (More on that below), but that still doesn't make the
>> existing behaviour wrong.
> 
> That point would be a hard sell for me.  To silently turn a BEFORE
> trigger into an INSTEAD OF trigger invites data corruption, in the
> sense that business rules that you thought were unconditionally
> enforced in triggers can be broken with no obvious hint that it
> happened.

Hm, I can see your point. But I still maintain that a trigger that
acts based on values of OLD which don't reflect the actual tuple
being deleted is an equally dangerous source of data corruption.

>> There might very well be situations in which the current behaviour
>> is preferable over the behaviour you suggest, and maybe these
>> situations turn out to be much more common than anyone anticipates
>> right now.
> 
> My imagination is not up to the task of putting any plausible
> situations forward where current behavior is a good thing.  Of
> course I realize that's not proof of anything, but it leaves me
> skeptical.

One situation I had in mind was a table whose records contain a
kind of reference count. A cleanup process might then issue a DELETE
which removes all entries whose reference count is zero. Now, if such
a table contains a BEFORE DELETE trigger which, through some elaborate
chains of triggers, manages to increment some row's reference count
that was initially zero, it'd be a bad idea for the DELETE to remove it.
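
A minimal sketch of what I mean (invented names, with the "elaborate
chain of triggers" collapsed into a single trigger):

  create table resources (id int primary key, refcount int not null);

  create or replace function resources_resurrect() returns trigger as $$
  begin
    -- stand-in for some chain of triggers that ends up taking a new
    -- reference on the row that is about to be cleaned up
    update resources set refcount = refcount + 1 where id = old.id;
    return old;
  end;
  $$ language plpgsql;

  create trigger resources_resurrect
    before delete on resources
    for each row execute procedure resources_resurrect();

  insert into resources values (1, 0);
  delete from resources where refcount = 0;

Today the row survives with refcount = 1, which is arguably what you
want here; as I understand the proposed patch, it would be deleted
anyway.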

But my actual point was not so much that *I* have a reasonable use-case
in mind in which the current behaviour is preferable, but more that
*someone* might.

>> But if we apply the patch now, the chance of that being discovered
>> before 9.1 is released is virtually zero. And after the release
>> there's no going back, so we might end up changing the behaviour
>> to the worse.
> 
> Perhaps throwing an error on the DELETE case as well as the UPDATE
> case would address this concern.

It addresses all concerns except this one, I'm afraid. Whatever we
change the behaviour to, I feel that we need to give ourselves plenty
of time to tweak it, should problems arise. Which just isn't possible
before 9.1 is released. Or at least that's my opinion.

> That would have saved weeks of
> staff time here when we started seeing the mysterious behavior which
> currently exists.  The only way we finally got to the bottom of it
> was to step through execution in gdb; those users not equipped to do
> so might have bigger problems than we did.

Ouch.

>> And lastly, I believe that your ultimate goal of guaranteeing that
>> a row is actually deleted (updated) after a BEFORE DELETE (BEFORE
>> UPDATE) trigger has fired is actually insurmountable anyway, at
>> least for isolation level READ COMMITTED.
> 
> Hmm.  That's a very potent point.  I had not considered READ
> COMMITTED because we have never, ever used it under PostgreSQL. 
> Obviously, any solution has to work for those who *do* use it, too;
> and the behavior between different isolation levels can't be too
> convoluted.  So, as you say, the *ultimate* goal may be
> unattainable.

After reading the code again, I take that back. We *do* seem to lock
the tuple before firing BEFORE {UPDATE,DELETE} triggers - trigger.c's
GetTupleForTrigger() takes care of that. I previously missed that because
I only looked at ExecUpdate() and ExecDelete() in nodeModifyTable.c.
Sorry for that.

>> and I do believe that we ought to improve things - but not by
>> replacing one unsatisfactory behaviour with another.
> 
> I hope you're not suggesting *that's* my goal!  ;-)

Nah, more of a case of bad wording on my part...

>> The root of the issue seems to be that you're trying to do things
>> in a BEFORE trigger that really belong in an AFTER trigger.
> 
> I'll take that as a *collective* pronoun, since I haven't personally
> written any of the thousands of triggers.  There are 20 programmers
> doing that.  Part of the problem is that they sometimes put things
> in a BEFORE trigger that belong in an AFTER trigger.  I caught one
> place they were doing an UPDATE of the target row in a BEFORE
> trigger rather than setting the values in the NEW record.  My patch
> makes the former throw an error, as I believe it should, rather than
> silently leaving the data in a bad state.

>> If you could explain why using an AFTER trigger doesn't work for
>> you, maybe we could improve them to be suitable for your purposes
>> (and please forgive me if you already did that).
> 
> I did to some extent, but really perhaps the biggest issue (which I
> don't think I've really covered earlier in the thread) is to not
> silently corrupt the data.  I would settle for throwing an error on
> the DELETE case as well as the UPDATE case, because my data would
> then be protected, and programmers would be forced to work around
> the issue in a way that PostgreSQL can handle correctly.

Hm, OK, I see your point now, I believe. This is not so much about
wanting to put things into BEFORE triggers which don't really fit
there, but instead about avoiding weeks of debugging if someone
messes up.

> In my case it is mainly an issue in deletes (and the special case of
> a "purge" process where deletes are not normally allowed) doing a
> depth-first search for dependent records, and deleting from the
> bottom up.  In some cases there is redundant information in higher
> level tables based on primary data in lower level tables -- in the
> form of counts, sums, or status codes.  If you delete a case, and
> that "cascades" to a event history table which has a trigger which
> updates case status according to certain business rules, the delete
> of the case is canceled because of the delete of a status-changing
> event.  That's very bad.  If we at least threw an error instead, we
> could identify the problem reliably, and code around it somehow. 
> That might be by moving the delete of dependent records to a point
> after the parent record has already been deleted, but that runs the
> risk of triggering other validation failures based on business rules
> in the triggers, because the triggers would then be seeing a state
> we try very hard to prevent.

If we go with my suggestion below (which entails throwing the error
earlier, at the time of the offending update or delete), you could
get the non-throwing behaviour you want by wrapping the UPDATEs and
DELETEs which might touch rows that are already in the process of
being updated or deleted in EXCEPTION blocks.

>> And equally firm do I expect to be the "no" if you ask "do you
>> expect the row that is actually deleted and the contents of the
>> variable OLD in the delete trigger to differ".
> 
> I'm less certain about that, especially if you phrase it in terms of
> denormalization optimizations -- the redundant data in the form of
> counts, sums, status codes, or sometimes just copied data from child
> records stored in parents to support better performance on common
>> queries.  That's the main issue for me.

Yeah, but the problem is we cannot guarantee that all the "important"
fields will match ;-)

> Would you feel comfortable with a patch which threw an error on the
> DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that I'd
like the error to be thrown earlier, at the time of the offending
UPDATE or DELETE, not later, when the original UPDATE or DELETE which
caused the BEFORE trigger invocation stumbles over the updated row.
It not only seems more logical that way, but it also makes it possible
for the trigger to catch the error, and react accordingly.

For example, if a BEFORE trigger on table A modifies table B, and
one of B's BEFORE triggers in turn modifies A, B's BEFORE trigger
could then catch the error and e.g. decide to skip the UPDATE.

Implementation-wise, I imagine we'd set a flag in the tuple header
after locking the tuple (as I now discovered we do, sorry again)
but before firing BEFORE triggers. We'd clear the flag again
once all BEFORE triggers have run, and let the actual UPDATE
or DELETE proceed. If an UPDATE or DELETE encounters a flagged
tuple, and the transaction (or one of its parents) is the current
lock holder, we'd throw an error. To clean up after aborted
transactions, we'd clear the flag upon acquiring a tuple lock.

Instead of a flag in the tuple header, we could also keep
a (backend-local) list of ctids for which we've fired BEFORE
triggers but haven't done the actual modification yet.

We might also want to make it possible to distinguish pending
UPDATES from pending DELETES (i.e., choose a different error code for
the two cases), because it seems reasonable that code would want
to react differently in those cases (e.g., skip an UPDATE if there's
a pending DELETE).

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> On Aug2, 2011, at 17:03 , Kevin Grittner wrote:
> Hm, OK, I see your point now, I believe. This is not so much about
> wanting to put things into BEFORE triggers which don't really fit
> there, but instead about avoiding weeks of debugging if someone
> messes up.
I would say that is the overriding concern.  But there are arguments
for sometimes putting some things in the BEFORE triggers which could
raise the issue, too.  For us, there would be a lot to dance around
if deletes cascade in the AFTER triggers, since we'd have orphans
hanging around during some validations, as described below.
>> In my case it is mainly an issue in deletes (and the special case
>> of a "purge" process where deletes are not normally allowed)
>> doing a depth-first search for dependent records, and deleting
>> from the bottom up.  In some cases there is redundant information
>> in higher level tables based on primary data in lower level
>> tables -- in the form of counts, sums, or status codes.  If you
>> delete a case, and that "cascades" to an event history table which
>> has a trigger which updates case status according to certain
>> business rules, the delete of the case is canceled because of the
>> delete of a status-changing event.  That's very bad.  If we at
>> least threw an error instead, we could identify the problem
>> reliably, and code around it somehow.  That might be by moving
>> the delete of dependent records to a point after the parent
>> record has already been deleted, but that runs the risk of
>> triggering other validation failures based on business rules in
>> the triggers, because the triggers would then be seeing a state
>> we try very hard to prevent.
> 
> If we go with my suggestion below (which entails throwing the
> error earlier, at the time of the offending update or delete), you
> could get the non-throwing behaviour you want by wrapping the
> UPDATEs and DELETEs which might touch rows that are already in
> the process of being updated or deleted in EXCEPTION blocks.
>> Would you feel comfortable with a patch which threw an error on
>> the DELETE case, as it does on the UPDATE case?
> 
> Yes, though with a little additional twist. The twist being that
> I'd like the error to be thrown earlier, at the time of the
> offending UPDATE or DELETE, not later, when the original UPDATE or
> DELETE which caused the BEFORE trigger invocation stumbles over
> the updated row. It not only seems more logical that way, but it
> also makes it possible for the trigger to catch the error, and
> react accordingly.
I hadn't thought of that.  It does seem better in every way except
for how much work it takes to do it and how much testing it needs to
have confidence in it.  Definitely not 9.1 material.
That's OK -- we have something which should work for us in 9.0 and
9.1.  I'd really prefer not to "fork" this permanently, but if
there's a better way coming in 9.2, we can use our patch for a year
or so and then switch over to the superior capabilities available in
9.2.
> For example, if a BEFORE trigger on table A modifies table B, and
> one of B's BEFORE triggers in turn modifies A, B's BEFORE trigger
> could then catch the error and e.g. decide to skip the UPDATE.
Yeah, that provides a reasonable default but still gives the
application programmer fine-grained control on this.  That's a good
thing.
> Implementation-wise, I imagine we'd set a flag in the tuple header
> after locking the tuple (as I now discovered we do, sorry again)
> but before firing BEFORE triggers. We'd clear the flag again
> once all BEFORE triggers have run, and let the actual UPDATE
> or DELETE proceed. If an UPDATE or DELETE encounters a flagged
> tuple, and the transaction (or one of its parents) is the current
> lock holder, we'd throw an error. To clean up after aborted
> transactions, we'd clear the flag upon acquiring a tuple lock.
> 
> Instead of a flag in the tuple header, we could also keep
> a (backend-local) list of ctids for which we've fired BEFORE
> triggers but haven't done the actual modification yet.
> 
> We might also want to make it possible to distinguish pending
> UPDATES from pending DELETES (i.e., choose a different error code
> for the two cases), because it seems reasonable that code would
> want to react differently in those cases (e.g., skip an UPDATE if
> there's a pending DELETE).
Does anyone else want to weigh in on this overall approach or narrow
down the alternatives Florian has sketched out above?  If we can
reach some consensus on an approach, I can work on a new patch to
implement it.
-Kevin


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Florian Pflug <fgp@phlo.org> wrote:
>> Hm, OK, I see your point now, I believe. This is not so much about
>> wanting to put things into BEFORE triggers which don't really fit
>> there, but instead about avoiding weeks of debugging if someone
>> messes up.
>  
> I would say that is the overriding concern.
I'm going to have to argue with myself -- really, the *biggest*
concern is that the current situation allows data to be quietly
mangled through non-obvious action-at-a-distance.  The damage might
not be discovered for a very long time.  Debugging the cause of the
mangling, once noticed, is the *next* most significant concern.
-Kevin


Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>>> Would you feel comfortable with a patch which threw an error on
>>> the DELETE case, as it does on the UPDATE case?
>>
>> Yes, though with a little additional twist. The twist being that
>> I'd like the error to be thrown earlier, at the time of the
>> offending UPDATE or DELETE, not later, when the original UPDATE or
>> DELETE which caused the BEFORE trigger invocation stumbles over
>> the updated row. It not only seems more logical that way, but it
>> also makes it possible for the trigger to catch the error, and
>> react accordingly.
>
> I hadn't thought of that.  It does seem better in every way except
> for how much work it takes to do it and how much testing it needs to
> have confidence in it.  Definitely not 9.1 material.

IMHO, none of this is 9.1 material.  It's been like this forever, and
it sucks, but it doesn't suck particularly more than any of the other
things that we didn't get around to fixing in 9.1.  In fact, I'd go so
far as to argue that this would be just about the worst imaginable
type of patch to slip into a release at the last minute.  On the one
hand, I think there's a real chance of breaking things in subtle ways
that are difficult to detect; and on the other hand, a significant
percentage of users will get no benefit from any change we make here,
just because this is something that most people don't do.  People who
would otherwise be  tempted to write their applications this way will
end up not doing so precisely because it currently does not work.  So
I think we should focus on getting the right semantics here, rather
than trying to minimize the scope of the change.

I'm not sure I understand the justification for throwing an error.
Updating a row twice is not an error under any other circumstances;
nor is deleting a row you've updated.  Why should it be here?
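
After all, outside of triggers this is perfectly ordinary (contrived
example, of course):

  create table accounts (id int primary key, balance int);
  insert into accounts values (1, 100);

  begin;
  update accounts set balance = balance - 10 where id = 1;
  update accounts set balance = balance - 10 where id = 1;  -- same row again: fine
  delete from accounts where id = 1;  -- deleting a row we've updated: also fine
  commit;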

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug3, 2011, at 04:54 , Robert Haas wrote:
> On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>>>> Would you feel comfortable with a patch which threw an error on
>>>> the DELETE case, as it does on the UPDATE case?
>>> 
>>> Yes, though with a little additional twist. The twist being that
>>> I'd like the error to be thrown earlier, at the time of the
>>> offending UPDATE or DELETE, not later, when the original UPDATE or
>>> DELETE which caused the BEFORE trigger invocation stumbles over
>>> the updated row. It not only seems more logical that way, but it
>>> also makes it possible for the trigger to catch the error, and
>>> react accordingly.
>> 
>> I hadn't thought of that.  It does seem better in every way except
>> for how much work it takes to do it and how much testing it needs to
>> have confidence in it.  Definitely not 9.1 material.
> 
> I'm not sure I understand the justification for throwing an error.
> Updating a row twice is not an error under any other circumstances;
> nor is deleting a row you've updated.  Why should it be here?

Because updating or deleting a row from within a BEFORE trigger fired
upon updating or deleting that very row causes two intermingled 
UPDATE/DELETE executions, not one UPDATE/DELETE followed by another.

Here's a step-by-step description of what happens in the case of two
UPDATES, assuming that we neither ignore any of the updates nor throw an error.

1) UPDATE T SET ... WHERE ID=1
2) Row with ID=1 is found & locked
3) BEFORE UPDATE triggers are fired, with OLD containing the original
   row's contents and NEW the values specified in (1)
4) UPDATE T SET ... WHERE ID=1, executed from within one of the triggers
5) BEFORE UPDATE triggers are fired again, with OLD containing the
   original row's contents and NEW the values specified in (4).
6) The row is updated with the values specified in (4), possibly changed
   by the triggers in (5)
7) The row is updated with the values specified in (1), possibly changed
   by the triggers in (3).

Now, in step (7), we're overwriting the UPDATE from steps 4-6 without
even looking at the row that UPDATE produced. If, for example, both UPDATES
do "counter = counter + 1", we'd end up with the counter incremented by
1, *not* by 2, even though there were two distinct UPDATEs. Or even worse,
the inner UPDATE at (4) might have modified the ID. Then, we'll apply the
outer UPDATE in (7), even though the original WHERE condition from (1)
no longer matches the row.

We could attempt to fix that by using the EvalPlanQual machinery to
re-check the WHERE clause and re-compute the new row in (7). However,
EvalPlanQual introduces a lot of conceptual complexity, and can lead
to very surprising results for more complex UPDATES. (There's that whole
self-join problem with EvalPlanQual, for example)

Also, even if we did use EvalPlanQual, we'd still have executed the outer
BEFORE trigger (step 3) with values for OLD and NEW which *don't* match
the row we actually update in (7). Doing that seems rather hazardous
too - who knows what the BEFORE trigger has used the values for.

The difference between Kevin's original patch and my suggestion is, BTW,
that he made step (7) throw an error while I suggested the error be
thrown already at (4).

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>>> Would you feel comfortable with a patch which threw an error on
>>>> the DELETE case, as it does on the UPDATE case?
>>>
>>> Yes, though with a little additional twist. The twist being that
>>> I'd like the error to be thrown earlier, at the time of the
>>> offending UPDATE or DELETE, not later, when the original UPDATE
>>> or DELETE which caused the BEFORE trigger invocation stumbles
>>> over the updated row. It not only seems more logical that way,
>>> but it also makes it possible for the trigger to catch the
>>> error, and react accordingly.
>>
>> I hadn't thought of that.  It does seem better in every way
>> except for how much work it takes to do it and how much testing
>> it needs to have confidence in it.  Definitely not 9.1 material.
> 
> IMHO, none of this is 9.1 material.
Nobody is currently arguing for including anything to fix this in
9.1.  Perhaps I should have been more explicit that in agreeing with
Florian, I'm giving up on the idea of any PostgreSQL release
attempting to do anything about this before 9.2.
That said, this is a data corrupting bug in the view of management
here, based on reports from testers and my explanation of why they
saw that behavior.  They are aware that it can be worked around, but
that's not acceptable if there's no reliable way to find all
occurrences of patterns that hit this bug.  To keep this effort
alive here, I am using my quick hack for 9.0 and 9.1.  Deletes will
behave as they expect, and updates won't quietly discard part of the
work of a transaction -- an error will be thrown in that situation.
> It's been like this forever
As have many other data mangling bugs which we fix in minor
releases.  Our point here is that it's never been like this in any
product that the Wisconsin Courts has used for triggers, and never
will be.
> People who would otherwise be  tempted to write their applications
> this way will end up not doing so precisely because it currently
> does not work.
I can attest to that.
> So I think we should focus on getting the right semantics here,
> rather than trying to minimize the scope of the change.
Agreed.  I thought that's what we were talking about.
> I'm not sure I understand the justification for throwing an error.
Nobody has suggested sane semantics for doing otherwise.  There is
the pipe dream of what application programmers here would like,
described earlier in the thread.  I just don't see that happening,
and I'm not sure it addresses all of Florian's concerns anyway.  You
had a suggestion for how to salvage the update situation, but it was
pretty hand-wavy, and I find it very hard to reconcile to some of
the issues raised by Florian.  Maybe someone can propose sane
semantics which covers all the objections, and maybe that would even
be possible to implement; but right now Florian's approach seems
like the most solid proposal yet put forward.
> Updating a row twice is not an error under any other
> circumstances; nor is deleting a row you've updated.  Why should
> it be here?
As Florian pointed out, it's not exactly a matter of doing those
things.  If you have a proposal which addresses the issues raised
earlier in the thread, please share.
-Kevin


Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Wed, Aug 3, 2011 at 10:48 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> As have many other data mangling bugs which we fix in minor
> releases.  Our point here is that it's never been like this in any
> product that the Wisconsin Courts has used for triggers, and never
> will be.

I don't believe this is very similar at all to the sorts of bugs we
fix in minor releases, but let's leave that aside for the moment since
it seems that we're in violent agreement about which release we're
targeting.  With respect to the second part of your comment, it might
be useful to describe how this works in some of those other products.

>> I'm not sure I understand the justification for throwing an error.
>
> Nobody has suggested sane semantics for doing otherwise.  There is
> the pipe dream of what application programmers here would like,
> described earlier in the thread.  I just don't see that happening,
> and I'm not sure it addresses all of Florian's concerns anyway.  You
> had a suggestion for how to salvage the update situation, but it was
> pretty hand-wavy, and I find it very hard to reconcile to some of
> the issues raised by Florian.  Maybe someone can propose sane
> semantics which covers all the objections, and maybe that would even
> be possible to implement; but right now Florian's approach seems
> like the most solid proposal yet put forward.

It probably is, but I'm not sure we've done enough thinking about it
to be certain that it's the right way forward.

On that note, I think in some ways the problems we're hitting here are
very much like serialization anomalies.  If the user updates a tuple
based on its PK and sets some non-PK field to a constant, and while
we're doing that, a BEFORE trigger updates any field in the tuple
other than the PK, then in theory it seems we ought to be able to
reconcile the updates.  It feels like the result ought to be the same
as if we'd simply run the BEFORE-trigger update to completion, and
then run the main update.  However, if the BEFORE-trigger modifies any
columns that the main update examined while constructing its value for
NEW, then the updates can't be serialized.  There's no way to get the
same result as if you'd done either one of them first, because they
are inextricably intertwined.
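
To make the distinction concrete, a hypothetical pair of cases (invented
table and column names):

  -- Reconcilable: the trigger's UPDATE touches a column the main UPDATE
  -- neither sets nor reads, so some serial order gives a sensible result.
  UPDATE t SET status = 'done' WHERE id = 1;        -- main UPDATE
  UPDATE t SET audited_at = now() WHERE id = 1;     -- from the BEFORE trigger

  -- Not reconcilable: the trigger changes a column the main UPDATE read
  -- while constructing NEW, so no serial order reproduces both effects.
  UPDATE t SET total = price * qty WHERE id = 1;    -- main UPDATE
  UPDATE t SET price = price * 0.9 WHERE id = 1;    -- from the BEFORE trigger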

In practice, my hand-wavy reference to "reconciling the updates" is a
problem because of the way the trigger interface works.  It feels
pretty impossible to decide that we're going to do the update, but
with some other random values we dredged up from some other update
replacing some of the ones the user explicitly handed to us.  But if
the trigger could return an indication of which column values it
wished to override, then it seems to me that we could piece together a
reasonable set of semantics.  It's not exactly clear how to make that
work, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Wed, Aug 3, 2011 at 4:50 AM, Florian Pflug <fgp@phlo.org> wrote:
> On Aug3, 2011, at 04:54 , Robert Haas wrote:
>> On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>>>> Would you feel comfortable with a patch which threw an error on
>>>>> the DELETE case, as it does on the UPDATE case?
>>>>
>>>> Yes, though with a little additional twist. The twist being that
>>>> I'd like the error to be thrown earlier, at the time of the
>>>> offending UPDATE or DELETE, not later, when the original UPDATE or
>>>> DELETE which caused the BEFORE trigger invocation stumbles over
>>>> the updated row. It not only seems more logical that way, but it
>>>> also makes it possible for the trigger to catch the error, and
>>>> react accordingly.
>>>
>>> I hadn't thought of that.  It does seem better in every way except
>>> for how much work it takes to do it and how much testing it needs to
>>> have confidence in it.  Definitely not 9.1 material.
>>
>> I'm not sure I understand the justification for throwing an error.
>> Updating a row twice is not an error under any other circumstances;
>> nor is deleting a row you've updated.  Why should it be here?
>
> Because updating or deleting a row from within a BEFORE trigger fired
> upon updating or deleting that very row causes two intermingled
> UPDATE/DELETE executions, not one UPDATE/DELETE followed by another.
>
> Here's a step-by-step description of what happens in the case of two
> UPDATES, assuming that we don't ignore either update and throw no error.
>
> 1) UPDATE T SET ... WHERE ID=1
> 2) Row with ID=1 is found & locked
> 3) BEFORE UPDATE triggers are fired, with OLD containing the original
>   row's contents and NEW the values specified in (1)
> 4) UPDATE T SET ... WHERE ID=1, executed from within one of the triggers
> 5) BEFORE UPDATE triggers are fired again, with OLD containing the
>   original row's contents and NEW the value specified in (4).
> 6) The row is updated with the values specified in (4), possibly changed
>   by the triggers in (5)
> 7) The row is updated with the values specified in (2), possibly changed
>   by the triggers in (3).
>
> Now, in step (7), we're overwriting the UPDATE from steps 4-6 without
> even looking at the row that UPDATE produced. If, for example, both UPDATES
> do "counter = counter + 1", we'd end up with the counter incremented by
> 1, *not* by 2, even though there were two distinct UPDATEs. Or even worse,
> the inner UPDATE at (4) might have modified the ID. Then, we'll apply the
> outer UPDATE in (7), even though the original WHERE condition from (1)
> no longer matches the row.
>
> We could attempt to fix that by using the EvalPlanQual machinery to
> re-check the WHERE clause and re-compute the new row in (7). However,
> EvalPlanQual introduces a lot of conceptual complexity, and can lead
> to very surprising results for more complex UPDATES. (There's that whole
> self-join problem with EvalPlanQual, for example)
>
> Also, even if we did use EvalPlanQual, we'd still have executed the outer
> BEFORE trigger (step 3) with values for OLD and NEW which *don't* match
> the row that we actually update in (7). Doing that seems rather hazardous
> too - who knows what the BEFORE trigger has used the values for.
>
> The difference between Kevin's original patch and my suggestion is, BTW,
> that he made step (7) throw an error while I suggested the error to
> be thrown already at (4).

I think Kevin's proposal is better, because AFAICS if the BEFORE
UPDATE trigger returns NULL then we haven't got a problem; and we
can't know that until we get to step 7.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug3, 2011, at 17:55 , Robert Haas wrote:
> On that note, I think in some ways the problems we're hitting here are
> very much like serialization anomalies.

Yeah, I had the same feeling of familiarity ;-)

> If the user updates a tuple
> based on its PK and sets some non-PK field to a constant, and while
> we're doing that, a BEFORE trigger updates any field in the tuple
> other than the PK, then in theory it seems we ought to be able to
> reconcile the updates.  It feels like the result ought to be the same
> as if we'd simply run the BEFORE-trigger update to completion, and
> then run the main update.  However, if the BEFORE-trigger modifies any
> columns that the main update examined while constructing its value for
> NEW, then the updates can't be serialized.

Going down that road opens the door to a *lot* of subtle semantic
differences between currently equivalent queries. For example,
 UPDATE T SET f=f, a=1

would behave differently than
 UPDATE T SET a=1

because in the first case the new row would depend on the old row's
value of "f", while in the second case it doesn't.

> There's no way to get the
> same result as if you'd done either one of them first, because they
> are inextricably intertwined.
> 
> In practice, my hand-wavy reference to "reconciling the updates" is a
> problem because of the way the trigger interface works.  It feels
> pretty impossible to decide that we're going to do the update, but
> with some other random values we dredged up from some other update
> replacing some of the ones the user explicitly handed to us.  But if
> the trigger could return an indication of which column values it
> wished to override, then it seems to me that we could piece together a
> reasonable set of semantics.  It's not exactly clear how to make that
> work, though.

I dunno, that all still feels awfully complex. As you said yourself,
this case is quite similar to a serialization anomaly. Taking that
correspondence further, that reconciliation of updates is pretty much
what the EPQ machinery does in READ COMMITTED mode. Now, we ourselves
have warned users in the past to *not* use READ COMMITTED mode if they
do complex updates (e.g., use UPDATE ... FROM ...), because the behaviour
of that reconciliation machinery in the presence of concurrent updates
is extremely hard to predict. I thus don't believe that it's a good idea
to introduce similarly complex behaviour in other parts of the system -
and particularly not if you cannot disable it by switching to another
isolation level.

Simply throwing an error, on the other hand, makes the behaviour simple
to understand and explain.

best regards,
Florian Pflug




Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug3, 2011, at 17:57 , Robert Haas wrote:
> On Wed, Aug 3, 2011 at 4:50 AM, Florian Pflug <fgp@phlo.org> wrote:
>> The difference between Kevin's original patch and my suggestion is, BTW,
>> that he made step (7) throw an error while I suggested the error to
>> be thrown already at (4).
> 
> I think Kevin's proposal is better, because AFAICS if the BEFORE
> UPDATE trigger returns NULL then we haven't got a problem; and we
> can't know that until we get to step 7.

Hm, not sure if I agree. A BEFORE trigger might decide to block an update,
and then take some action based on the assumption that the row is actually
unmodified (i.e., identical to the OLD value observed by the trigger).

To me, it still seems conceptually cleaner to just decree that a row
must not be modified while BEFORE triggers are running, period.

This, BTW, also matches what Oracle does, only on a per-row instead of
per-table basis. Oracle AFAIR simply forbids touching of the table a BEFORE
trigger is attached to from within that trigger. (They even forbid SELECTS,
which is presumably because they don't have an equivalent of our per-row
command id, i.e. cannot ensure that such a SELECT sees the state the table
was in at the beginning of the statement)

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> Florian Pflug <fgp@phlo.org> wrote:
>> Here's a step-by-step description of what happens in the case of
> two UPDATES, assuming that we don't ignore either update and throw
>> no error.
>>
>> 1) UPDATE T SET ... WHERE ID=1
>> 2) Row with ID=1 is found & locked
>> 3) BEFORE UPDATE triggers are fired, with OLD containing the
>>    original row's contents and NEW the values specified in (1)
>> 4) UPDATE T SET ... WHERE ID=1, executed from within one of the
>>    triggers
>> 5) BEFORE UPDATE triggers are fired again, with OLD containing
>>    the original row's contents and NEW the value specified in
>>    (4).
>> 6) The row is updated with the values specified in (4), possibly
>>    changed by the triggers in (5)
>> 7) The row is updated with the values specified in (2), possibly
>>    changed by the triggers in (3).
>> The difference between Kevin's original patch and my suggestion
>> is, BTW, that he made step (7) throw an error while I suggested
>> the error to be thrown already at (4).
> 
> I think Kevin's proposal is better, because AFAICS if the BEFORE
> UPDATE trigger returns NULL then we haven't got a problem; and we
> can't know that until we get to step 7.
Robert makes a good point -- at step 4 we can't know whether the
BEFORE trigger will return an unmodified record, a modified record,
or NULL.  Without some way to deal with that, throwing the error
earlier seems like a non-starter.
-Kevin


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> To me, it still seems conceptually cleaner to just decree that a
> row must not be modified while BEFORE triggers are running,
> period.
> 
> This, BTW, also matches what Oracle does, only on a per-row
> instead of per-table basis. Oracle AFAIR simply forbids touching
> of the table a BEFORE trigger is attached to from within that
> trigger. (They even forbid SELECTS, which is presumably because
> they don't have an equivalent of our per-row command id, i.e.
> cannot ensure that such a SELECT sees the state the table was in
> at the beginning of the statement)
It appears this was in flight while I was composing my last post. 
This seems pretty strict, but given the trade-offs, perhaps it is
worth it.  I can live with either solution, myself.
-Kevin


Re: WIP fix proposal for bug #6123

From
Robert Haas
Date:
On Wed, Aug 3, 2011 at 12:27 PM, Florian Pflug <fgp@phlo.org> wrote:
> Going down that road opens the door to a *lot* of subtle semantic
> differences between currently equivalent queries. For example,
>
>  UPDATE T SET f=f, a=1
>
> would behave differently than
>
>  UPDATE T SET a=1
>
> because in the first case the new row would depend on the old row's
> value of "f", while in the second case it doesn't.

True.  But I'm not really bothered by that.  If the user gets an error
message that says:

ERROR: updated column "f" has already been modified by a BEFORE trigger

...the user will realize the error of their ways.

>> There's no way to get the
>> same result as if you'd done either one of them first, because they
>> are inextricably intertwined.
>>
>> In practice, my hand-wavy reference to "reconciling the updates" is a
>> problem because of the way the trigger interface works.  It feels
>> pretty impossible to decide that we're going to do the update, but
>> with some other random values we dredged up from some other update
>> replacing some of the ones the user explicitly handed to us.  But if
>> the trigger could return an indication of which column values it
>> wished to override, then it seems to me that we could piece together a
>> reasonable set of semantics.  It's not exactly clear how to make that
>> work, though.
>
> I dunno, that all still feels awfully complex. As you said yourself,
> this case is quite similar to a serialization anomaly. Taking that
> correspondence further, that reconciliation of updates is pretty much
> what the EPQ machinery does in READ COMMITTED mode. Now, we ourselves
> have warned users in the past to *not* use READ COMMITTED mode if they
> do complex updates (e.g., use UPDATE ... FROM ...), because the behaviour
> of that reconciliation machinery in the presence of concurrent updates
> is extremely hard to predict. I thus don't believe that it's a good idea
> to introduce similarly complex behaviour in other parts of the system -
> and particularly not if you cannot disable it by switching to another
> isolation level.
>
> Simply throwing an error, on the other hand, makes the behaviour simple
> to understand and explain.

True.  I'm coming around to liking that behavior better than I did on
first hearing, but I'm still not crazy about it, because as an app
developer I would really like to have at least the unproblematic cases
actually work.  Throwing an error at least makes it clear that you've
done something which isn't supported, and that's probably an
improvement over the current, somewhat-baffling behavior.  However,
it's not even 25% as nice as having it actually work as intended.
That's why, even if we can't make all the cases work sanely, I'd be a
lot more enthusiastic about it if we could find a way to make at least
some of them work sanely.  The mind-bending cases are unlikely to be
the ones people do on purpose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> Florian Pflug <fgp@phlo.org> wrote:
>> Going down that road opens the door to a *lot* of subtle semantic
>> differences between currently equivalent queries. For example,
>>
>>  UPDATE T SET f=f, a=1
>>
>> would behave differently than
>>
>>  UPDATE T SET a=1
>>
>> because in the first case the new row would depend on the old
>> row's value of "f", while in the second case it doesn't.
> 
> True.  But I'm not really bothered by that.  If the user gets an
> error message that says:
> 
> ERROR: updated column "f" has already been modified by a BEFORE
> trigger
> 
> ...the user will realize the error of their ways.
> 
>>> There's no way to get the same result as if you'd done either
>>> one of them first, because >>> they are inextricably
>>> intertwined.
>>>
>>> In practice, my hand-wavy reference to "reconciling the updates"
>>> is a problem because of the way the trigger interface works.  It
>>> feels pretty impossible to decide that we're going to do the
>>> update, but with some other random values we dredged up from
>>> some other update replacing some of the ones the user explicitly
>>> handed to us. But if the trigger could return an indication of
>>> which column values it wished to override, then it seems to me
>>> that we could piece together a reasonable set of semantics. 
>>> It's not exactly clear how to make that work, though.
>>
>> I dunno, that all still feels awfully complex. As you said
>> yourself, this case is quite similar to a serialization anomaly.
>> Taking that correspondence further, that reconciliation of
>> updates is pretty much what the EPQ machinery does in READ
>> COMMITTED mode. Now, we ourselves have warned users in the past
>> to *not* use READ COMMITTED mode if they do complex updates
>> (e.g., use UPDATE ... FROM ...), because the behaviour of that
>> reconciliation machinery in the presence of concurrent updates is
>> extremely hard to predict. I thus don't believe that it's a good
>> idea to introduce similarly complex behaviour in other parts of
>> the system - and particularly not if you cannot disable it by
>> switching to another isolation level.
>>
>> Simply throwing an error, on the other hand, makes the behaviour
>> simple to understand and explain.
> 
> True.  I'm coming around to liking that behavior better than I did
> on first hearing, but I'm still not crazy about it, because as an
> app developer I would really like to have at least the
> unproblematic cases actually work.  Throwing an error at least
> makes it clear that you've done something which isn't supported,
> and that's probably an improvement over the current, somewhat-
> baffling behavior. However, it's not even 25% as nice as having it
> actually work as intended.  That's why, even if we can't make all
> the cases work sanely, I'd be a lot more enthusiastic about it if
> we could find a way to make at least some of them work sanely. 
> The mind-bending cases are unlikely to be the ones people do on
> purpose.
So, we have three options on the table here:
(1) We (the Wisconsin Courts) are using a very simple fix to work
around the issue so we can move forward with conversion to
PostgreSQL triggers.  A DELETE is allowed to complete if the BEFORE
trigger doesn't return NULL, even if the row was updated while the
trigger was executing.  An UPDATE fails if the BEFORE trigger
doesn't return NULL and any other update is made to the row while
the trigger was executing.
(2) You (Robert) have proposed (as I understand it) modifying that
approach to allow some UPDATE cases to work, where there are not
conflicting updates to any one column within the row.
(3) Florian has proposed banning all updates to a row which is being
processed by a BEFORE UPDATE or BEFORE DELETE trigger.  As I
understand it, this would be similar to the approach taken by
Oracle, although less strict.
I can live with any of these solutions.  How do we move forward to
consensus so that a patch can be written?  Does anyone else have a
preference for one of these approaches versus another?
-Kevin


Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug8, 2011, at 17:02 , Kevin Grittner wrote:
> So, we have three options on the table here:
> 
> (1) We (the Wisconsin Courts) are using a very simple fix to work
> around the issue so we can move forward with conversion to
> PostgreSQL triggers.  A DELETE is allowed to complete if the BEFORE
> trigger doesn't return NULL, even if the row was updated while the
> trigger was executing.  An UPDATE fails if the BEFORE trigger
> doesn't return NULL and any other update is made to the row while
> the trigger was executing.
> 
> (2) You (Robert) have proposed (as I understand it) modifying that
> approach to allow some UPDATE cases to work, where there are not
> conflicting updates to any one column within the row.
> 
> (3) Florian has proposed banning all updates to a row which is being
> processed by a BEFORE UPDATE or BEFORE DELETE trigger.  As I
> understand it, this would be similar to the approach taken by
> Oracle, although less strict.
Yeah, though much, much less strict.

I think it would be helpful if we had a more precise idea about the
intended use-cases. So far, the only use-case that has been described in
detail is the one which made Kevin aware of the problem. But if I
understood Kevin correctly, that fact that they use BEFORE and not AFTER
triggers it more of an accident than the result of a conscious design
decision. Though he did also mention that there might actually be advantages
to using BEFORE instead of AFTER triggers, because that prevents other
triggers from seeing a non-consistent state.

What I can add is that AFAIR all instances of same-row UPDATES from BEFORE
triggers I ever encountered were replaceable by a simple assignment to NEW.
(That, however, is more of an anti-usecase)

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug8, 2011, at 17:35 , Florian Pflug wrote:
> I think it would be helpful if we had a more precise idea about the
> intended use-cases. So far, the only use-case that has been described in
> detail is the one which made Kevin aware of the problem. But if I
> understood Kevin correctly, the fact that they use BEFORE and not AFTER
> triggers is more of an accident than the result of a conscious design
> decision. Though he did also mention that there might actually be advantages
> to using BEFORE instead of AFTER triggers, because that prevents other
> triggers from seeing a non-consistent state.

I pondered this some more, and came up with the following use-case of
same-row modification from a BEFORE trigger which is not easily moved
to an AFTER trigger.

Assume you have a partitioned table T with two partitions. Partition
Tcompleted contains records which are "completed" in some sense, while
Tcommencing contains records which are not. A record's state is indicated
by a boolean column "completed", and CHECK constraints guarantee that
"completed" is indeed false for all records in Tcommencing and true for all
records in Tcompleted. Now, if the partitioning is supposed to be transparent
to applications, a record must automatically be moved from Tcommencing to
Tcompleted if its "completed" flag is changed from false to true. The
following BEFORE UPDATE trigger on Tcommencing accomplishes that.
  IF NEW.completed THEN
    DELETE FROM Tcommencing WHERE id = OLD.id;
    INSERT INTO Tcompleted VALUES (NEW.*); -- Probably a syntax error,
                                           -- but intent is clear
    RETURN NULL;
  ELSE
    RETURN NEW;
  END IF;

Doing this from within an AFTER trigger doesn't work because the CHECK
constraint on Tcommencing would prevent the UPDATE from occurring.
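
Fleshed out into something runnable, assuming simple inheritance-based
partitioning and invented trigger names, that might look like:

  CREATE TABLE t (id int PRIMARY KEY, completed boolean NOT NULL);
  CREATE TABLE tcommencing (CHECK (NOT completed)) INHERITS (t);
  CREATE TABLE tcompleted  (CHECK (completed))     INHERITS (t);

  CREATE FUNCTION move_when_completed() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      IF NEW.completed THEN
          DELETE FROM tcommencing WHERE id = OLD.id;  -- modifies "our" row
          INSERT INTO tcompleted VALUES (NEW.*);
          RETURN NULL;  -- skip the original UPDATE; the row has moved
      ELSE
          RETURN NEW;
      END IF;
  END;
  $$;

  CREATE TRIGGER move_completed BEFORE UPDATE ON tcommencing
      FOR EACH ROW EXECUTE PROCEDURE move_when_completed();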

This would also fail if we did as I suggested, and forbade any modifications
to a row for which a BEFORE trigger was in progress. I take this as evidence
that my suggestion was misguided and we should in fact only report an error
if the row was modified since the before trigger started *and* the trigger
returns non-NULL.

I still believe we shouldn't distinguish between UPDATES and DELETES, so
a DELETE should fail if the row is updated from within a BEFORE DELETE trigger
which returns non-NULL.

BTW, the case of multiple BEFORE triggers also raises some interesting questions.
Assume there are two BEFORE UPDATE triggers T1 and T2 defined on a table T, and T1
causes a same-row UPDATE and returns non-NULL. Also assume that T1 is coded in
a way that prevents infinite recursion in that case (i.e., the same-row UPDATE
is skipped if the T1 invocation is already the result of a same-row UPDATE). The
sequence of invocations after an update of a row would be:

1) T1 (for the original UPDATE)
2) T1 (for the same-row UPDATE initiated in (1))
3) T2 (for the same-row UPDATE initiated in (1))
4) T2 (for the original UPDATE)

Now, T2 is invoked in (4) for a row which isn't visible anymore. So I'm thinking
that maybe we should check after every trigger invocation whether or not a same-row
UPDATE occurred, and immediately raise an error if it did and the trigger didn't
return NULL, and not delay the check until all triggers have run. It seems unlikely
that delaying the check until all triggers have run adds any significant functionality.
For that to be the case, T2 would have to know whether T1 did a same-row update, and
return NULL, since otherwise we'd raise an error anyway. The beauty of doing the
check after every trigger lies in the fact that it makes the coding rule "Thou
shall return NULL if thou dared to modify thine own row" local (i.e. per-trigger)
instead of global (i.e. per trigger-chain), and thus less likely to be violated by
some strange interactions of triggers serving distinct purposes.

To summarize, here's what I currently believe would be a sensible approach:
  After every BEFORE trigger invocation, if the trigger returned non-NULL,
  check if the latest row version is still the same as when the trigger
  started. If not, complain.

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> I think it would be helpful if we had a more precise idea about
> the intended use-cases. So far, the only use-case that has been
> described in detail is the one which made Kevin aware of the
> problem. But if I understood Kevin correctly, the fact that they
> use BEFORE and not AFTER triggers is more of an accident than the
> result of a conscious design decision.
For the UPDATE case we have so far only identified one place where
this was a problem -- although the grep we used would only show the
simplest form of this (where the UPDATE can be replaced by setting
fields in the NEW record).  We've scheduled people to run through
the system test scripts again with my quick fix in place so that we
get an error rather than silent corruption (which might easily have
been missed the last time), so others could still turn up.  In any
event, I have neither seen nor imagined any use cases in our shop
where we need to allow the UPDATE behavior.
> Though he did also mention that there might actually be advantages
> to using BEFORE instead of AFTER triggers, because that prevents
> other triggers from seeing a non-consistent state.
Right, although I've only seen that for the DELETE triggers.  We are
using the BEFORE triggers to delete from the bottom up, and doing
this in the AFTER trigger would expose states to triggers where
children still exist in the absence of parents, which might fire
validation failures from hard-to-predict places.  It would certainly
be more convenient on this end if the DELETE behavior from my patch
was accepted, but I'm confident that there are workarounds for the
long term if not.
> What I can add is that AFAIR all instances of same-row UPDATES
> from BEFORE triggers I ever encountered were replaceable by a
> simple assignment to NEW. (That, however, is more of an
> anti-usecase)
If there is a valid use-case for UPDATE, it would have to be less
direct -- for example, the BEFORE UPDATE trigger modifies some other
table or row, and the trigger for *that* updates the original row. 
I really can't see any excuse for a BEFORE UPDATE ... FOR EACH ROW
trigger to execute a new UPDATE statement referencing the row for
which it is being fired.
-Kevin



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> To summarize, here's what I currently believe would be a sensible
> approach:
> 
>   After every BEFORE trigger invocation, if the trigger returned
>   non-NULL, check if latest row version is still the same as when
>   the trigger started. If not, complain.
That certainly has the advantages of being a small, safe change and
of being easy to explain.  It would certainly prevent the
astonishing sorts of behaviors which now occur and which can leave
people with database contents they thought they had guards against.
The down side is that something this strict does make it hard to
achieve certain behaviors which could be desirable for maintaining
redundant data for performance.  In my bottom-up delete scenario,
there would either need to be somewhere to note that a row was being
deleted so that the delete of the children could skip maintaining
it, or the cascade would need to be implemented in the AFTER
triggers, and validations would need to accept orphans which could
be created.  Either approach can be made to work, but from the
application development side, it's not as clean or easy.
The suggested approach for UPDATE with my original approach to
DELETE would make me happier, but I'm still not clear on Robert's
use cases and how that would affect him.  Can you clarify why you
feel UPDATE and DELETE should both do this?
-Kevin


Re: WIP fix proposal for bug #6123

From
Florian Pflug
Date:
On Aug9, 2011, at 15:41 , Kevin Grittner wrote:
> Florian Pflug <fgp@phlo.org> wrote:
> 
>> To summarize, here's what I currently believe would be a sensible
>> approach:
>> 
>>  After every BEFORE trigger invocation, if the trigger returned
>>  non-NULL, check if latest row version is still the same as when
>>  the trigger started. If not, complain.
> 
> That certainly has the advantages of being a small, safe change and
> of being easy to explain.  It would certainly prevent the
> astonishing sorts of behaviors which now occur and which can leave
> people with database contents they thought they had guards against.
> 
> The down side is that something this strict does make it hard to
> achieve certain behaviors which could be desirable for maintaining
> redundant data for performance.  In my bottom-up delete scenario,
> there would either need to be somewhere to note that a row was being
> deleted so that the delete of the children could skip maintaining
> it, or the cascade would need to be implemented in the AFTER
> triggers, and validations would need to accept orphans which could
> be created.  Either approach can be made to work, but from the
> application development side, it's not as clean or easy.

Another option would be to re-issue the DELETE from the BEFORE DELETE
trigger, and then return NULL. It'll cause the BEFORE DELETE trigger
to be invoked recursively, but presumably the second invocation could
easily detect that all required pre-delete actions have already taken
place and exit early (and return OLD). In pseudo-code, something like

BEFORE DELETE ON <parent table>:
  DELETE FROM <child table> WHERE parent_id = OLD.id;
  IF FOUND THEN
    -- Removing children might have modified our row,
    -- so returning non-NULL is not an option
    DELETE FROM <table> WHERE id = OLD.id;
    RETURN NULL;
  ELSE
    -- No children removed, so our row should be unmodified
    RETURN OLD;
  END IF;
 

Or, more generally, you could check for modifications of the row
to be deleted. I'm not sure if the ctid column is currently available
for OLD and NEW, but if it is, you could do

BEFORE DELETE ON <table>:
  <arbitrarily complex child removal logic>
  IF EXISTS(SELECT 1 FROM <table> WHERE ctid = OLD.ctid) THEN
    -- Original row wasn't modified, actual delete should succeed
    RETURN OLD;
  ELSE
    -- Original row was modified, actual delete would fail
    DELETE FROM <table> WHERE pk = OLD.pk;
    RETURN NULL;
  END IF;
 

This only requires that the "arbitrarily complex child removal logic"
is smart enough not to update the parent table if no children were
actually removed.
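
For concreteness, the first variant might look roughly like this as a
real trigger, assuming invented table names and some other trigger (not
shown) that maintains a summary column on the parent whenever children
change:

  CREATE FUNCTION parent_before_delete() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      DELETE FROM child WHERE parent_id = OLD.id;
      IF FOUND THEN
          -- Removing children may have modified our row (via the
          -- summary-maintenance trigger), so returning non-NULL is not
          -- an option: re-issue the DELETE and skip this one.
          DELETE FROM parent WHERE id = OLD.id;
          RETURN NULL;
      ELSE
          -- No children removed, so our row should be unmodified.
          RETURN OLD;
      END IF;
  END;
  $$;

  CREATE TRIGGER parent_bd BEFORE DELETE ON parent
      FOR EACH ROW EXECUTE PROCEDURE parent_before_delete();

The re-issued DELETE fires this trigger again, but on that second pass
no children are found, so it returns OLD and the delete goes through.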

> The suggested approach for UPDATE with my original approach to
> DELETE would make me happier, but I'm still not clear on Robert's
> use cases and how that would affect him.  Can you clarify why you
> feel UPDATE and DELETE should both do this?

I'm concerned that proceeding with a DELETE even though the row
has been modified has an equal chance of causing subtle data-integrity
violations as the current behaviour. We might remove a row, even though
the row, at the time of deletion, does *not* match the DELETE's WHERE
condition. That seems like a violation of very basic guarantees to me.

A real-world use-case where that might happen could consist of a table
with a "reference_count" column, together with a BEFORE DELETE trigger
which (indirectly via other triggers) sometimes causes a reference_count
to be changed from 0 to >0. Simply inserting a new row instead of re-using
the old one may not be an option because of UNIQUE constraints. The
statement
 DELETE FROM reference_counted_table WHERE reference_count = 0

might then delete rows even though they *are* referenced, if the
references were added from the BEFORE DELETE trigger. The result would
probably be dangling references, i.e. a data-integrity violation.
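
A sketch of that hazard, with invented names, where the indirect
bookkeeping is collapsed into the trigger itself:

  CREATE TABLE rc_table (id int PRIMARY KEY, reference_count int NOT NULL);
  CREATE TABLE referrer (id serial PRIMARY KEY, rc_id int NOT NULL);

  CREATE FUNCTION rc_before_delete() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      -- Stand-in for the indirect bookkeeping described above: a new
      -- reference appears and the counter is bumped from 0 to 1.
      INSERT INTO referrer (rc_id) VALUES (OLD.id);
      UPDATE rc_table SET reference_count = reference_count + 1
          WHERE id = OLD.id;
      RETURN OLD;
  END;
  $$;

  CREATE TRIGGER rc_bd BEFORE DELETE ON rc_table
      FOR EACH ROW EXECUTE PROCEDURE rc_before_delete();

  INSERT INTO rc_table VALUES (42, 0);
  -- If this DELETE were allowed to proceed despite the same-row UPDATE,
  -- it would remove a row whose reference_count is now 1, leaving
  -- referrer.rc_id dangling.
  DELETE FROM rc_table WHERE reference_count = 0;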

One might attempt to solve that by re-checking the DELETE's WHERE
condition before carrying out the DELETE. To avoid ending up where
we started, i.e. with BEFORE triggers firing but no DELETE taking place,
we'd have to error out if the re-check fails instead of silently
ignoring the DELETE. At which point whether or not a DELETE of a single
record succeeds or fails depends on the WHERE condition used to *find*
that record. That, I feel, isn't a road we want to take...

best regards,
Florian Pflug



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Florian Pflug <fgp@phlo.org> wrote:
> Another option would be to re-issue the DELETE from the BEFORE
> DELETE trigger, and then return NULL. It'll cause the BEFORE
> DELETE trigger to be invoked recursively, but presumably the
> second invocation could easily detect that all required pre-delete
> actions have already taken place and exit early (and return OLD).
> In pseudo-code, something like
> 
> BEFORE DELETE ON <parent table>:
>   DELETE FROM <child table> WHERE parent_id = OLD.id;
>   IF FOUND THEN
>     -- Removing children might have modified our row,
>     -- so returning non-NULL is not an option
>     DELETE FROM <table> WHERE id = OLD.id;
>     RETURN NULL;
>   ELSE
>     -- No children removed, so our row should be unmodified
>     RETURN OLD;
>   END IF;
Yeah, that would cover it all right.  That pretty much eliminates my
objections to your "check for error after firing each BEFORE
trigger" approach.
-Kevin


Re: WIP fix proposal for bug #6123

From
Bruce Momjian
Date:
Did we ever decide on this?  Is it a TODO?

---------------------------------------------------------------------------

On Fri, Jul 22, 2011 at 04:01:20PM -0500, Kevin Grittner wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Jul 20, 2011 at 2:58 PM, Kevin Grittner
> > <Kevin.Grittner@wicourts.gov> wrote:
> >> So basically, the goal of this patch is to ensure that a BEFORE
> >> DELETE trigger doesn't silently fail to delete a row because that
> >> row was updated during the BEFORE DELETE trigger firing (in our
> >> case by secondary trigger firing).
> > 
> > I've run across this problem before while writing application code
> > and have found it surprising, unfortunate, and difficult to work
> > around. It's not so bad when it only bites you once, but as things
> > get more complicated and you have more and more triggers floating
> > around, it gets pretty darn horrible.  One of the things I've done
> > to mitigate the impact of this is to write as many triggers as
> > possible as AFTER triggers
>  
> Yeah, this is not an issue in AFTER triggers, so moving any updating
> to those is a solution.  In most cases that's where you want
> triggered modifications to take place anyway.  The cascading delete
> situation is the most obvious exception, although there certainly
> could be others.
>  
> > but that's sometimes difficult to accomplish.
>  
> Yeah, sometimes.
>  
> > Your scenario is a BEFORE DELETE trigger that does an UPDATE on
> > the same row, but I think this problem also occurs if you have a
> > BEFORE UPDATE trigger that does an UPDATE on the same row.  I
> > believe the second update gets silently ignored.
>  
> My testing shows that the primary update gets ignored, while all the
> triggered effects of that update are persisted.  Yuck.  :-(  It
> certainly seems possible to turn that around, but that hardly seems
> better.  In asking application programmers here what they would
> *expect* to happen, they all seem to think that it is surprising
> that the BEFORE trigger functions *return a record*, rather than a
> boolean to say whether to proceed with the operation.  They feel it
> would be less confusing if a value set into NEW was effective if the
> operation does take effect, and the boolean controls whether or not
> that happens.  They rather expect that if an update happens from the
> same transaction while a before trigger is running, that the NEW
> record will reflect the change.
>  
> I recognize how hard it would be to create that expected behavior,
> and how unlikely it is that the community would accept such a change
> at this point.  But current behavior is to silently do something
> really dumb, so I think some change should be considered -- even if
> that change is to throw an error where we now allow nonsense.
>  
> INSERT is not a problem -- if a BEFORE INSERT trigger inserts a row
> with a conflicting primary key (or other unique index key), the
> operation will be rolled back.  That's fine.
>  
> I think DELETE can be cleanly fixed with a patch similar to what I
> posted earlier in the thread.  I found one more value that looks
> like it should be set, and it could use some comments, but I believe
> that we can get DELETE behavior which is every bit as sensible as
> INSERT behavior with a very small change.
>  
> The worms do come crawling out of the can on BEFORE UPDATE triggers,
> though.  When faced with an UPDATE which hasn't yet been applied,
> and other UPDATEs triggering from within the BEFORE UPDATE trigger
> which touch the same row, it doesn't seem like you can honor both
> the original UPDATE which was requested *and* the triggered UPDATEs.
> Of course, if you discard the original UPDATE, you can't very well
> do anything with the record returned from the BEFORE UPDATE trigger
> for that update.  Since it seems equally evil to allow the update
> while ignoring some of the work caused by its trigger functions as
> to show the work of the triggered updates while suppressing the
> original update, I think the right thing is to throw an error if the
> old row for a BEFORE UPDATE is updated by the same transaction and
> the trigger function ultimately returns a non-NULL value.
>  
> Thoughts?
>  
> -Kevin
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Bruce Momjian <bruce@momjian.us> wrote:
> Did we ever decide on this?
We discussed it to the point of consensus, and Tom wrote a patch to
implement that.  Testing in my shop hit problems for which the cause
was not obvious.  I don't know whether there is a flaw in the
designed approach that we all missed, a simple programming bug of
some sort in the patch, or pilot error on this end.  It's on my list
of things to do, but it's hovering around 4th place on that list,
and new things seem to be appearing at the top of that list at about
the rate that I can clear them.  :-(
> Is it a TODO?
We don't want to lose track of it, but with a pending patch to
debug, I'm not sure the TODO list is the right place to put it.
-Kevin


Re: WIP fix proposal for bug #6123

From
Bruce Momjian
Date:
On Wed, Aug  8, 2012 at 09:26:41AM -0500, Kevin Grittner wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>  
> > Did we ever decide on this?
>  
> We discussed it to the point of consensus, and Tom wrote a patch to
> implement that.  Testing in my shop hit problems for which the cause
> was not obvious.  I don't know whether there is a flaw in the
> designed approach that we all missed, a simple programming bug of
> some sort in the patch, or pilot error on this end.  It's on my list
> of things to do, but it's hovering around 4th place on that list,
> and new things seem to be appearing at the top of that list at about
> the rate that I can clear them.  :-(
>  
> > Is it a TODO?
>  
> We don't want to lose track of it, but with a pending patch to
> debug, I'm not sure the TODO list is the right place to put it.

OK, I will let it linger on your TODO list then.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> We discussed it to the point of consensus, and Tom wrote a patch
> to implement that.  Testing in my shop hit problems for which the
> cause was not obvious.  I don't know whether there is a flaw in
> the designed approach that we all missed, a simple programming bug
> of some sort in the patch, or pilot error on this end.
It's looking like the last of those.  The problem was in BEFORE
DELETE triggers which, for example, would check that there were the
expected child records (throwing an error if they were missing)
right before deleting them.  When the "reissue the DELETE and then
RETURN NULL" trick was used to avoid errors with this patch, the
trigger would fail the second time through.  Of course, such
triggers were more than a bit silly and clearly The Wrong Thing To
Do in general.  I don't know how widespread such practice is, but it
will need to be fixed where it exists in order to use the proposed
patch and related workaround for "cascade delete"-like triggers.
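For illustration, the problematic pattern looked roughly like this
(invented names):

  CREATE FUNCTION parent_before_delete() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      -- Assertion that children are still present right before deleting
      -- them; harmless on the first pass, fatal on the second.
      IF NOT EXISTS (SELECT 1 FROM child WHERE parent_id = OLD.id) THEN
          RAISE EXCEPTION 'parent % has no child rows', OLD.id;
      END IF;
      DELETE FROM child WHERE parent_id = OLD.id;
      -- "reissue the DELETE and then RETURN NULL" workaround
      DELETE FROM parent WHERE id = OLD.id;  -- fires this trigger again,
      RETURN NULL;                           -- and the re-check fails there
  END;
  $$;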
Before someone says that foreign keys should just be used, I want to
point out the "-like" above.  In my experience, for about every ten
cases where a declarative constraint like UNIQUE or FOREIGN KEY can
cover a business rule, there is about one that is logically very
similar to such a declarative constraint but just different enough
that it must be implemented in custom code.  Jeff Davis has been
improving that ratio, but I doubt that the issue will ever disappear
entirely.
At any rate, I think we might want to apply Tom's patch for this
while 9.3 is still early in development, to see what else might
shake out from it while there is still plenty of time to fix any
issues.  It's now looking good from my perspective.
-Kevin



Re: WIP fix proposal for bug #6123

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> At any rate, I think we might want to apply Tom's patch for this
> while 9.3 is still early in development, to see what else might
> shake out from it while there is still plenty of time to fix any
> issues.  It's now looking good from my perspective.

I just re-read the thread to refresh my memory.  It seems that we pretty
much had consensus on throwing an error if any operation fired by a
BEFORE UPDATE/DELETE trigger changes the target row, unless the trigger
returns NULL to skip the update/delete.  It is not clear to me however
whether we had consensus about what to do with SELECT FOR UPDATE locking
cases.  The last patch posted in the thread was here:

http://archives.postgresql.org/pgsql-hackers/2012-01/msg01241.php

That message includes an example of the FOR UPDATE problem case and
says that fixing it would require significantly more work.  Do we
want to worry about tackling that now, or shall we be satisfied with
fixing the trigger cases?

Also, it doesn't appear that we ever got around to preparing
documentation updates, but I think we definitely need some if we're
going to start throwing errors for things that used to be allowed.
Since Kevin has the most field experience with this problem,
I'd like to nominate him to write some docs ...
        regards, tom lane



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> At any rate, I think we might want to apply Tom's patch for this
>> while 9.3 is still early in development, to see what else might
>> shake out from it while there is still plenty of time to fix any
>> issues.  It's now looking good from my perspective.
> 
> I just re-read the thread to refresh my memory.  It seems that we
> pretty much had consensus on throwing an error if any operation
> fired by a BEFORE UPDATE/DELETE trigger changes the target row,
> unless the trigger returns NULL to skip the update/delete.  It is
> not clear to me however whether we had consensus about what to do
> with SELECT FOR UPDATE locking cases.  The last patch posted in
> the thread was here:
> 
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01241.php
> 
> That message includes an example of the FOR UPDATE problem case
> and says that fixing it would require significantly more work.  Do
> we want to worry about tackling that now, or shall we be satisfied
> with fixing the trigger cases?
As you said in the referenced message, the FOR UPDATE issue seems
orthogonal and should probably be addressed separately.
> Also, it doesn't appear that we ever got around to preparing
> documentation updates, but I think we definitely need some if
> we're going to start throwing errors for things that used to be
> allowed.  Since Kevin has the most field experience with this
> problem, I'd like to nominate him to write some docs ...
OK, will do.  The "redo the DELETE and RETURN NULL" workaround is
not at all obvious; we should definitely include an example of that.
-Kevin



Re: WIP fix proposal for bug #6123

From
Alvaro Herrera
Date:
Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > Also, it doesn't appear that we ever got around to preparing
> > documentation updates, but I think we definitely need some if
> > we're going to start throwing errors for things that used to be
> > allowed.  Since Kevin has the most field experience with this
> > problem, I'd like to nominate him to write some docs ...
>
> OK, will do.  The "redo the DELETE and RETURN NULL" workaround is
> not at all obvious; we should definitely include an example of that.

Any chance this patch could be pushed soon?

The problem is that this patch conflicts rather heavily with my FOR KEY
SHARE patch.  I think it makes sense to commit this one first.

To me, it would be good enough that the code changes go in now; the doc
patch can wait a little longer.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Alvaro Herrera wrote:
> Kevin Grittner wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
>>> Also, it doesn't appear that we ever got around to preparing
>>> documentation updates, but I think we definitely need some if
>>> we're going to start throwing errors for things that used to be
>>> allowed. Since Kevin has the most field experience with this
>>> problem, I'd like to nominate him to write some docs ...
>> 
>> OK, will do. The "redo the DELETE and RETURN NULL" workaround is
>> not at all obvious; we should definitely include an example of
>> that.
> 
> Any chance this patch could be pushed soon?
> 
> The problem is that this patch conflicts rather heavily with my FOR
> KEY SHARE patch. I think it makes sense to commit this one first.
> 
> To me, it would be good enough that the code changes go in now; the
> doc patch can wait a little longer.

Sorry I just got to this in wading through backlog. Will push today
without docs and work on docs soon.

-Kevin



Re: WIP fix proposal for bug #6123

From
"Kevin Grittner"
Date:
Alvaro Herrera wrote:
> Kevin Grittner wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
>>> Also, it doesn't appear that we ever got around to preparing
>>> documentation updates, but I think we definitely need some if
>>> we're going to start throwing errors for things that used to be
>>> allowed. Since Kevin has the most field experience with this
>>> problem, I'd like to nominate him to write some docs ...
>> 
>> OK, will do. The "redo the DELETE and RETURN NULL" workaround is
>> not at all obvious; we should definitely include an example of
>> that.
> 
> Any chance this patch could be pushed soon?
> 
> The problem is that this patch conflicts rather heavily with my FOR
> KEY SHARE patch. I think it makes sense to commit this one first.
> 
> To me, it would be good enough that the code changes go in now; the
> doc patch can wait a little longer.

OK, pushed after reviewing the archive threads, rebasing the
patch, and testing.

Doc patch to follow.

-Kevin