Thread: Trouble with recursive trigger

Trouble with recursive trigger

From
Justin Hawkins
Date:
Hi all,

I am writing a bulletin board style system, which stores posts in a
hierachy.

Each post has a parent (or NULL for a top level post).

For efficiency, I'm storing a number of replies on each post, which
shows the total replies a post has, including all sub (and sub sub,
and sub sub sub etc) posts. I update this number via a trigger.

I'm having trouble with the DELETE. When deleting a row three things
need to happen:

o recursively DELETE all children posts to preserve data integrity
o decrement the number of replies of the parent post (if it exists)
o delete itself

However, only the ultimate child (the post with no children posts)
gets deleted, despite the debugging NOTICE's seeming to indicate that
the right thing is happening.

I've discovered the cause of the problem, but I'm not sure why it's a
problem, or how to solve it.

Here's the stripped back test example:

-- ---------------------------------------------------------------

CREATE TABLE post (
  id         SERIAL NOT NULL PRIMARY KEY,
  parent     INT REFERENCES post(id),

  replies    INT NOT NULL DEFAULT 0
);


CREATE OR REPLACE FUNCTION post_update_replies() RETURNS TRIGGER AS $function$
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      -- delete any children of this post, this will recurse of course
      RAISE NOTICE 'deleting any that have a parent of % so I can delete %', OLD.id, OLD.id;
      DELETE FROM post WHERE parent = OLD.id;

      -- now update the parents replies, if they have any
      IF (OLD.parent IS NOT NULL) THEN
        RAISE NOTICE 'decrementing replies of parent % because of delete of %', OLD.parent, OLD.id;
        UPDATE post SET replies = replies - 1 WHERE id = OLD.parent;
      END IF;

      RAISE NOTICE 'actually deleting % now', OLD.id;
      RETURN OLD;
    END IF;
  END;
$function$ LANGUAGE plpgsql;


CREATE TRIGGER post_update_replies BEFORE DELETE ON post
    FOR EACH ROW EXECUTE PROCEDURE post_update_replies();

COPY post FROM stdin WITH CSV;
3000,,0
3001,3000,0
3002,3001,0
3003,3002,0
3004,3003,0
3005,3004,0
3006,3005,0

-- ---------------------------------------------------------------

(I've trimmed out the UPDATE and INSERT handler parts of the trigger,
 so ignore the fact there is code missing to increment 'replies').

If you do something like:

DELETE FROM post WHERE id = 3002;

You will see that only the post with id of 3006 gets deleted, despite
the NOTICE's indicating the right things are happening.

If you repeat the DELETE, you will delete 3005 (which now has no
children), repeat it again and 3004 disappears, and so on.

I've pinpointed it to the UPDATE. If I comment out the UPDATE in the
trigger function above, the recursive delete works correctly.

It's like the UPDATE stops that iteration of the trigger from working
correctly - it obviously isn't stopping it, as the debugging shows
execution continues, but it somehow makes the 'RETURN OLD;' statement
not have any effect.

Here's the debugging output when I do a DELETE:

justin=> delete from post where id = 3002;
NOTICE:  deleting any that have a parent of 3002 so I can delete 3002
NOTICE:  deleting any that have a parent of 3003 so I can delete 3003
NOTICE:  deleting any that have a parent of 3004 so I can delete 3004
NOTICE:  deleting any that have a parent of 3005 so I can delete 3005
NOTICE:  deleting any that have a parent of 3006 so I can delete 3006
NOTICE:  decrementing replies of parent 3005 because of delete of 3006
NOTICE:  actually deleting 3006 now
NOTICE:  decrementing replies of parent 3004 because of delete of 3005
NOTICE:  actually deleting 3005 now
NOTICE:  decrementing replies of parent 3003 because of delete of 3004
NOTICE:  actually deleting 3004 now
NOTICE:  decrementing replies of parent 3002 because of delete of 3003
NOTICE:  actually deleting 3003 now
NOTICE:  decrementing replies of parent 3001 because of delete of 3002
NOTICE:  actually deleting 3002 now
DELETE 0

Interestingly, I see the following other debugging information among
the above NOTICE's, I'm not sure if it's related to this problem. It
doesn't seem to be an error, but I don't know the source of it:

justin=> delete from post where id = 3002;
NOTICE:  deleting any that have a parent of 3002 so I can delete 3002
NOTICE:  deleting any that have a parent of 3003 so I can delete 3003
CONTEXT:  SQL statement "DELETE FROM post WHERE parent =  $1 "
PL/pgSQL function "post_update_replies" line 11 at SQL statement
NOTICE:  deleting any that have a parent of 3004 so I can delete 3004
CONTEXT:  SQL statement "DELETE FROM post WHERE parent =  $1 "
PL/pgSQL function "post_update_replies" line 11 at SQL statement
SQL statement "DELETE FROM post WHERE parent =  $1 "
PL/pgSQL function "post_update_replies" line 11 at SQL statement
NOTICE:  deleting any that have a parent of 3005 so I can delete 3005

[snip]

I'm guessing that the DELETE is doing something to the rows, even
before the trigger returns for that iteration causing subsequent
UPDATE's to break in some subtle way.

Any ideas (or workarounds) most appreciated :-)

Pg version is 8.1.0, tried also on 8.0.3 with no difference.

    - Justin

--
Justin Hawkins | justin@hawkins.id.au
               | http://hawkins.id.au

Re: Trouble with recursive trigger

From
Martijn van Oosterhout
Date:
On Wed, Nov 16, 2005 at 11:45:45AM +1030, Justin Hawkins wrote:
> Hi all,
>
> I am writing a bulletin board style system, which stores posts in a
> hierachy.

<snip>

> However, only the ultimate child (the post with no children posts)
> gets deleted, despite the debugging NOTICE's seeming to indicate that
> the right thing is happening.

Just a thought, maybe it has something to do with the UPDATE updating a
row where the trigger is running. So, think of the execution like
this:

# DELETE FROM post WHERE id = 3002;
trigger> DELETE FROM post WHERE parent = 3002;
*recurses*
trigger#2> DELETE FROM post WHERE parent = 3003;
*recurses*

...
trigger#5> DELETE FROM post where parent = 3005;
*recurses*
trigger#6> DELETE FROM post where parent = 3006;    -- Does nothing
trigger#6> UPDATE post SET replies = replies - 1 WHERE id = 3005;

See this last line, it's updating the row while the delete trigger is
running. I don't know the semantics but what's probably happening is
that the original row the trigger ran on *was* deleted, but the UPDATE
created a new one which hasn't been deleted.

No ideas how to fix it though. Search the docs for a reference... Also,
what if it's an AFTER DELETE trigger?

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Trouble with recursive trigger

From
Justin Hawkins
Date:
On Wed, Nov 16, 2005 at 07:43:16AM +0100, Martijn van Oosterhout wrote:

> Just a thought, maybe it has something to do with the UPDATE updating a
> row where the trigger is running. So, think of the execution like
> this:
>
> # DELETE FROM post WHERE id = 3002;
> trigger> DELETE FROM post WHERE parent = 3002;
> *recurses*
> trigger#2> DELETE FROM post WHERE parent = 3003;
> *recurses*
>
> ...
> trigger#5> DELETE FROM post where parent = 3005;
> *recurses*
> trigger#6> DELETE FROM post where parent = 3006;    -- Does nothing
> trigger#6> UPDATE post SET replies = replies - 1 WHERE id = 3005;
>
> See this last line, it's updating the row while the delete trigger is
> running. I don't know the semantics but what's probably happening is
> that the original row the trigger ran on *was* deleted, but the UPDATE
> created a new one which hasn't been deleted.

Yep I suspect it's something like this. I don't see why, as to me if
the trigger hasn't completed yet then the row should still be
there. And if that's not the case (the row is in some sort of
half-deleted limbo state) then I'd expect some sort of sensible error,
not a quiet failure of the subsequent completion of the trigger to
actually delete the row.

> No ideas how to fix it though. Search the docs for a reference... Also,
> what if it's an AFTER DELETE trigger?

The referential integrity means that if I delete a row in 'the middle'
I need to delete the children myself first. If I let a cascade deal
with that then I don't get the opportunity to update rows further up
the tree to reflect the fact there are now less replies.

I can't see any particular flaw in my method so I'd really like to get
to the heart of why this doesn't work.

    - Justin

--
Justin Hawkins | justin@hawkins.id.au
               | http://hawkins.id.au

Re: Trouble with recursive trigger

From
Tom Lane
Date:
Justin Hawkins <justin@hawkins.id.au> writes:
> I'm having trouble with the DELETE. When deleting a row three things
> need to happen:

> o recursively DELETE all children posts to preserve data integrity
> o decrement the number of replies of the parent post (if it exists)
> o delete itself

This has a couple of problems:

1. You can't delete a post's children before deleting the post itself,
because of the REFERENCES constraint.  I'm not entirely sure why your
original formulation of the trigger didn't hit that failure, but I sure
hit it while experimenting with alternatives.

2. The reason the UPDATE causes a problem is that it creates row
versions that are newer than the versions the outer DELETE can see.
(Any database changes caused by a function invoked by a query are by
definition later than that query.)  This means that if the outer
DELETE hasn't yet zapped a row that the UPDATE touches, it will fail to
delete that row when it does come to it.

The easiest way to fix #2 is to do the UPDATEs in an AFTER trigger
instead of a BEFORE trigger, and the easiest way to fix #1 is to let the
system do it for you, by using ON DELETE CASCADE instead of a
handwritten trigger.  I got reasonable behavior with this:

---------

CREATE TABLE post (
  id         SERIAL NOT NULL PRIMARY KEY,
  parent     INT REFERENCES post(id) ON DELETE CASCADE,
  replies    INT NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION post_update_replies() RETURNS TRIGGER AS $function$
  DECLARE iv integer;
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      -- now update the parents replies, if they have any
      IF (OLD.parent IS NOT NULL) THEN
        RAISE NOTICE 'decrementing replies of parent % because of delete of %', OLD.parent, OLD.id;
        UPDATE post SET replies = replies - 1 WHERE id = OLD.parent;
    GET DIAGNOSTICS iv = ROW_COUNT;
    RAISE NOTICE 'decremented % parent rows of %', iv, OLD.id;
      END IF;
      RETURN OLD;
    END IF;
  END;
$function$ LANGUAGE plpgsql;

CREATE TRIGGER post_update_replies AFTER DELETE ON post
    FOR EACH ROW EXECUTE PROCEDURE post_update_replies();

COPY post FROM stdin WITH CSV;
3000,,0
3001,3000,0
3002,3001,0
3003,3002,0
3004,3003,0
3005,3004,0
3006,3005,0
\.

---------

to wit:

regression=# DELETE FROM post WHERE id = 3002;
NOTICE:  decrementing replies of parent 3005 because of delete of 3006
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decremented 0 parent rows of 3006
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decrementing replies of parent 3004 because of delete of 3005
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decremented 0 parent rows of 3005
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decrementing replies of parent 3003 because of delete of 3004
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decremented 0 parent rows of 3004
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decrementing replies of parent 3002 because of delete of 3003
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decremented 0 parent rows of 3003
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."post" WHERE "parent" = $1"
NOTICE:  decrementing replies of parent 3001 because of delete of 3002
NOTICE:  decremented 1 parent rows of 3002
DELETE 1
regression=# select * from post;
  id  | parent | replies
------+--------+---------
 3000 |        |       0
 3001 |   3000 |      -1
(2 rows)

regression=#

Notice that most of the UPDATEs report not doing anything, because the
parent row they would need to hit is already gone by the time the AFTER
trigger runs.

            regards, tom lane

Re: Trouble with recursive trigger

From
Justin Hawkins
Date:
On Thu, Nov 17, 2005 at 11:45:07PM -0500, Tom Lane wrote:

> This has a couple of problems:
>
> 1. You can't delete a post's children before deleting the post itself,
> because of the REFERENCES constraint.  I'm not entirely sure why your
> original formulation of the trigger didn't hit that failure, but I sure
> hit it while experimenting with alternatives.

I tried with and without the constraint, as I realised I was basically
implementing the cascade myself, but it didn't make any difference to
the failure mode.

> 2. The reason the UPDATE causes a problem is that it creates row
> versions that are newer than the versions the outer DELETE can see.
> (Any database changes caused by a function invoked by a query are by
> definition later than that query.)  This means that if the outer
> DELETE hasn't yet zapped a row that the UPDATE touches, it will fail to
> delete that row when it does come to it.

OK, I understand the issue now :-)

> The easiest way to fix #2 is to do the UPDATEs in an AFTER trigger
> instead of a BEFORE trigger, and the easiest way to fix #1 is to let the
> system do it for you, by using ON DELETE CASCADE instead of a
> handwritten trigger.  I got reasonable behavior with this:

[snip]

This is close, but not quite suitable.

The issue is that I need to UPDATE the parent, reducing the 'replies' column by
the number of children rows I am deleting, including the sub-children.

This is why I was implementing the cascade myself in my first attempt
- it let me call an UPDATE once per DELETE, making sure we reduce the
number of replies in the parents by the correct amount.

As I've just tried without triggers (just the cascade constraint), a
'DELETE FROM post WHERE id = x' will only admit to deleting one
row. The same problem can be seen in your example:

> regression=# DELETE FROM post WHERE id = 3002;

[snip]

> NOTICE:  decremented 1 parent rows of 3002
> DELETE 1
> regression=# select * from post;
>   id  | parent | replies
> ------+--------+---------
>  3000 |        |       0
>  3001 |   3000 |      -1
> (2 rows)

The 'replies' should have been reduced not by one, but by how ever
many children below it that were deleted.

I've been thinking about this a little while since your message, but
I'm not sure yet I can think of a way around it. If the AFTER delete
trigger could discover how many rows were deleted and perform the
single UPDATE, reducing 'replies' by that much, the rest would work
by virtue of a seperate UPDATE trigger.

This seems possible, but I will have to cogitate on it for a while :-)

On the face of it it seems I will have to implement the cascading DELETE
myself, and keep track of how many DELETE's I am doing.

    - Justin

--
Justin Hawkins | justin@hawkins.id.au
               | http://hawkins.id.au