Thread: Why is the number of dead tuples causing the performance of deferred triggers to degrade so rapidly (exponentially)?

Hi,

I'm seeing the following behaviour with the table and functions given below:

db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 197,507 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 341,880 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 692,603 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 985,253 ms
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000
Time: 1241,334 ms

Or even worse (fresh drop/create of the table and functions):

db=# insert into f select id from full_sequence(1, 10000);
INSERT 0 10000
Time: 22255,767 ms
db=# insert into f select id from full_sequence(1, 10000);
INSERT 0 10000
Time: 45398,433 ms
db=# insert into f select id from full_sequence(1, 10000);
INSERT 0 10000
Time: 67993,476 ms

Wrapping the commands in a single transaction merely accumulates the penalty,
which is then paid in full at commit time.

It seems in this case the time needed for a single deferred trigger somehow
depends on the number of dead tuples in the table, because a vacuum of the
table will 'reset' the query-times. However, even if I wanted to, vacuum is
not allowed from within a function.
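
(By 'reset' I mean that a manual vacuum between two batches, e.g.

db=# vacuum f;
VACUUM
db=# insert into f select * from full_sequence(1, 1000);
INSERT 0 1000

brings the insert time back to roughly that of the first batch.)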

What is happening here? And more importantly, what can I do to prevent this?

NB. My real-world application 'collects' ids that need deferred work, but this
work is both costly and only needed once per base record. So I use an 'update'
table whose contents I join with the actual tables in order to do the work for
_all_ the base records involved upon the first execution of the deferred
trigger. At the end of that trigger, the 'update' table is emptied, so any
additional deferred trigger calls on the same table should lose hardly any
time. Or at least, that was the intention....

*********** demo script ***********
drop table f cascade;
drop function tr_f_def() cascade;
drop function full_sequence(integer, integer);
drop type full_sequence_type;

create table f (id int);
create function tr_f_def() RETURNS trigger LANGUAGE 'plpgsql' VOLATILE STRICT
SECURITY INVOKER AS '
    DECLARE
    BEGIN
        -- do stuff with all the ids in the table

        -- then delete the contents (this is where the growing cost shows up)
        IF EXISTS (SELECT 1 FROM f) THEN
            DELETE FROM f;
            -- VACUUM f;   -- would help, but VACUUM is not allowed inside a function
        END IF;

        RETURN NULL;
    END;';
create type full_sequence_type as (id int);
create function full_sequence(integer, integer)
    RETURNS SETOF full_sequence_type
    LANGUAGE 'plpgsql'
    IMMUTABLE
    STRICT
    SECURITY INVOKER
    AS '    DECLARE
            my_from ALIAS FOR $1;
            my_to ALIAS FOR $2;
            result full_sequence_type%ROWTYPE;
        BEGIN
            -- just loop
            FOR i IN my_from..my_to LOOP
                result.id = i;
                RETURN NEXT result;
            END LOOP;

            -- finish
            RETURN;
        END;';
CREATE CONSTRAINT TRIGGER f_def AFTER INSERT ON f DEFERRABLE INITIALLY
DEFERRED FOR EACH ROW EXECUTE PROCEDURE tr_f_def();
*********** demo script ***********

db=# select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66

--
Best,

Frank.

Frank,

> It seems in this case the time needed for a single deferred trigger somehow
> depends on the number of dead tuples in the table, because a vacuum of the
> table will 'reset' the query-times. However, even if I wanted to, vacuum is
> not allowed from within a function.
>
> What is happening here? And more importantly, what can I do to prevent
> this?

I'm not clear on all of the work you're doing in the trigger.  However, it
seems obvious that you're deleting and/or updating a large number of rows.
The escalating execution times would be consistent with that.

> NB. My real-world application 'collects' id's in need for deferred work,
> but this work is both costly and only needed once per base record. So I use
> an 'update' table whose content I join with the actual tables in order to
> do the work for _all_ the base records involved upon the first execution of
> the deferred trigger. At the end of the trigger, this 'update' table is
> emptied so any additional deferred triggers on the same table will hardly
> lose any time. Or at least, that was the intention....

I think you're doing a lot more than is wise to do in triggers.   Deferrable
triggers aren't really intended for running long procedures with the creation
of types and temporary tables (your post got a bit garbled, so pardon me if
I'm misreading it).   I'd suggest reconsidering your approach to this
application problem.

At the very least, increase max_fsm_relations to some high value, which may
help (or not).
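
For example, in postgresql.conf (the numbers are purely illustrative, and
max_fsm_pages usually needs raising along with it; both settings take effect
only on a postmaster restart):

max_fsm_relations = 10000         # relations tracked by the free space map
max_fsm_pages = 200000            # disk pages tracked by the free space map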

-Josh

--
__Aglio Database Solutions_______________
Josh Berkus               Consultant
josh@agliodbs.com     www.agliodbs.com
Ph: 415-752-2500    Fax: 415-752-2387
2166 Hayes Suite 200    San Francisco, CA

Hi Josh,

> > It seems in this case the time needed for a single deferred trigger
> > somehow depends on the number of dead tuples in the table

After further investigation I think I have a better grasp of what's going on.

The thing biting me here is indeed the 'delete from' on a table containing a
large number of dead rows, possibly made worse in cases where not everything
can be handled in memory.

> I'm not clear on all of the work you're doing in the trigger.
> > NB. My real-world application 'collects' id's in need for deferred work
> I think you're doing a lot more than is wise to do in triggers.

I probably wasn't clear enough on this. I'm not creating types and/or
temporary tables or anything of that kind.

The rationale is probably explained better by this example:

- the database has knowledge of 'parts' and 'sets'; the sets have a few fields
whose contents depend on the parts, but the proper values for these fields can
only be determined by looking at all the parts of the particular set together
(i.e. it's not a plain 'part count' that one could maintain with a trigger on
the parts)

- during a transaction, a number of things will happen to various parts of
various sets, so I have after triggers on the parts that insert the ids of the
sets needing an update into a set_update holding table; in turn, this
set_update table has a deferred trigger

- upon execution of the deferred triggers, I now know that all the work on the
parts is finished, so the deferred trigger initiates an update for the sets
whose ids are in the holding table and deletes these ids afterwards

Now, because multiple updates to parts of the same set will result in multiple
inserts into the update table, I want to avoid doing the set-update more than
once.
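
In outline, the moving parts look something like this. It is a much simplified
sketch: only the table names come from the description above, the column and
function names are made up, and the UPDATE uses a trivial stand-in for the real
set-level calculation.

create table sets (id int primary key, summary int);
create table parts (id int primary key, set_id int references sets);
create table set_update (set_id int);

-- after trigger on parts: just record which set needs recalculating
create function tr_parts_aiu() returns trigger language 'plpgsql' as '
    BEGIN
        INSERT INTO set_update (set_id) VALUES (NEW.set_id);
        RETURN NULL;
    END;';
create trigger parts_aiu after insert or update on parts
    for each row execute procedure tr_parts_aiu();

-- deferred trigger on set_update: do the expensive set-level work once,
-- then empty the holding table so later firings find nothing left to do
create function tr_set_update_def() returns trigger language 'plpgsql' as '
    BEGIN
        IF EXISTS (SELECT 1 FROM set_update) THEN
            UPDATE sets
               SET summary = (SELECT count(*) FROM parts p
                              WHERE p.set_id = sets.id)
             WHERE id IN (SELECT set_id FROM set_update);
            DELETE FROM set_update;
        END IF;
        RETURN NULL;
    END;';
create constraint trigger set_update_def after insert on set_update
    deferrable initially deferred
    for each row execute procedure tr_set_update_def();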

Obviously, it would be better to be able to 'cancel' the rest of the calls to
the deferred trigger after it has been executed for the first time, but that
doesn't seem possible.

Even better would be to use a 'for each statement' trigger on the set_update
holding table instead, but it is not possible to create a deferred 'for each
statement' trigger..... ;(

So, I seem to be caught between a rock and a hard place here: I must use
deferred triggers in order to avoid a costly set update on each part update,
but in such a deferred trigger I cannot avoid doing the update multiple
times... (due to the growing cost of the 'delete from' in the trigger).

Mmm, it seems that by hacking pg_trigger I am able to create a for each
statement trigger that is 'deferrable initially deferred'.
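
For the record, the hack boils down to something like this (using the names
from the demo script): bit 0 of pg_trigger.tgtype is the row-level flag, so
clearing it turns the row-level constraint trigger into a statement-level one.
Direct catalog updates like this are unsupported and superuser-only, and
starting a fresh session afterwards makes sure the change is picked up.

UPDATE pg_trigger
   SET tgtype = tgtype & (~1)
 WHERE tgname = 'f_def'
   AND tgrelid = 'f'::regclass;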

This probably solves my problem; I will ask on 'general' whether this has any
unforeseen side effects and whether or not a 'regular' deferrable for each
statement trigger will be incorporated in v8.0.

Thanks for your reply!

--
Best,

Frank.


Re: Why is the number of dead tuples causing the performance of deferred triggers to degrade so rapidly?

From: DeJuan Jackson
Since you are updating all of the sets with the specified part number, why not
just ensure that a transaction never inserts the same number more than once (an
INSERT ... SELECT ... WHERE NOT EXISTS (...) comes to mind), and then delete
the row before the end of the transaction?
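
Applied to the simplified sketch earlier in the thread (where the ids being
queued are set ids), the after-trigger on the parts would then become something
like this:

create or replace function tr_parts_aiu() returns trigger language 'plpgsql' as '
    BEGIN
        -- queue the set id only if it is not already queued in this transaction
        INSERT INTO set_update (set_id)
        SELECT NEW.set_id
        WHERE NOT EXISTS (SELECT 1 FROM set_update u
                          WHERE u.set_id = NEW.set_id);
        RETURN NULL;
    END;';

With each set id queued at most once, the deferred trigger fires far less often
and has far fewer rows to scan and delete.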
