on update / on delete performance of foreign keys - Mailing list pgsql-general

From Florian G. Pflug
Subject on update / on delete performance of foreign keys
Date
Msg-id 41F4E5BA.5020900@phlo.org
Whole thread Raw
Responses Re: on update / on delete performance of foreign keys  (Richard Huxton <dev@archonet.com>)
Re: on update / on delete performance of foreign keys  (Greg Stark <gsstark@mit.edu>)
Re: on update / on delete performance of foreign keys  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List pgsql-general
Hi

I ran into some performance problems regarding foreign keys lately.
My schema has about 20 tables, which each contain from 10 to 100.000
records. They have quite complicated interdependencies, modeled using
foregin keys set to "on update cascade, on delete cascade".
The schema stores data for multiple customers - Recently I wanted
to extract the data for just a single customer. I duplicated the schema,
and deleted all but one customer from the "customer" table. This worked
as expected, but the delete took a few hours (!) on a moderatly fast
machine (dual 1GHz PIII, RAID5-Array for postgres-data).

As far as I understand the code, foreign keys are implemented using
triggers, which in case of "on delete cascade" delete from the
"slave"-table when a record from the "master"-table is deleted. I guess
that the bad performance is due to running this trigger about 100.1000
times when deleting a lot of rows from a large table - and each time it
has to find referencing tuples by doing an index scan - so the
performance is about as bad as it would be if "delete * from table"
would use an index scan.

Since postgres already incoporates code to check foreign keys more
efficiently (when doing alter table ... add constraint .. foreign key,
postgres seems to use a merge or a hash join, instead of a nested loop),
I wondered how hard it would be to use this for the triggers too.

I imagined creating a statement-level trigger in parallel to the
row-level triggers, and defining some threshold (let's say, more than
10% of the rows deleted). If the threshold is reached, the row-level
trigger would just do nothing, and the statement-level trigger would
delete the referencing records doing a join.

Would this be feasable? And would it be something a newbie could tackle,
or is it more involved than I think?

greetings, Florian Pflug

Attachment

pgsql-general by date:

Previous
From: Terry Lee Tucker
Date:
Subject: Re: disable trigger from transaction
Next
From: Richard Huxton
Date:
Subject: Re: on update / on delete performance of foreign keys