Re: Extremely Slow Cascade Delete Operation - Mailing list pgsql-general

From Craig Ringer
Subject Re: Extremely Slow Cascade Delete Operation
Date
Msg-id 4B592EC6.6080709@postnewspapers.com.au
Whole thread Raw
In response to Re: Extremely Slow Cascade Delete Operation  (Yan Cheng Cheok <yccheok@yahoo.com>)
Responses Re: Extremely Slow Cascade Delete Operation
List pgsql-general
Yan Cheng Cheok wrote:
> I try to create a following simple scenario, to demonstrate cascade delete is rather slow in PostgreSQL.
>
> Can anyone help me to confirm? Is this my only machine problem, or every PostgreSQL users problem?
>
> I create 1 lot.
> every lot is having 10000 unit
> every unit is having 100 measurement.

101 measurements per unit by the looks. But it doesn't much matter.


test=> CREATE INDEX fk_unit_id_idx ON measurement (fk_unit_id);
CREATE INDEX
Time: 3072.635 ms


Now suddenly everything is much faster:

test=> delete from lot;
DELETE 1
Time: 8066.140 ms


Before that index creation, every deletion of a unit required a seqscan
of `measurement' to find referenced measurements. At 200ms apiece, it
would've taken about half an hour to `delete from lot' on my machine,
and smaller deletes took a proportional amount of time (ie 20s for 100
units). Now it takes 8 seconds to delete the lot.

You just forgot to create an index on one of the foreign key
relationships that you do a cascade delete on.

BTW, Pg doesn't force you to do this because sometimes you'd prefer to
wait. For example, you might do the deletes very rarely, and not way to
pay the cost of maintaining the index the rest of the time.

(What I was personally surprised by is that it's no faster to DELETE
FROM measurement; directly than to delete via LOT. I would've expected a
seqscan delete of the table to be MUCH faster than all the index-hopping
required to delete via lot. I guess the reason there's no real
difference is because the whole dataset fits in cache, so there's no
seek penalty. )

AFAIK, Pg isn't clever enough to batch foreign key deletes together and
then plan them as a single operation. That means it can't use something
other than a bunch of little index lookups where doing a sequential scan
or a hash join might be faster. Adding this facility would certainly be
an "interesting" project. Most of the time, though, you get on fine
using index-based delete cascading, and you can generally pre-delete
rows using a join on those rare occasions it is a problem.

--
Craig Ringer

pgsql-general by date:

Previous
From: Alex -
Date:
Subject: Slow Query / Check Point Segments
Next
From: Scott Marlowe
Date:
Subject: Re: Slow Query / Check Point Segments