Thread: Extremely Slow Cascade Delete Operation

Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
I have 3 tables - lot, unit and measurement

1 lot is having relationship to many unit.
1 unit is having relationship to many measurement.
delete cascade is being used among their relationship

I try to perform delete operation on single row of lot.
=======================================================
SemiconductorInspection=# select count(*) from lot;
 count
-------
     2
(1 row)


SemiconductorInspection=# select count(*) from unit;
  count
---------
 1151927
(1 row)


SemiconductorInspection=# select count(*) from measurement;
  count
---------
 9215416
(1 row)


SemiconductorInspection=# VACUUM ANALYZE;
VACUUM

SemiconductorInspection=# delete from lot where lot_id = 2;

Opps, this is a coffee operation. That's mean I can go out to have few cups of coffee and the operation still on going.

Even I use :
SemiconductorInspection=# EXPLAIN ANALYZE delete from lot where lot_id = 2;

It still hang there :(

Any suggestion? Anything I am doing wrong? Or this is the expected performance?

The table SQL is as follow :

   IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'lot') THEN
        CREATE TABLE lot
        (
          lot_id bigserial NOT NULL,
          CONSTRAINT pk_lot_id PRIMARY KEY (lot_id)
        );
    END IF;
    IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'unit') THEN
        CREATE TABLE unit
        (
          unit_id bigserial NOT NULL,
          fk_lot_id bigint NOT NULL,
          CONSTRAINT pk_unit_id PRIMARY KEY (unit_id),
          CONSTRAINT fk_lot_id FOREIGN KEY (fk_lot_id)
              REFERENCES lot (lot_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE
        );
    END IF;
    IF NOT EXISTS(SELECT * FROM information_schema.tables WHERE table_name = 'measurement') THEN
        CREATE TABLE measurement
        (
          measurement_id bigserial NOT NULL,
          fk_unit_id bigint NOT NULL,
          CONSTRAINT pk_measurement_id PRIMARY KEY (measurement_id),
          CONSTRAINT fk_unit_id FOREIGN KEY (fk_unit_id)
              REFERENCES unit (unit_id) MATCH SIMPLE
              ON UPDATE NO ACTION ON DELETE CASCADE
        );
    END IF;

Thanks and Regards
Yan Cheng CHEOK





Re: Extremely Slow Cascade Delete Operation

From
Grzegorz Jaśkiewicz
Date:
try checking if it is waiting perhaps for something (is locked).

Peek at: (using different connection)
SELECT * FROM pg_stat_activity;
SELECT * FROM pg_locks;

Did you used prepared transactions ?
Try:
SELECT * FROM pg_prepared_xacts ;

Maybe some other transaction is blocking it.


HTH

Re: Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
It looks like this :

http://sites.google.com/site/yanchengcheok/Home/log.txt

I put it in google site, for easy reading)

Any hint? Thanks!

Thanks and Regards
Yan Cheng CHEOK


--- On Wed, 1/13/10, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:

> From: Grzegorz Jaśkiewicz <gryzman@gmail.com>
> Subject: Re: [GENERAL] Extremely Slow Cascade Delete Operation
> To: "Yan Cheng Cheok" <yccheok@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Wednesday, January 13, 2010, 4:13 PM
> try checking if it is waiting perhaps
> for something (is locked).
>
> Peek at: (using different connection)
> SELECT * FROM pg_stat_activity;
> SELECT * FROM pg_locks;
>
> Did you used prepared transactions ?
> Try:
> SELECT * FROM pg_prepared_xacts ;
>
> Maybe some other transaction is blocking it.
>
>
> HTH
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>





Re: Extremely Slow Cascade Delete Operation

From
Grzegorz Jaśkiewicz
Date:
It doesn't look like it is locked, so it is carrying the delete out.
However that doesn't mean, that there isn't any other locking
occurring, or simply your disks are rather busy.

Also, maybe the DB is rather big, what are the table sizes ?
If you are using 8.4+, than do \dt+ to get an idea, otherwise SELECT
pg_size_pretty(pg_total_relation_size('table_name')); for each table.

Re: Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
SemiconductorInspection=# \dt+
                            List of relations
 Schema |       Name       | Type  |  Owner   |    Size    | Description
--------+------------------+-------+----------+------------+-------------
 public | lot              | table | postgres | 8192 bytes |
 public | measurement      | table | postgres | 529 MB     |
 public | measurement_type | table | postgres | 8192 bytes |
 public | measurement_unit | table | postgres | 8192 bytes |
 public | unit             | table | postgres | 57 MB      |
(5 rows)

I can see the PostgreSQL process is occupy CPU. But how come it takes so long? There are only 1000++ row of unit, where
theirlot_id is 2. 

Seems not reasonable to me. :(

Thanks and Regards
Yan Cheng CHEOK


--- On Wed, 1/13/10, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:

> From: Grzegorz Jaśkiewicz <gryzman@gmail.com>
> Subject: Re: [GENERAL] Extremely Slow Cascade Delete Operation
> To: "Yan Cheng Cheok" <yccheok@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Wednesday, January 13, 2010, 4:35 PM
> It doesn't look like it is locked, so
> it is carrying the delete out.
> However that doesn't mean, that there isn't any other
> locking
> occurring, or simply your disks are rather busy.
>
> Also, maybe the DB is rather big, what are the table sizes
> ?
> If you are using 8.4+, than do \dt+ to get an idea,
> otherwise SELECT
> pg_size_pretty(pg_total_relation_size('table_name')); for
> each table.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>





Re: Extremely Slow Cascade Delete Operation

From
Craig Ringer
Date:
On 13/01/2010 4:09 PM, Yan Cheng Cheok wrote:
> I have 3 tables - lot, unit and measurement
>
> 1 lot is having relationship to many unit.
> 1 unit is having relationship to many measurement.
> delete cascade is being used among their relationship
>

> SemiconductorInspection=# delete from lot where lot_id = 2;

Are there indexes on fk_lot_id and fk_unit_id ? If not, a DELETE from
lot will cause a seqscan of unit for affected units, and if any must be
deleted each will cause a seqscan of measurement for affected
measurements. That's going to get ugly fast.

--
Craig Ringer

Re: Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
OMG, I never know what is index (Sorry for my newbies) I will study about them and update you all about their
performance.

Thanks and Regards
Yan Cheng CHEOK


--- On Wed, 1/13/10, Craig Ringer <craig@postnewspapers.com.au> wrote:

> From: Craig Ringer <craig@postnewspapers.com.au>
> Subject: Re: [GENERAL] Extremely Slow Cascade Delete Operation
> To: "Yan Cheng Cheok" <yccheok@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Wednesday, January 13, 2010, 7:01 PM
> On 13/01/2010 4:09 PM, Yan Cheng
> Cheok wrote:
> > I have 3 tables - lot, unit and measurement
> >
> > 1 lot is having relationship to many unit.
> > 1 unit is having relationship to many measurement.
> > delete cascade is being used among their relationship
> >
>
> > SemiconductorInspection=# delete from lot where lot_id
> = 2;
>
> Are there indexes on fk_lot_id and fk_unit_id ? If not, a
> DELETE from lot will cause a seqscan of unit for affected
> units, and if any must be deleted each will cause a seqscan
> of measurement for affected measurements. That's going to
> get ugly fast.
>
> --
> Craig Ringer
>
> -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>





Re: Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
I try to add index to tables.

(please refer to http://sites.google.com/site/yanchengcheok/Home/question.txt)

Database is designed in the following graphical view

(please refer to http://sites.google.com/site/yanchengcheok/Home/question.png)

Here is the setting of my database. All using default except syncrhonous_commit = off.

(please refer to http://sites.google.com/site/yanchengcheok/Home/database.PNG)

My database size is as follow :

SemiconductorInspection=# \timing on
Timing is on.
SemiconductorInspection=# SELECT count(*) FROM lot;
 count
-------
     2
(1 row)


Time: 1.003 ms
SemiconductorInspection=# SELECT count(*) FROM unit;
 count
--------
 206363
(1 row)


Time: 92.766 ms
SemiconductorInspection=# SELECT count(*) FROM measurement;
  count
---------
 1650904
(1 row)


Time: 355.161 ms
SemiconductorInspection=#

I simply run a delete operation :

delete from lot where lot_id = 3;

It takes TWO hours and never able to return!

I even run VACUUM, with options FULL + FREEZE + ANALYZE

but it does not help at all.

Is there other optimization steps I had missed out?


Thanks and Regards
Yan Cheng CHEOK


--- On Wed, 1/13/10, Craig Ringer <craig@postnewspapers.com.au> wrote:

> From: Craig Ringer <craig@postnewspapers.com.au>
> Subject: Re: [GENERAL] Extremely Slow Cascade Delete Operation
> To: "Yan Cheng Cheok" <yccheok@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Wednesday, January 13, 2010, 7:01 PM
> On 13/01/2010 4:09 PM, Yan Cheng
> Cheok wrote:
> > I have 3 tables - lot, unit and measurement
> >
> > 1 lot is having relationship to many unit.
> > 1 unit is having relationship to many measurement.
> > delete cascade is being used among their relationship
> >
>
> > SemiconductorInspection=# delete from lot where lot_id
> = 2;
>
> Are there indexes on fk_lot_id and fk_unit_id ? If not, a
> DELETE from lot will cause a seqscan of unit for affected
> units, and if any must be deleted each will cause a seqscan
> of measurement for affected measurements. That's going to
> get ugly fast.
>
> --
> Craig Ringer
>
> -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>





Re: Extremely Slow Cascade Delete Operation

From
Craig Ringer
Date:
Yan Cheng Cheok wrote:

> I simply run a delete operation :
>
> delete from lot where lot_id = 3;
>
> It takes TWO hours and never able to return!

What does:

EXPLAIN DELETE FROM lot WHERE lot_id = 3;

report?

By the way, you've created a LOT of indexes. Indexes speed up lookups,
but can slow down insert/update/delete. They also use disk space. So
avoid creating indexes for things unless you know the index will
actually be used and be useful. Using EXPLAIN and EXPLAIN ANALYZE are
helpful for discovering this.

In general, I find creating indexes on foreign key columns to be a good
idea unless you never expect to DELETE from the parent table (say, if
you only TRUNCATE, if you rely on partitioning, or if the parent table
is append-only).

--
Craig Ringer

Re: Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
I try to create a following simple scenario, to demonstrate cascade delete is rather slow in PostgreSQL.

Can anyone help me to confirm? Is this my only machine problem, or every PostgreSQL users problem?

I create 1 lot.
every lot is having 10000 unit
every unit is having 100 measurement.

hence :

lot - 1 row entry
unit - 10000 row entries
measurement - 1000000 row entries

run command :

delete from lot where lot_id = 1;

Opps. Never ending....

To reproduce :

(1) Download SQL script from http://sites.google.com/site/yanchengcheok/Home/test.sql

(2) Create a empty database named Sandbox. Follow all default parameters in pgAdmin.

(3) Execute SQL statement in test.sql. It will create tables, inserting data. It may take up to few minutes.

(4) run
delete from lot where lot_id = 1;

dang! dang! dang! a never ending story.

Do you guys get an extremely slow experience as me? Does other database (like MySQL) experience same issues too?

Thanks and Regards
Yan Cheng CHEOK


--- On Wed, 1/13/10, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:

> From: Grzegorz Jaśkiewicz <gryzman@gmail.com>
> Subject: Re: [GENERAL] Extremely Slow Cascade Delete Operation
> To: "Yan Cheng Cheok" <yccheok@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Wednesday, January 13, 2010, 4:35 PM
> It doesn't look like it is locked, so
> it is carrying the delete out.
> However that doesn't mean, that there isn't any other
> locking
> occurring, or simply your disks are rather busy.
>
> Also, maybe the DB is rather big, what are the table sizes
> ?
> If you are using 8.4+, than do \dt+ to get an idea,
> otherwise SELECT
> pg_size_pretty(pg_total_relation_size('table_name')); for
> each table.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>





Re: Extremely Slow Cascade Delete Operation

From
Craig Ringer
Date:
Yan Cheng Cheok wrote:
> I try to create a following simple scenario, to demonstrate cascade delete is rather slow in PostgreSQL.
>
> Can anyone help me to confirm? Is this my only machine problem, or every PostgreSQL users problem?
>
> I create 1 lot.
> every lot is having 10000 unit
> every unit is having 100 measurement.

101 measurements per unit by the looks. But it doesn't much matter.


test=> CREATE INDEX fk_unit_id_idx ON measurement (fk_unit_id);
CREATE INDEX
Time: 3072.635 ms


Now suddenly everything is much faster:

test=> delete from lot;
DELETE 1
Time: 8066.140 ms


Before that index creation, every deletion of a unit required a seqscan
of `measurement' to find referenced measurements. At 200ms apiece, it
would've taken about half an hour to `delete from lot' on my machine,
and smaller deletes took a proportional amount of time (ie 20s for 100
units). Now it takes 8 seconds to delete the lot.

You just forgot to create an index on one of the foreign key
relationships that you do a cascade delete on.

BTW, Pg doesn't force you to do this because sometimes you'd prefer to
wait. For example, you might do the deletes very rarely, and not way to
pay the cost of maintaining the index the rest of the time.

(What I was personally surprised by is that it's no faster to DELETE
FROM measurement; directly than to delete via LOT. I would've expected a
seqscan delete of the table to be MUCH faster than all the index-hopping
required to delete via lot. I guess the reason there's no real
difference is because the whole dataset fits in cache, so there's no
seek penalty. )

AFAIK, Pg isn't clever enough to batch foreign key deletes together and
then plan them as a single operation. That means it can't use something
other than a bunch of little index lookups where doing a sequential scan
or a hash join might be faster. Adding this facility would certainly be
an "interesting" project. Most of the time, though, you get on fine
using index-based delete cascading, and you can generally pre-delete
rows using a join on those rare occasions it is a problem.

--
Craig Ringer

Re: Extremely Slow Cascade Delete Operation

From
Yan Cheng Cheok
Date:
Hi Craig Ringer,

Really appreciate a lot for your advice! This at least has cleared my doubt, which had been confused me for quite some
time.

Thanks and Regards
Yan Cheng CHEOK


--- On Fri, 1/22/10, Craig Ringer <craig@postnewspapers.com.au> wrote:

> From: Craig Ringer <craig@postnewspapers.com.au>
> Subject: Re: [GENERAL] Extremely Slow Cascade Delete Operation
> To: "Yan Cheng Cheok" <yccheok@yahoo.com>
> Cc: "Grzegorz Jaśkiewicz" <gryzman@gmail.com>, pgsql-general@postgresql.org
> Date: Friday, January 22, 2010, 12:51 PM
> Yan Cheng Cheok wrote:
> > I try to create a following simple scenario, to
> demonstrate cascade delete is rather slow in PostgreSQL.
> >
> > Can anyone help me to confirm? Is this my only machine
> problem, or every PostgreSQL users problem?
> >
> > I create 1 lot.
> > every lot is having 10000 unit
> > every unit is having 100 measurement.
>
> 101 measurements per unit by the looks. But it doesn't much
> matter.
>
>
> test=> CREATE INDEX fk_unit_id_idx ON measurement
> (fk_unit_id);
> CREATE INDEX
> Time: 3072.635 ms
>
>
> Now suddenly everything is much faster:
>
> test=> delete from lot;
> DELETE 1
> Time: 8066.140 ms
>
>
> Before that index creation, every deletion of a unit
> required a seqscan
> of `measurement' to find referenced measurements. At 200ms
> apiece, it
> would've taken about half an hour to `delete from lot' on
> my machine,
> and smaller deletes took a proportional amount of time (ie
> 20s for 100
> units). Now it takes 8 seconds to delete the lot.
>
> You just forgot to create an index on one of the foreign
> key
> relationships that you do a cascade delete on.
>
> BTW, Pg doesn't force you to do this because sometimes
> you'd prefer to
> wait. For example, you might do the deletes very rarely,
> and not way to
> pay the cost of maintaining the index the rest of the
> time.
>
> (What I was personally surprised by is that it's no faster
> to DELETE
> FROM measurement; directly than to delete via LOT. I
> would've expected a
> seqscan delete of the table to be MUCH faster than all the
> index-hopping
> required to delete via lot. I guess the reason there's no
> real
> difference is because the whole dataset fits in cache, so
> there's no
> seek penalty. )
>
> AFAIK, Pg isn't clever enough to batch foreign key deletes
> together and
> then plan them as a single operation. That means it can't
> use something
> other than a bunch of little index lookups where doing a
> sequential scan
> or a hash join might be faster. Adding this facility would
> certainly be
> an "interesting" project. Most of the time, though, you get
> on fine
> using index-based delete cascading, and you can generally
> pre-delete
> rows using a join on those rare occasions it is a problem.
>
> --
> Craig Ringer
>