Thread: Optimizing DELETE

Optimizing DELETE

From: Ivan Voras
I've just fired off a "DELETE FROM table" command (i.e. unfiltered
DELETE) on a trivially small table but with many foreign key references
(on similar-sized tables), and I'm waiting for it to finish. It's been
10 minutes now, which seems very excessive for a table of 9000 rows on a
3 GHz desktop machine.

'top' says it's all spent in USER time, and there's a ~500 KB/s write
rate going on. Just before this DELETE, I deleted data from a larger
table (50000 rows) using the same method and it finished in a couple of
seconds - maybe it's a PostgreSQL bug?

My question is: assuming it's not a bug, how can I optimize DELETEs?
Increasing work_mem, maybe?

(I'm using PostgreSQL 8.1.4 on FreeBSD 6, amd64)

(I know about TRUNCATE; I need those foreign key references to cascade)

Re: Optimizing DELETE

From: Csaba Nagy
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered
> DELETE) on a trivially small table but with many foreign key references
> (on similar-sized tables), and I'm waiting for it to finish. It's been
> 10 minutes now, which seems very excessive for a table of 9000 rows on a
> 3 GHz desktop machine.

Missing indexes on the child tables' foreign key columns are a likely
cause of a slow delete. The cascading delete must look up the rows to
be deleted in all child tables, which means sequential scans if you
don't have the proper indexes.

Try an EXPLAIN ANALYZE for deleting one row; that will also show you
the time spent in triggers, which should clue you in on what's taking
so long.
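
For example, wrapped in a transaction so the test row isn't actually
lost (the table and column names here are made up):

    BEGIN;
    EXPLAIN ANALYZE DELETE FROM parent_table WHERE id = 1;
    ROLLBACK;

The output should end with one "Trigger for constraint ..." line per
foreign key, each with a time and call count; a constraint with a huge
time points at an unindexed child table.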

Cheers,
Csaba.



Re: Optimizing DELETE

From: Rod Taylor
On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered
> DELETE) on a trivially small table but with many foreign key references
> (on similar-sized tables), and I'm waiting for it to finish. It's been
> 10 minutes now, which seems very excessive for a table of 9000 rows on a
> 3 GHz desktop machine.

I would guess that a few of those referencing tables are missing
indexes on their foreign key columns.

> 'top' says it's all spent in USER time, and there's a ~500 KB/s write
> rate going on. Just before this DELETE, I deleted data from a larger
> table (50000 rows) using the same method and it finished in a couple of
> seconds - maybe it's a PostgreSQL bug?
>
> My question is: assuming it's not a bug, how can I optimize DELETEs?
> Increasing work_mem, maybe?
>
> (I'm using PostgreSQL 8.1.4 on FreeBSD 6, amd64)
>
> (I know about TRUNCATE; I need those foreign key references to cascade)


Re: Optimizing DELETE

From: Mark Lewis
You do not have indexes on all of the columns which are linked by
foreign key constraints.

For example, say I have a "scientist" table with a single column,
"scientist_name", and another table, "discovery", which has a
"scientist_name" column with a foreign key constraint referencing the
"scientist" table.

If the system were to try to delete a row from the scientist table, then
it would need to scan the discovery table for any row which referenced
that scientist_name.

If there is an index on the scientist_name column in the discovery
table, this is a fast operation. In your case, however, there most
likely isn't an index on that column, so it needs to do a full table
scan of the discovery table for each row deleted from the scientist
table.
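
A minimal sketch of that scenario in SQL, with the fix at the end
(ON DELETE CASCADE added to mirror your setup):

    CREATE TABLE scientist (
        scientist_name text PRIMARY KEY
    );

    CREATE TABLE discovery (
        discovery_name text,
        scientist_name text REFERENCES scientist (scientist_name)
                            ON DELETE CASCADE
    );

    -- PostgreSQL indexes the PRIMARY KEY above automatically, but NOT
    -- the referencing column below; that index is up to you:
    CREATE INDEX discovery_scientist_name_idx
        ON discovery (scientist_name);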

If the discovery table has 100,000 rows, and there are 100 scientists,
then deleting those 100 scientists would require scanning 100,000 * 100
= 10M records, so this sort of thing can quickly become a very expensive
operation.

Because of this potential for truly atrocious update/delete behavior,
some database systems (SQL Server at least, and IIRC Oracle as well)
either automatically create the index on discovery.scientist_name when
the foreign key constraint is created, or refuse to create the foreign
key constraint if there isn't already an index.

PG doesn't force you to have an index, which can be desirable for
performance reasons in some situations if you know what you're doing,
but allows you to royally shoot yourself in the foot on deletes/updates
to the parent table if you're not careful.

If you have a lot of constraints and want to track down which one is
unindexed, then doing an EXPLAIN ANALYZE of deleting a single row from
the parent table will tell you how long each of the referential
integrity checks takes, so you can figure out which indexes are missing.
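
If you'd rather ask the catalogs directly, something like the following
sketch works on newer versions (the array containment operator <@ needs
8.2 or later, and it's only an approximation: it accepts any index that
contains the referencing columns, even if they aren't its leading
columns):

    -- List foreign key constraints whose child table has no index
    -- covering the referencing columns.
    SELECT c.conrelid::regclass AS child_table,
           c.conname            AS constraint_name
    FROM   pg_constraint c
    WHERE  c.contype = 'f'
      AND NOT EXISTS (
          SELECT 1
          FROM   pg_index i
          WHERE  i.indrelid = c.conrelid
            AND  c.conkey <@ string_to_array(i.indkey::text, ' ')::smallint[]
      );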

-- Mark Lewis

On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered
> DELETE) on a trivially small table but with many foreign key references
> (on similar-sized tables), and I'm waiting for it to finish. It's been
> 10 minutes now, which seems very excessive for a table of 9000 rows on a
> 3 GHz desktop machine.
>
> 'top' says it's all spent in USER time, and there's a ~500 KB/s write
> rate going on. Just before this DELETE, I deleted data from a larger
> table (50000 rows) using the same method and it finished in a couple of
> seconds - maybe it's a PostgreSQL bug?
>
> My question is: assuming it's not a bug, how can I optimize DELETEs?
> Increasing work_mem, maybe?
>
> (I'm using PostgreSQL 8.1.4 on FreeBSD 6, amd64)
>
> (I know about TRUNCATE; I need those foreign key references to cascade)

Re: Optimizing DELETE

From: Ivan Voras
Rod Taylor wrote:
> On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:
>> I've just fired off a "DELETE FROM table" command (i.e. unfiltered
>> DELETE) on a trivially small table but with many foreign key references
>> (on similar-sized tables), and I'm waiting for it to finish. It's been
>> 10 minutes now, which seems very excessive for a table of 9000 rows on a
>> 3 GHz desktop machine.
>
> I would guess that a few of those referenced tables are missing indexes
> on the referenced column.

Yes, it was a pilot error :(

Among the small and properly indexed referencing tables there was a
seldom-queried but huge log table.