Home > mailing lists

Re: Optimizing DELETE - Mailing list pgsql-performance

From	Mark Lewis
Subject	Re: Optimizing DELETE
Date	September 19, 2006 11:22:09
Msg-id	1158675445.9657.1587.camel@archimedes Whole thread Raw
In response to	Optimizing DELETE (Ivan Voras <ivoras@fer.hr>)
List	pgsql-performance

Tree view

You do not have indexes on all of the columns which are linked by
foreign key constraints.

For example, let's say that I had a "scientist" table with a single
column "scientist_name" and another table "discovery" which had
"scientist_name" as a column with a foreign key constraint to the
"scientist" table.

If the system were to try to delete a row from the scientist table, then
it would need to scan the discovery table for any row which referenced
that scientist_name.

If there is an index on the scientist_name column in the discovery
table, this is a fast operation.  In your case however, there most
likely isn't an index on that column, so it needs to do a full table
scan of the discovery table for each row deleted from the scientist
table.

If the discovery table has 100,000 rows, and there are 100 scientists,
then deleting those 100 scientists would require scanning 100,000 * 100
= 10M records, so this sort of thing can quickly become a very expensive
operation.

Because of this potential for truly atrocious update/delete behavior,
some database systems (SQL Server at least, and IIRC Oracle as well)
either automatically create the index on discovery.scientist_name when
the foreign key constraint is created, or refuse to create the foreign
key constraint if there isn't already an index.

PG doesn't force you to have an index, which can be desirable for
performance reasons in some situations if you know what you're doing,
but allows you to royally shoot yourself in the foot on deletes/updates
to the parent table if you're not careful.

If you have a lot of constraints and want to track down which one is
unindexed, then doing an EXPLAIN ANALYZE of deleting a single row from
the parent table will tell you how long each of the referential
integrity checks takes, so you can figure out which indexes are missing.

-- Mark Lewis

On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:
> I've just fired off a "DELETE FROM table" command (i.e. unfiltered
> DELETE) on a trivially small table but with many foreign key references
> (on similar-sized tables), and I'm waiting for it to finish. It's been
> 10 minutes now, which seems very excessive for a table of 9000 rows on a
> 3 GHz desktop machine.
>
> 'top' says it's all spent in USER time, and there's a ~~500KB/s write
> rate going on. Just before this DELETE, I've deleted data from a larger
> table (50000 rows) using the same method and it finished in couple of
> seconds - maybe it's a PostgreSQL bug?
>
> My question is: assuming it's not a bug, how to optimize DELETEs?
> Increasing work_mem maybe?
>
> (I'm using PostgreSQL 8.1.4 on FreeBSD 6- amd64)
>
> (I know about TRUNCATE; I need those foreign key references to cascade)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

pgsql-performance by date:

From: Rod Taylor
Date: 19 September 2006, 11:17:43
Subject: Re: Optimizing DELETE

From: Ivan Voras
Date: 19 September 2006, 17:39:58
Subject: Re: Optimizing DELETE

Re: Optimizing DELETE - Mailing list pgsql-performance

Previous

Next