How to increase row deletion efficiency? - Mailing list pgsql-sql

From Alexander Stanier
Subject How to increase row deletion efficiency?
Date
Msg-id 4396F360.6080000@egsgroup.com
Responses Re: How to increase row deletion efficiency?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
I am currently trying to separate two environments contained in one 
database. Essentially I need to break that one database down into two, 
with a portion of the data going to each new database. I intend to 
achieve this by duplicating the database and then stripping out the data 
that is not required in each copy. I have started by trying to delete 
data from a set of 28 related tables, but the performance appears to be 
terrible. I am deleting from a table called document, which cascades 
down to 27 tables underneath it linked by various ON DELETE CASCADE 
foreign key constraints. Some of these subsidiary tables have as many as 
a couple of million rows.
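
For illustration, a minimal sketch of the kind of structure I mean 
(table and column names here are placeholders, not the real schema):

    CREATE TABLE document (
        document_id integer PRIMARY KEY,
        environment text NOT NULL
    );

    CREATE TABLE document_page (
        page_id     integer PRIMARY KEY,
        document_id integer NOT NULL
            REFERENCES document (document_id) ON DELETE CASCADE
    );

    -- The delete that fans out into all the child tables:
    DELETE FROM document WHERE environment = 'env_to_remove';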

Before executing the delete statement from document I tried setting all 
constraints as deferred within a transaction, but this does not seem to 
have helped.
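
Roughly what I tried (placeholder names again; as I understand it, SET 
CONSTRAINTS only affects constraints declared DEFERRABLE):

    BEGIN;
    SET CONSTRAINTS ALL DEFERRED;
    DELETE FROM document WHERE environment = 'env_to_remove';
    COMMIT;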

I can't work out whether the indexes on these tables are a help or a 
hindrance. Presumably, any indexes on the foreign key columns should 
help, as long as PostgreSQL will actually use them, but given that large 
numbers of records are being deleted the query planner may decide to do 
a sequential scan anyway. An EXPLAIN doesn't show me what happens past 
the delete from document, i.e. whether indexes are used when cascading. 
The downside of the indexes is that they have to be maintained, which 
could be a lot of work in large-scale deletions.
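
For example, this is the kind of index I mean on each referencing column 
(placeholder names); PostgreSQL does not create these automatically for 
foreign keys:

    CREATE INDEX document_page_document_id_idx
        ON document_page (document_id);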

What I fear is that for every row that is deleted from the document 
table, the database is visiting all subsidiary tables to delete the data 
related to that one row before returning to document to delete another 
row. This would mean that every table is being visited many times. If 
this is the way it works, then the large tables are going to be a real 
problem. The most efficient approach would be to delete all the document 
records, then, with that list of documents in mind, go on to the next 
table and delete all related records, so that each table is only visited 
once to remove all the relevant rows. I was hoping that setting 
constraints deferred would achieve this.
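
In other words, something along these lines done by hand, with one 
statement per child table (placeholder names; the real schema has 27 
child tables):

    BEGIN;
    -- Work out the set of documents to remove once, up front:
    CREATE TEMP TABLE doc_to_delete AS
        SELECT document_id FROM document
        WHERE environment = 'env_to_remove';

    -- Visit each child table exactly once:
    DELETE FROM document_page
        WHERE document_id IN (SELECT document_id FROM doc_to_delete);
    -- ... and so on for the other child tables ...

    -- Finally remove the parent rows themselves:
    DELETE FROM document
        WHERE document_id IN (SELECT document_id FROM doc_to_delete);
    COMMIT;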

Can anyone advise me on how PostgreSQL (v8.0.3 on MacOS X 10.3) works in 
a delete statement and what strategy it uses to remove the data?
Can I specify something like Oracle's "UNRECOVERABLE" so that the 
deletes don't write redo (WAL)?
Are there any indicators I can use to tell me what part of the delete is 
taking so much time?
Also can anyone suggest anything else I can do to speed things up?

Or perhaps it simply is a lot of work and there is no way round it. My 
fallback option is to SELECT the data that I do need rather than DELETE 
the data that I don't, but that route means I cannot make use of the 
foreign keys.
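
That would look something like this, table by table (placeholder names; 
each table would need its own SELECT since the cascade can't be used):

    -- Copy only the wanted rows into a fresh table:
    CREATE TABLE document_keep AS
        SELECT * FROM document WHERE environment = 'env_to_keep';

    CREATE TABLE document_page_keep AS
        SELECT p.* FROM document_page p
        JOIN document_keep d ON d.document_id = p.document_id;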

Regards,
Alex Stanier.

