Thread: How to increase row deletion efficiency?

How to increase row deletion efficiency?

From
Alexander Stanier
Date:
I am currently trying to separate two environments contained in one 
database. Essentially I need to break that one database down into two 
with a portion of the data going to each new database. I am intending to 
achieve this by duplicating the database and then stripping out the data 
that is not required in each database. I have started by trying to 
delete data from a set of 28 related tables, however the performance 
appears to be terrible. I am deleting from a table called document which 
cascades down to 27 tables underneath it linked by various cascading 
foreign key constraints. Some of these subsidiary tables have as many as 
a couple of million records.

Before executing the delete statement from document I tried setting all 
constraints as deferred within a transaction, but this does not seem to 
have helped.
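
(For illustration only, a sketch of that sort of attempt, with a placeholder WHERE clause; note that SET CONSTRAINTS only affects constraints that were declared DEFERRABLE in the first place:)

    BEGIN;
    SET CONSTRAINTS ALL DEFERRED;
    -- placeholder criterion for the rows belonging to the other environment
    DELETE FROM document WHERE environment = 'B';
    COMMIT;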

I can't work out whether the indexes on these tables are a help or a 
hindrance. Presumably, any involving the foreign keys should help as 
long as PostgreSQL will actually use them, but given that large numbers 
of records are being deleted, the query planner may decide just to do a 
sequential scan. An EXPLAIN doesn't show me what it does past the delete 
from document, i.e. if indexes are used when cascading. The downside of 
the indexes is that they have to be maintained which could be a lot of 
work in large scale deletions.
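
(One rough way to check is to list the referencing columns from the system catalogs and then look at each child table with \d to see whether that column is indexed. A sketch of such a query, not verified against 8.0:)

    SELECT c.conrelid::regclass AS child_table,
           a.attname            AS referencing_column
    FROM pg_constraint c
    JOIN pg_attribute a
      ON  a.attrelid = c.conrelid
      AND a.attnum   = ANY (c.conkey)
    WHERE c.contype = 'f'
      AND c.confrelid = 'document'::regclass;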

What I fear is that for every row that is deleted from the document 
table, the database is visiting all subsidiary tables to delete all data 
related to that one row before returning to document to delete another 
row. This would mean that all tables are being visited many times. If 
this is the way it is working, then the large tables are going to be a 
real problem. The most efficient way to do it would be to delete all 
document records, then with that list of documents in mind go on to the 
next table and delete all related records so that each table is only 
visited once to delete all the relevant records. I was hoping that 
setting constraints deferred would achieve this.
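
(A sketch of that table-at-a-time idea, with invented child table and column names and a placeholder filter, might look like this:)

    BEGIN;
    -- collect the ids of the documents to be removed just once
    CREATE TEMP TABLE doomed_docs AS
        SELECT id FROM document WHERE environment = 'B';
    -- then sweep each child table a single time, children before parents
    DELETE FROM document_version WHERE document_id IN (SELECT id FROM doomed_docs);
    DELETE FROM document_keyword WHERE document_id IN (SELECT id FROM doomed_docs);
    -- ... and so on for the remaining child tables ...
    DELETE FROM document WHERE id IN (SELECT id FROM doomed_docs);
    COMMIT;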

Can anyone advise me on how PostgreSQL (v8.0.3 on MacOS X 10.3) works in 
a delete statement and what strategy it uses to remove the data?
Can I specify "Unrecoverable" so that it doesn't write redo?
Are there any indicators I can use to tell me what part of the delete is 
taking so much time?
Also can anyone suggest anything else I can do to speed things up?

Or perhaps it simply is a lot of work and there is no way round it. My 
fallback option is to SELECT data that I do need rather than DELETE the 
data that I don't, but this route means I cannot make use of the foreign 
keys.
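
(That fallback might look roughly like the sketch below; the names and the keep-condition are invented, and the indexes and foreign keys would have to be re-created by hand on the new tables afterwards:)

    -- copy only the rows to keep into a fresh table, then swap it in
    CREATE TABLE document_keep AS
        SELECT * FROM document WHERE environment = 'A';
    -- repeat for each child table, joining back to document_keep,
    -- then drop the originals, rename the new tables, and re-create
    -- primary keys, indexes and foreign key constraints manually.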

Regards,
Alex Stanier.


Re: How to increase row deletion efficiency?

From
Tom Lane
Date:
Alexander Stanier <alexander.stanier@egsgroup.com> writes:
> I am currently trying to separate two environments contained in one 
> database. Essentially I need to break that one database down into two 
> with a portion of the data going to each new database. I am intending to 
> achieve this by duplicating the database and then stripping out the data 
> that is not required in each database. I have started by trying to 
> delete data from a set of 28 related tables, however the performance 
> appears to be terrible. I am deleting from a table called document which 
> cascades down to 27 tables underneath it linked by various cascading 
> foreign key constraints. Some of these subsidiary tables have as many as 
> a couple of million records.

Do you have indexes on the referencing columns?  PG enforces an index on
the referenced column, but not on the other end, and DELETE is where it
will hurt if you haven't got one.
        regards, tom lane


Re: How to increase row deletion efficiency?

From
Alexander Stanier
Date:
Have only just had a chance to try this. There were several missing indexes 
on the referencing columns, including one on a table of circa 150K records. 
It now completes in a couple of minutes. Also, it appears to be quicker if 
one doesn't defer the constraints. Thanks for the advice.

Regards, Alex.

Tom Lane wrote:
> Do you have indexes on the referencing columns?  PG enforces an index on
> the referenced column, but not on the other end, and DELETE is where it
> will hurt if you haven't got one.
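
(For illustration, an index on a referencing column of the kind discussed above is created like this; the table and column names are invented:)

    -- index the foreign-key column on the child table so the ON DELETE CASCADE
    -- lookups from document can use an index scan instead of a sequential scan
    CREATE INDEX document_version_document_id_idx
        ON document_version (document_id);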