Re: Very long deletion time on a 200 GB database - Mailing list pgsql-performance

From Glyn Astill
Subject Re: Very long deletion time on a 200 GB database
Date
Msg-id 1329990620.15378.YahooMailNeo@web171403.mail.ir2.yahoo.com
Whole thread Raw
In response to Very long deletion time on a 200 GB database  ("Reuven M. Lerner" <reuven@lerner.co.il>)
Responses Re: Very long deletion time on a 200 GB database
List pgsql-performance
Do you have any more detailed information about the hardware, what sort of disk configuration does it have?

Can you get onto the machine to look at what is using those resources?  You mention the 25gb of virtual memory; is that being used?  If so is it being used by postgres or something else?  If it's being used by postgres you should change postgresql.conf to work within your 5gb, otherwise can you stop the other applications to do your delete?  A snapshot from task manager or process monitor of process resource usage would be useful, even better some logging from perfmon including physical disk usage.

What would be even more useful is the table definitions; you mention trying to drop constraints to speed it up but is there anything else at play, e.g. triggers?



From: Reuven M. Lerner <reuven@lerner.co.il>
To: pgsql-performance@postgresql.org
Sent: Thursday, 23 February 2012, 8:39
Subject: [PERFORM] Very long deletion time on a 200 GB database

Hi, everyone.  I'm maintaining an application that exists as a "black box" in manufacturing plants.  The system is based on Windows, .NET, and PostgreSQL 8.3.  I'm a Unix kind of guy myself, but the application layer and system administration are being handled by other people; I'm just the PostgreSQL guy.

Because of the nature of the application, we don't have direct control over what happens.  And it turns out that at one installation, we're quickly running out of disk space.  The database is already taking up about 200 GB of space, and is growing by 1 GB or so a day.  Switching disks, either to a larger/faster traditional drive, or even to a SSD, is not an option.  (And yes, I know that SSDs have their own risks, but I'm just throwing that out as one option.)

Right now, the best solution to the space problem is to delete information associated with old records, where "old" is from at least 30 days ago.  The old records are spread across a few tables, including many large objects.  (The application was written by people who were new to PostgreSQL, and didn't realize that they could use BYTEA.) Basically, given a foreign key B.a_id that points to table A, I want to DELETE all in B where A's creation date is at least 30 days ago.

Unfortunately, when we implemented this simple delete, it executed slower than molasses, taking about 9 hours to do its thing.  Not only does this seem like a really, really long time to do such deleting, but we have only a 4-hour window in which to run this maintenance activity, before the factory starts to use our black box again.

I've tried a few approaches so far, none of which have been hugely successful.  The fact that it takes several hours to test each theory is obviously a bit of a pain, and so I'm curious to hear suggestions from people here.

I should note that my primary concern is available RAM.  The database, as I wrote, is about 200 GB in size, and PostgreSQL is reporting (according to Windows) use of about 5 GB RAM, plus another 25 GB of virtual memory.  I've told the Windows folks on this project that virtual memory kills a database, and that it shouldn't surprise us to have horrible performance if the database and operating system are both transferring massive amounts of data back and forth.  But there doesn't seem to be a good way to handle this

This is basically what I'm trying to execute:

DELETE FROM  B
WHERE r_id IN (SELECT R.id
    FROM R, B
    WHERE r.end_date < (NOW() - (interval '1 day' * 30))
      AND r.id = b.r_id

(1) I tried to write this as a join, rather than a subselect.  But B has an oid column that points to large objects, and on which we have a rule that removes the associated large object when a row in B is removed. Doing the delete as a join resulted in "no such large object with an oid of xxx" errors.  (I'm not sure why, although it might have to do with the rule.)

(2) I tried to grab the rows that *do* interest me, put them into a temporary table, TRUNCATE the existing table, and then copy the rows back.  I only tested that with a 1 GB subset of the data, but that took longer than other options.

(3) There are some foreign-key constraints on the B table.  I thought that perhaps doing a mass DELETE was queueing up all of those constraints, and possibly using up lots of memory and/or taking a long time to execute.  I thus rewrote my queries such that they first removed the constraints, then executed the DELETE, and then restored the constraints.  That didn't seem to improve things much either, and took a long time (30 minutes) just to remove the constraints.  I expected re-adding the constraints to take a while, but shouldn't removing them be relatively quick?

(4) I tried "chunking" the deletes, such that instead of trying to delete all of the records from the B table, I would instead delete just those associated with 100 or 200 rows from the R table.  On a 1 GB subset of the data, this seemed to work just fine.  But on the actual database, it was still far too slow.

I've been surprised by the time it takes to delete the records in question.  I keep trying to tell the others on this project that PostgreSQL isn't inherently slow, but that a 200 GB database running on a non-dedicated machine, with an old version (8.3), and while it's swapping RAM, will be slow regardless of the database software we're using.  But even so, 9 hours to delete 100 GB of data strikes me as a very long process.

Again, I continue to believe that given our hard time deadlines, and the fact that we're using a large amount of virtual memory, that there isn't really a solution that will work quickly and easily.  But I'd be delighted to be wrong, and welcome any and all comments and suggestions for how to deal with this.

Reuven

-- Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


pgsql-performance by date:

Previous
From: "Reuven M. Lerner"
Date:
Subject: Very long deletion time on a 200 GB database
Next
From: Marcin Mańk
Date:
Subject: Re: Very long deletion time on a 200 GB database