Re: Very long deletion time on a 200 GB database - Mailing list pgsql-performance

From Reuven M. Lerner
Subject Re: Very long deletion time on a 200 GB database
Date
Msg-id 4F4B9A6D.1060102@lerner.co.il
Whole thread Raw
In response to Re: Very long deletion time on a 200 GB database  (Shaun Thomas <sthomas@peak6.com>)
Responses Re: Very long deletion time on a 200 GB database
Re: Very long deletion time on a 200 GB database
List pgsql-performance
Hi, Shaun.  You wrote:
>
>> In the end, it was agreed that we could execute the deletes over
>> time, deleting items in the background, or in parallel with the
>> application's work. After all, if the disk is filling up at the rate
>> of 2 GB/day, then so long as we delete 4 GB/day (which is pretty easy
>> to do), we should be fine.
>
> Please tell me you understand deleting rows from a PostgreSQL database
> doesn't work like this. :) The MVCC storage system means you'll
> basically just be marking all those deleted rows as reusable, so your
> database will stop growing, but you'll eventually want to purge all the
> accumulated dead rows.

Oh, I understand that all right.  I've had many, *many* conversations
with this company explaining MVCC.  It doesn't seem to work; when they
refer to "vacuuming the database," I remind them that we have autovacuum
working, to which they respond, "Oh, we mean VACUUM FULL."  At which
point I remind them that VACUUM FULL is almost certainly not what they
want to do, and then they say, "Yes, we know, but we still like to do it
every so often."

 From what I understand, the issue isn't one of current disk space, but
rather of how quickly the disk space is being used up.  Maybe they want
to reclaim disk space, but it's more crucial to stop the rate at which
disk space is being taken.  If we were to delete all of the existing
rows, and let vacuum mark them as dead and available for reuse, then
that would probably be just fine.

I wouldn't be surprised if we end up doing a CLUSTER at some point.  The
problem is basically that this machine is in 24/7 operation at
high-speed manufacturing plants, and the best-case scenario is for a
4-hour maintenance window.  I've suggested that we might be able to help
the situation somewhat by attaching a portable USB-based hard disk, and
adding a new tablespace that'll let us keep running while we divide up
the work that the disk is doing, but they've made it clear that the
current hardware configuration cannot and will not change.  Period.

So for now, we'll just try to DELETE faster than we INSERT, and combined
with autovacuum, I'm hoping that this crisis will be averted.  That
said, the current state of affairs with these machines is pretty
fragile, and I think that we might want to head off such problems in the
future, rather than be surprised by them.

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner

pgsql-performance by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Very long deletion time on a 200 GB database
Next
From: Shaun Thomas
Date:
Subject: Re: Very long deletion time on a 200 GB database