Re: Controlling Database Growth - Mailing list pgsql-general

From: Bill Moran
Subject: Re: Controlling Database Growth
Msg-id: 20070125164312.9fd5ccd8.wmoran@collaborativefusion.com
List: pgsql-general
In response to Mark Drago <markdrago@mail.com>:
>
> I'm using PostgreSQL to log web traffic leaving our network.  This
> results in the database growing to a fairly large size.  This machine
> will be left unattended and basically unmaintained for long stretches
> of time, so I need a way to limit the disk space that Postgres uses.
> So far I have come up with two basic ideas:
>
> 1. Routinely run 'du' in the directory containing the PostgreSQL data
> (in my case, /var/postgresql/data), and when it reaches a certain
> size, remove a whole bunch of the old data from the database and run
> 'vacuum full; reindex database db_name; analyze;'.
>
> The problem with this is that the vacuum could take nearly an hour to
> run in some cases, and there will be data that needs to get logged
> during that hour.  Also, the vacuum process could use disk space above
> what the database is currently using, and that disk space may not be
> available.
>
> 2. Use pgstattuple() to determine how much space is being used at any
> given time and delete a bunch of old rows from the database when it is
> approaching a limit.
>
> The nice thing about this is that 'vacuum full;' does not have to be
> executed in order to see the space get reclaimed.  The downside is
> that running pgstattuple() is much more expensive than running 'du',
> so the disk space checks can't happen as often, and they cannot be
> run at all during the day.
>
> I am curious to know if anyone has any other ideas as to how I can
> limit the disk space that PostgreSQL uses to, say, 5GB.  I have not
> looked into pg_autovacuum yet, but from what I have read about it, it
> does not seem to be the answer to this problem.  Has anyone else had
> to do such a thing before?  Does anyone have any ideas on how to do
> this better?
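
(For reference, pgstattuple() ships in PostgreSQL's contrib collection
and is queried as below; 'traffic_log' is a hypothetical table name,
and the columns are those documented for the contrib module.)

    -- Physical size and reusable space for one table, in bytes.
    -- dead_tuple_len plus free_space is roughly what a plain VACUUM
    -- can make reusable without shrinking the file on disk.
    SELECT table_len, dead_tuple_len, free_space
      FROM pgstattuple('traffic_log');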

I don't think either of those is a good idea, because they both rely on
disk limits to trigger drastic changes in database size, which will then
require drastic maintenance operations (vacuum full, reindex) to clean
up.

Personally, I think you'd be a lot better off estimating how much new
data comes in each day, and scheduling a daily delete of old data combined
with regular vacuum (either via cron or using autovacuum) and an occasional
reindex.
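
As a rough sketch (the database, table, and column names here are
hypothetical), a crontab entry along these lines deletes by age and
then runs a plain vacuum, so the freed space can be reused without a
vacuum full:

    # Nightly at 03:00: drop rows older than 30 days, then plain-vacuum.
    # Two separate psql calls, because VACUUM can't run inside the
    # implicit transaction a multi-statement -c string would create.
    0 3 * * * psql -d db_name -c "DELETE FROM traffic_log WHERE logged_at < now() - interval '30 days'" && psql -d db_name -c "VACUUM ANALYZE traffic_log"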

Add to that some system monitoring via SNMP traps -- maybe even graphing
with MRTG -- to keep an eye on things in case you need to adjust the
frequency or the amount of data removed, and you should see the database
stabilize at a manageable size.
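
If you want a cheap number to feed that graphing, the built-in size
functions (available since PostgreSQL 8.1) cost far less than
pgstattuple() and can be polled frequently; 'db_name' is hypothetical:

    -- Total on-disk size of the database, formatted human-readably.
    SELECT pg_size_pretty(pg_database_size('db_name'));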

--
Bill Moran
Collaborative Fusion Inc.
