Thread: Controlling Database Growth
Hello,

I'm using PostgreSQL to log web traffic leaving our network. This results in the database growing to a fairly large size. This machine will be left unattended and basically unmaintained for long stretches of time, so I need a way to limit the disk space that Postgres uses. So far I have come up with two basic ideas:

1. Routinely run 'du' in the directory containing the PostgreSQL data, which in my case is /var/postgresql/data, and when it gets to a certain size remove a whole bunch of the old data from the database and run 'vacuum full; reindex database db_name; analyze;'.

The problem with this is that the vacuum could take nearly an hour to run in some cases, and there will be data that needs to get logged during that hour. Also, the vacuum process could use disk space above what the database is currently using, and that disk space may not be available.

2. Use pgstattuple() to determine how much space is being used at any given time and delete a bunch of old rows from the database when it is approaching the limit.

The nice thing about this is that 'vacuum full;' does not have to be executed in order to see the space get reclaimed. The downside is that running pgstattuple() is much more expensive than running 'du', so the disk space checks can't happen as often, and they cannot be run at all during the day.

I am curious to know if anyone has any other ideas as to how I can limit the disk space that PostgreSQL uses to, say, 5GB. I have not looked into pg_autovacuum yet, but from what I have read about it, it does not seem to be the answer to this problem. Has anyone else had to do such a thing before? Does anyone have any ideas on how to do this better?

Thanks,
Mark Drago
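For concreteness, a minimal sketch of idea #2, assuming a hypothetical log table named web_log with a logged_at timestamp column and the pgstattuple contrib module installed (neither name appears in the original post):

  -- pgstattuple scans the whole table, which is why this check is too
  -- expensive to run frequently or during the day.
  SELECT table_len, dead_tuple_len, free_space
    FROM pgstattuple('web_log');

  -- If table_len is approaching the budget, trim the oldest rows.  Space
  -- freed by the DELETE becomes reusable by new inserts after a plain
  -- VACUUM, so the files stop growing without a VACUUM FULL (they will
  -- not shrink, though).
  DELETE FROM web_log WHERE logged_at < now() - interval '30 days';
  VACUUM ANALYZE web_log;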
In response to Mark Drago <markdrago@mail.com>:

> I'm using PostgreSQL to log web traffic leaving our network. This
> results in the database growing to a fairly large size. This machine
> will be left unattended and basically unmaintained for long stretches
> of time, so I need a way to limit the disk space that Postgres uses.
> So far I have come up with two basic ideas:
[snip]
> I am curious to know if anyone has any other ideas as to how I can
> limit the disk space that PostgreSQL uses to, say, 5GB.

I don't think either of those are good ideas, because they both rely on disk limits to trigger drastic changes in database size, which will then require drastic maintenance operations (vacuum full, reindex) to clean up.

Personally, I think you'd be a lot better off estimating how much new data comes in each day, and scheduling a daily delete of old data combined with regular vacuum (either via cron or using autovacuum) and an occasional reindex.

Add to that some system monitoring via snmp traps -- maybe even graphing with mrtg -- to keep an eye on things in case you need to adjust the frequency or amount of stuff that's done, and you should see the database stabilize at a manageable size.

--
Bill Moran
Collaborative Fusion Inc.
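A rough sketch of this scheduled-maintenance approach, using the same hypothetical web_log table and an assumed 30-day retention window sized to keep the steady state under the 5GB budget:

  -- nightly_trim.sql, run from cron, e.g.:
  --   15 3 * * *  psql -d db_name -f /path/to/nightly_trim.sql

  -- Delete everything older than the retention window.
  DELETE FROM web_log WHERE logged_at < now() - interval '30 days';

  -- A plain VACUUM marks the deleted space reusable; VACUUM FULL is not
  -- needed as long as the nightly delete keeps pace with the daily
  -- insert volume.
  VACUUM ANALYZE web_log;

  -- An occasional (weekly or monthly) rebuild keeps index bloat in check:
  -- REINDEX TABLE web_log;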
On 01/25/07 15:43, Bill Moran wrote:
> In response to Mark Drago <markdrago@mail.com>:
[snip]
> Personally, I think you'd be a lot better off estimating how much new
> data comes in each day, and scheduling a daily delete of old data
> combined with regular vacuum (either via cron or using autovacuum)
> and an occasional reindex.
[snip]

Agree totally with this.

You could even partition the table (by month, probably), either using a UNION ALL view or PostgreSQL's built-in partitioning. Dropping the oldest month would then be a rapid process.
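A sketch of the built-in (inheritance-based) partitioning mentioned above, again using a hypothetical web_log table; the column names and monthly child tables are assumptions, not part of the original thread:

  -- Parent table holds no rows of its own; each month lives in a child.
  CREATE TABLE web_log (
      logged_at  timestamptz NOT NULL,
      url        text
  );

  CREATE TABLE web_log_2007_01 (
      CHECK (logged_at >= DATE '2007-01-01' AND logged_at < DATE '2007-02-01')
  ) INHERITS (web_log);

  -- Inserts go to the current month's child (directly, or via a rule or
  -- trigger on the parent).  With constraint_exclusion enabled, queries
  -- that filter on logged_at only scan the matching children.

  -- Discarding the oldest month is then a near-instant catalog operation
  -- instead of a long DELETE followed by a vacuum:
  DROP TABLE web_log_2006_12;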