I'm helping manage a postgres installation that continually consumes a
considerable amount of disk space, and I'm hoping to learn a bit more
about both treating the symptoms and addressing the causes.
Here are the basics:
It's a pg 7.4.1 installation on a Debian stable GNU/Linux 2.6.2 box
with 4GB RAM with 4 2.4 GHz processors and 36 GB of disk space.
There are thousands of tables, many of which are object-relational
(I.e., many are subclasses of sets of top-level tables). There are
indexes in place for joins that apply to many of the columns in the
subclassed tables.
It's a high turnover database, in that the applications that use it
perform thousands of inserts, updates, and deletes on a daily basis.
We're seeing about 5-10 GB of increased disk space used on a daily
basis if a vacuum (full) or reindexdb is not performed. We were doing
one vacuum analyze full a week with nightly vacuum analyzes. We began
manually reindexing the worst offenders once we passed 50% disk usage
regularly.
So here are my questions:
1. Is adding reindexdb to cron to reindex the entire database nightly
overkill?
2. If we turn on pg_autovacuum and leave in place one weekly vacuum
full, is that a reasonable strategy?
3. Otherwise, is it better in general to vacuum prior to reindexing?
4. What are the best places to look for causes of the velocity of
growth?
Thanks!
-tfo