I'm no closer to a solution, but here are some additional data points,
all taken on Fedora Core 6.
Postgres 8.1 built from source. Auto vacuum disabled.
Create Empty Database.
Run our load on the system for 2 hours to populate and exercise the database.
Run Vacuum. Takes more than a minute.
Run Vacuum immediately again. Takes more than a minute.
Reindex the database. Takes 10 seconds.
Run Vacuum again. Takes 2 seconds.
Allow load to run on the system for 30 minutes.
Run Vacuum again. Takes more than a minute.

Postgres 8.3 built from source. Auto vacuum disabled.
Create Empty Database.
Run our load on the system for 2 hours to populate and exercise the database.
Run Vacuum. Takes more than a minute.
Run Vacuum immediately again. Takes 15 seconds.
Run Vacuum immediately again. Takes 15 seconds.
Reindex the database. Takes 10 seconds.
Run Vacuum again. Takes 2 seconds.
So PostgreSQL 8.3 behaves better, but it still shows some sort of
performance issue that a reindex fixes.
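
For anyone who wants to repeat the timings, here is a rough sketch of how the vacuum / vacuum / reindex / vacuum sequence could be scripted. It is not the harness we actually used. It assumes psql is on the PATH, that "Run Vacuum" means a plain database-wide VACUUM, and that the database is called loadtest (a made-up name); the helper names are illustrative too.

```python
import subprocess
import time

DBNAME = "loadtest"  # hypothetical database name, not from the original tests


def psql_command(dbname, sql):
    """Build a psql invocation that runs one SQL statement and exits."""
    # -X skips ~/.psqlrc so local settings cannot skew the timing.
    return ["psql", "-X", "-d", dbname, "-c", sql]


def build_steps(dbname):
    """Return the vacuum / vacuum / reindex / vacuum statements in order."""
    # REINDEX DATABASE has to name the database we are connected to;
    # the name is double-quoted as an SQL identifier.
    reindex = 'REINDEX DATABASE "%s"' % dbname.replace('"', '""')
    return ["VACUUM", "VACUUM", reindex, "VACUUM"]


def run_timed(dbname, runner=subprocess.check_call, clock=time.monotonic):
    """Run each step through psql and return (statement, seconds) pairs."""
    results = []
    for sql in build_steps(dbname):
        start = clock()
        runner(psql_command(dbname, sql))  # raises if psql exits non-zero
        results.append((sql, clock() - start))
    return results


def report(results):
    """Format the timings as one line per step."""
    return ["%-32s %8.2f s" % (sql, seconds) for sql, seconds in results]


# Usage, against a database that has already had the load run on it:
#     for line in report(run_timed(DBNAME)):
#         print(line)
```

The runner and clock arguments are only there so the sequence can be dry-run without a server. The measured time includes psql startup and connection overhead, which should be small next to a vacuum that takes more than a minute.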
And then of course, the kicker is that we can't recreate any of these
issues when running the exact same test on the exact same hardware
but with a different underlying OS. Running under a modern Ubuntu,
the vacuum never takes more than 2 seconds. We will be
checking other OSs soon. I guess if we can't figure out what is
causing it, we can at least isolate the distros that we need to tell
our customers not to use (or to schedule a reindex if they insist on
not upgrading their OS).