Re: DB files, sizes and cleanup - Mailing list pgsql-general

From Bill Moran
Subject Re: DB files, sizes and cleanup
Msg-id 20101217121707.85a719f4.wmoran@potentialtech.com
In response to DB files, sizes and cleanup  ("Gauthier, Dave" <dave.gauthier@intel.com>)
Responses Re: DB files, sizes and cleanup
List pgsql-general
In response to "Gauthier, Dave" <dave.gauthier@intel.com>:

> Hi:
>
> I'm trying to justify disk space for a new linux server they're going
> to give me for my Postgres instance.  When I do a "du" of the place I
> installed the older instance on the system that is to be replaced, I
> see that the vast, vast majority of the space goes to the contents of
> the "base" dir.  In there are a bunch of files with integers for names
> (OIDs?).  And some of those have millions of files inside.
>
> Is this normal?  Should there be millions of files in some of these "base" directories?
> Is this indicative of some sort of problem or lack of cleanup that I should have been doing?
>
> The "du" shows that I'm using 196G (again, mostly in "base") but
> pg_database_size shows something like 1/4 that amount, around 50G.
> I'd like to know if there's something I'm supposed to be doing to
> cleanup old (possibly deleted) data.
>
> Also, I was running pg_size_pretty(pg_database_size('mydb')) on all
> the dbs.  It runs very fast for most, but just hangs for two of the
> databases.  Is this indicative of some sort of problem?  (BTW, the 2
> it hangs on are very much like others that it doesn't hang on, so I
> used those numbers to estimate the 50G)

1) Do you have autovacuum running, or do you have a regular vacuum
   scheduled?  Because this seems indicative of no vacuuming, or errors
   in vacuuming, or significantly insufficient vacuuming.
2) Unless your databases contain close to 100G of actual data, that size
   seems unreasonable.
3) pg_database_size() is probably not "hanging", it's probably just taking
   a very long time to stat() millions of files.
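
To see which database directories hold all those files, something like
the following rough sketch may help. It is not a PostgreSQL tool: the
function name summarize_base and the command-line usage are made up for
illustration. It walks each subdirectory of "base" and reports a file
count and total apparent size (st_size, which can differ from the block
usage that du reports). The subdirectory names are database OIDs, so
they can be matched to names with "SELECT oid, datname FROM pg_database;".

```python
import os
import sys


def summarize_base(base_dir):
    """Return {subdir_name: (file_count, total_bytes)} for each
    directory directly under base_dir (hypothetical helper)."""
    summary = {}
    with os.scandir(base_dir) as entries:
        for entry in entries:
            # Only per-database directories are of interest here.
            if not entry.is_dir(follow_symlinks=False):
                continue
            count = 0
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    try:
                        st = os.lstat(os.path.join(root, name))
                    except OSError:
                        # A live server may remove files while we scan.
                        continue
                    count += 1
                    total += st.st_size
            summary[entry.name] = (count, total)
    return summary


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed usage: python summarize_base.py /path/to/data/base
    result = summarize_base(sys.argv[1])
    # Largest file counts first, since those are the suspicious ones.
    for name, (count, total) in sorted(
        result.items(), key=lambda item: item[1][0], reverse=True
    ):
        print(f"{name}\t{count} files\t{total / 1024**3:.2f} GiB")
```

On a directory with millions of files this will be slow for the same
reason pg_database_size() is: every file has to be stat()ed.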

Overall, I'm guessing you're not vacuuming your databases on a proper
schedule and that most of that 196G is bloat that doesn't need to be
there.  When bloat gets really bad, you're generally better off dumping
the databases and restoring them, as a vacuum full might take a very,
very long time.

If you can demonstrate that the cause of this is table bloat, then I
would go through all your databases and do a vacuum full/reindex or
do a dump/restore if the problem is very bad.  Once you have done that,
your du output should be more realistic and more helpful.

Then, take some time to set up appropriate autovacuum settings so the
problem doesn't come back.
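
As a rough guide when picking those settings: autovacuum vacuums a
table once its dead rows exceed autovacuum_vacuum_threshold plus
autovacuum_vacuum_scale_factor times the table's estimated row count.
The sketch below just does that arithmetic. The function names are
invented for illustration, and the defaults of 50 and 0.2 are the
documented stock values; your server or individual tables may be
configured differently.

```python
def autovacuum_vacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead-row count a table must exceed before autovacuum vacuums it.

    reltuples    -- estimated live rows in the table (pg_class.reltuples)
    threshold    -- autovacuum_vacuum_threshold (assumed default: 50)
    scale_factor -- autovacuum_vacuum_scale_factor (assumed default: 0.2)
    """
    return threshold + scale_factor * reltuples


def would_autovacuum(dead_rows, reltuples, threshold=50, scale_factor=0.2):
    """True if dead_rows exceeds the trigger point for this table."""
    return dead_rows > autovacuum_vacuum_trigger(
        reltuples, threshold, scale_factor
    )


if __name__ == "__main__":
    # Compare the stock scale factor with a tighter, hypothetical one.
    for rows in (10_000, 1_000_000, 100_000_000):
        stock = autovacuum_vacuum_trigger(rows)
        tight = autovacuum_vacuum_trigger(rows, scale_factor=0.02)
        print(f"{rows:>11} rows: stock {stock:,.0f}, tighter {tight:,.0f}")
```

With the stock scale factor, a table with 100 million rows collects
roughly 20 million dead rows before autovacuum touches it, which is
why large, heavily updated tables often want a smaller scale factor.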

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
