Thread: DB files, sizes and cleanup

DB files, sizes and cleanup

From

"Gauthier, Dave"

Date:

17 December 2010, 12:10:44

Hi:

I'm trying to justify disk space for a new linux server they're going to give me for my Postgres instance. When I do a "du" of the place I installed the older instance on the system that is to be replaced, I see that the vast, vast majorityof the space goes to the contents of the "base" dir. In there are a bunch of files with integers for names (iod's ?). And some of those have millions of files inside.

Is this normal? Should there be millions of files in some of these "base" directories?

Is this indicative of some sort of problem or lack of cleanup that I should have been doing?

The "du" shows that I'm using 196G (again, mostly in "base") but pg_database_size shows something like 1/4 that amount, around 50G. I'd like to know if there's something I'm supposed to be doing to cleanup old (possibly deleted) data.

Also, I was running pg_size_pretty(pg_database_size('mydb')) on all the dbs. It runs very fast for most, but just hangs for two of the databases. Is this indicative of some sort of problem? (BTW, the 2 it hangs on are very much like others that it doesn't hang on, so I used those numbers to estimate the 50G)

Thanks in Advance.

Re: DB files, sizes and cleanup

From

Bill Moran

Date:

17 December 2010, 13:17:17

In response to "Gauthier, Dave" <dave.gauthier@intel.com>:

> Hi:
>
> I'm trying to justify disk space for a new linux server they're going to give me for my Postgres instance.  When I do
a"du" of the place I installed the older instance on the system that is to be replaced, I see that the vast, vast
majorityofthe space goes to the contents of the "base" dir.  In there are a bunch of files with integers for names
(iod's?).  And some of those have millions of files inside. 
>
> Is this normal?  Should there be millions of files in some of these "base" directories?
> Is this indicative of some sort of problem or lack of cleanup that I should have been doing?
>
> The "du" shows that I'm using 196G (again, mostly in "base") but pg_database_size shows something like 1/4 that
amount,around 50G.  I'd like to know if there's something I'm supposed to be doing to cleanup old (possibly deleted)
data.
>
> Also, I was running pg_size_pretty(pg_database_size('mydb')) on all the dbs.  It runs very fast for most, but just
hangsfor two of the databases.  Is this indicative of some sort of problem?  (BTW, the 2 it hangs on are very much like
othersthat it doesn't hang on, so I used those numbers to estimate the 50G) 

1) Do you have autovacuum running, or do you have a regular vacuum
   scheduled?  Because this seems indicative of no vacuuming, or errors
   in vacuuming, or significantly insufficient vacuuming.
2) Unless your databases contain close to 100G of actual data, that size
   seems unreasonable.
3) pg_database_size() is probably not "hanging", it's probably just taking
   a very long time to stat() millions of files.

Overall, I'm guessing you're not vacuuming your databases on a proper
schedule and that most of that 196G is bloat that doesn't need to be
there.  When bloat gets really bad, you're generally better off dumping
the datbases and restoring them, as a vacuum full might take a very,
very long time.

If you can demonstrate that the cause of this is table bloat, then I
would go through all your databases and do a vacuum full/reindex or
do a dump/restore if the problem is very bad.  Once you have done that,
your du output should be more realistic and more helpful.

Then, take some time to set up appropriate autovacuum settings so the
problem doesn't come back.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: DB files, sizes and cleanup

From

"Gauthier, Dave"

Date:

17 December 2010, 13:33:25

When I restart the DB, it reports... "LOG: autovacuum launcher started".

"ps aux | grep postgres" yields this...

dfg_suse> ps aux | grep postgres

pgdbadm 22656 0.0 0.0 21296 2616 pts/7 S+ Dec16 0:00 /usr/intel/pkgs/postgresql/8.3.4/bin/psql -h fcadsql3.fc.intel.com hsxreuse

pgdbadm 9135 0.0 0.0 50000 5924 pts/10 S 12:22 0:00 /nfs/hd/itools/em64t_linux26/pkgs/postgresql/8.3.4/bin/postgres -D /app/PG/v83

pgdbadm 9146 0.0 0.0 50000 1360 ? Ss 12:22 0:00 postgres: writer process

pgdbadm 9147 0.0 0.0 50000 1156 ? Ss 12:22 0:00 postgres: wal writer process

pgdbadm 9148 0.0 0.0 50000 1316 ? Ss 12:22 0:00 postgres: autovacuum launcher process

pgdbadm 9149 0.0 0.0 18904 1308 ? Ss 12:22 0:00 postgres: stats collector process

pgdbadm 9354 0.0 0.0 2896 760 pts/9 S+ 12:27 0:00 grep postgres

TSo I assu,e it's running?

This is PG v 8.3.4 on linux.

-----Original Message-----
From: Bill Moran [mailto:wmoran@potentialtech.com]
Sent: Friday, December 17, 2010 12:17 PM
To: Gauthier, Dave
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] DB files, sizes and cleanup

In response to "Gauthier, Dave" <dave.gauthier@intel.com>:

> Hi:

> I'm trying to justify disk space for a new linux server they're going to give me for my Postgres instance. When I do a "du" of the place I installed the older instance on the system that is to be replaced, I see that the vast, vast majorityof the space goes to the contents of the "base" dir. In there are a bunch of files with integers for names (iod's ?). And some of those have millions of files inside.

> Is this normal? Should there be millions of files in some of these "base" directories?

> Is this indicative of some sort of problem or lack of cleanup that I should have been doing?

> The "du" shows that I'm using 196G (again, mostly in "base") but pg_database_size shows something like 1/4 that amount, around 50G. I'd like to know if there's something I'm supposed to be doing to cleanup old (possibly deleted) data.

> Also, I was running pg_size_pretty(pg_database_size('mydb')) on all the dbs. It runs very fast for most, but just hangs for two of the databases. Is this indicative of some sort of problem? (BTW, the 2 it hangs on are very much like others that it doesn't hang on, so I used those numbers to estimate the 50G)

1) Do you have autovacuum running, or do you have a regular vacuum

scheduled? Because this seems indicative of no vacuuming, or errors

in vacuuming, or significantly insufficient vacuuming.

2) Unless your databases contain close to 100G of actual data, that size

seems unreasonable.

3) pg_database_size() is probably not "hanging", it's probably just taking

a very long time to stat() millions of files.

Overall, I'm guessing you're not vacuuming your databases on a proper

schedule and that most of that 196G is bloat that doesn't need to be

there. When bloat gets really bad, you're generally better off dumping

the datbases and restoring them, as a vacuum full might take a very,

very long time.

If you can demonstrate that the cause of this is table bloat, then I

would go through all your databases and do a vacuum full/reindex or

do a dump/restore if the problem is very bad. Once you have done that,

your du output should be more realistic and more helpful.

Then, take some time to set up appropriate autovacuum settings so the

problem doesn't come back.

Bill Moran

http://www.potentialtech.com

http://people.collaborativefusion.com/~wmoran/

Re: DB files, sizes and cleanup

From

Merlin Moncure

Date:

17 December 2010, 17:18:46

On Fri, Dec 17, 2010 at 12:31 PM, Gauthier, Dave
<dave.gauthier@intel.com> wrote:
> When I restart the DB, it reports... "LOG: autovacuum launcher started".
>
> "ps aux | grep postgres" yields this...
>
> dfg_suse> ps aux | grep postgres
> pgdbadm 22656  0.0  0.0  21296  2616 pts/7    S+   Dec16   0:00
> /usr/intel/pkgs/postgresql/8.3.4/bin/psql -h fcadsql3.fc.intel.com hsxreuse
> pgdbadm  9135  0.0  0.0  50000  5924 pts/10   S    12:22   0:00
> /nfs/hd/itools/em64t_linux26/pkgs/postgresql/8.3.4/bin/postgres -D
> /app/PG/v83
> pgdbadm  9146  0.0  0.0  50000  1360 ?        Ss   12:22   0:00 postgres:
> writer
> process
> pgdbadm  9147  0.0  0.0  50000  1156 ?        Ss   12:22   0:00 postgres:
> wal writer
> process
> pgdbadm  9148  0.0  0.0  50000  1316 ?        Ss   12:22   0:00 postgres:
> autovacuum launcher
> process
> pgdbadm  9149  0.0  0.0  18904  1308 ?        Ss   12:22   0:00 postgres:
> stats collector
> process
> pgdbadm  9354  0.0  0.0   2896   760 pts/9    S+   12:27   0:00 grep
> postgres
>
>
> TSo I assu,e it's running?
>
> This is PG v 8.3.4 on linux.
>
>
>
> -----Original Message-----
> From: Bill Moran [mailto:wmoran@potentialtech.com]
> Sent: Friday, December 17, 2010 12:17 PM
> To: Gauthier, Dave
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] DB files, sizes and cleanup
>
> In response to "Gauthier, Dave" <dave.gauthier@intel.com>:
>
>> Hi:
>>
>> I'm trying to justify disk space for a new linux server they're going to
>> give me for my Postgres instance.  When I do a "du" of the place I installed
>> the older instance on the system that is to be replaced, I see that the
>> vast, vast majorityof the space goes to the contents of the "base" dir.  In
>> there are a bunch of files with integers for names (iod's ?).  And some of
>> those have millions of files inside.
>>
>> Is this normal?  Should there be millions of files in some of these "base"
>> directories?
>> Is this indicative of some sort of problem or lack of cleanup that I
>> should have been doing?
>>
>> The "du" shows that I'm using 196G (again, mostly in "base") but
>> pg_database_size shows something like 1/4 that amount, around 50G.  I'd like
>> to know if there's something I'm supposed to be doing to cleanup old
>> (possibly deleted) data.
>>
>> Also, I was running pg_size_pretty(pg_database_size('mydb')) on all the
>> dbs.  It runs very fast for most, but just hangs for two of the databases.
>> Is this indicative of some sort of problem?  (BTW, the 2 it hangs on are
>> very much like others that it doesn't hang on, so I used those numbers to
>> estimate the 50G)
>
> 1) Do you have autovacuum running, or do you have a regular vacuum
>    scheduled?  Because this seems indicative of no vacuuming, or errors
>    in vacuuming, or significantly insufficient vacuuming.
> 2) Unless your databases contain close to 100G of actual data, that size
>    seems unreasonable.
> 3) pg_database_size() is probably not "hanging", it's probably just taking
>    a very long time to stat() millions of files.
>
> Overall, I'm guessing you're not vacuuming your databases on a proper
> schedule and that most of that 196G is bloat that doesn't need to be
> there.  When bloat gets really bad, you're generally better off dumping
> the datbases and restoring them, as a vacuum full might take a very,
> very long time.
>
> If you can demonstrate that the cause of this is table bloat, then I
> would go through all your databases and do a vacuum full/reindex or
> do a dump/restore if the problem is very bad.  Once you have done that,
> your du output should be more realistic and more helpful.
>
> Then, take some time to set up appropriate autovacuum settings so the
> problem doesn't come back.

Check your logs for warnings about the free space map.  what are
max_fsm_pages and max_fsm_relations set to? how many tables and
indexes do you have approximately?  do you truly have 'millions' of
files?

go into base folder and do:
find | wc -l

merlin

Re: DB files, sizes and cleanup

From

"Gauthier, Dave"

Date:

17 December 2010, 18:22:39

max_fsm_pages = 200000
max_fsm_relations = 12000

There are 12 DBs with roughly 30 tables+indexes each.

There are apparently 2 "bad" DBs.  Both identical in terms of data models (clones with different data).  I've
pg_dummpedone of them to a file, dropped the DB (took a long time as millions of files were deleted) and recreated it.
Itnow has 186 files. 

ls -1 | wc took a while for the other bad one but eventually came up with exactly 7,949,911 files, so yes, millions.
Theother one had millions too before I dropped it.  Something is clearly wrong.  But, since the DB recreate worked for
theother one, I'll do the same thing to fix this one too. 

What I will need to know then is how to prevent this in the future.  It's very odd because the worst of the 2 bad DBs
wasa sister DB to one that's no problem at all.  Here's the picture... 

I have a DB, call it "foo", that gets loaded with a ton of data at night.  The users query the thing readonly all day.
Atmidnight, an empty DB called "foo_standby", which is identical to "foo" in terms of data model is reloaded from
scratch. It takes hours.  But when it's done, I do a few rename databases to swap "foo" with "foo_standby" (really just
aname swap).  "foo_standby" serves as a live backup of yesterday's data.  Come the next midnight, I truncate all the
tablesand start the process all over again.  

I say all this because "foo" is the DB with 8 million files in it but "foo_standby" has 186 files.  Looks like one of
thesethings is getting vacuumed fine while the other is carrying baggage.   

I can't remember, but perhaps one of these 2 is a carry-over from an earlier version of PG (8.1 maybe, or maybe even
7.something). Maybe it had, and still has the millions of files and the vacuum isn't getting to them? 

Anyway, your advise on what to set in postgres.conf to make sure this is working would be greatly appreciated.

Thanks for the interest and advise !

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Friday, December 17, 2010 4:19 PM
To: Gauthier, Dave
Cc: Bill Moran; pgsql-general@postgresql.org
Subject: Re: [GENERAL] DB files, sizes and cleanup

On Fri, Dec 17, 2010 at 12:31 PM, Gauthier, Dave
<dave.gauthier@intel.com> wrote:
> When I restart the DB, it reports... "LOG: autovacuum launcher started".
>
> "ps aux | grep postgres" yields this...
>
> dfg_suse> ps aux | grep postgres
> pgdbadm 22656  0.0  0.0  21296  2616 pts/7    S+   Dec16   0:00
> /usr/intel/pkgs/postgresql/8.3.4/bin/psql -h fcadsql3.fc.intel.com hsxreuse
> pgdbadm  9135  0.0  0.0  50000  5924 pts/10   S    12:22   0:00
> /nfs/hd/itools/em64t_linux26/pkgs/postgresql/8.3.4/bin/postgres -D
> /app/PG/v83
> pgdbadm  9146  0.0  0.0  50000  1360 ?        Ss   12:22   0:00 postgres:
> writer
> process
> pgdbadm  9147  0.0  0.0  50000  1156 ?        Ss   12:22   0:00 postgres:
> wal writer
> process
> pgdbadm  9148  0.0  0.0  50000  1316 ?        Ss   12:22   0:00 postgres:
> autovacuum launcher
> process
> pgdbadm  9149  0.0  0.0  18904  1308 ?        Ss   12:22   0:00 postgres:
> stats collector
> process
> pgdbadm  9354  0.0  0.0   2896   760 pts/9    S+   12:27   0:00 grep
> postgres
>
>
> TSo I assu,e it's running?
>
> This is PG v 8.3.4 on linux.
>
>
>
> -----Original Message-----
> From: Bill Moran [mailto:wmoran@potentialtech.com]
> Sent: Friday, December 17, 2010 12:17 PM
> To: Gauthier, Dave
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] DB files, sizes and cleanup
>
> In response to "Gauthier, Dave" <dave.gauthier@intel.com>:
>
>> Hi:
>>
>> I'm trying to justify disk space for a new linux server they're going to
>> give me for my Postgres instance.  When I do a "du" of the place I installed
>> the older instance on the system that is to be replaced, I see that the
>> vast, vast majorityof the space goes to the contents of the "base" dir.  In
>> there are a bunch of files with integers for names (iod's ?).  And some of
>> those have millions of files inside.
>>
>> Is this normal?  Should there be millions of files in some of these "base"
>> directories?
>> Is this indicative of some sort of problem or lack of cleanup that I
>> should have been doing?
>>
>> The "du" shows that I'm using 196G (again, mostly in "base") but
>> pg_database_size shows something like 1/4 that amount, around 50G.  I'd like
>> to know if there's something I'm supposed to be doing to cleanup old
>> (possibly deleted) data.
>>
>> Also, I was running pg_size_pretty(pg_database_size('mydb')) on all the
>> dbs.  It runs very fast for most, but just hangs for two of the databases.
>> Is this indicative of some sort of problem?  (BTW, the 2 it hangs on are
>> very much like others that it doesn't hang on, so I used those numbers to
>> estimate the 50G)
>
> 1) Do you have autovacuum running, or do you have a regular vacuum
>    scheduled?  Because this seems indicative of no vacuuming, or errors
>    in vacuuming, or significantly insufficient vacuuming.
> 2) Unless your databases contain close to 100G of actual data, that size
>    seems unreasonable.
> 3) pg_database_size() is probably not "hanging", it's probably just taking
>    a very long time to stat() millions of files.
>
> Overall, I'm guessing you're not vacuuming your databases on a proper
> schedule and that most of that 196G is bloat that doesn't need to be
> there.  When bloat gets really bad, you're generally better off dumping
> the datbases and restoring them, as a vacuum full might take a very,
> very long time.
>
> If you can demonstrate that the cause of this is table bloat, then I
> would go through all your databases and do a vacuum full/reindex or
> do a dump/restore if the problem is very bad.  Once you have done that,
> your du output should be more realistic and more helpful.
>
> Then, take some time to set up appropriate autovacuum settings so the
> problem doesn't come back.

Check your logs for warnings about the free space map.  what are
max_fsm_pages and max_fsm_relations set to? how many tables and
indexes do you have approximately?  do you truly have 'millions' of
files?

go into base folder and do:
find | wc -l

merlin

Re: DB files, sizes and cleanup

From

Tom Lane

Date:

17 December 2010, 18:43:32

"Gauthier, Dave" <dave.gauthier@intel.com> writes:
> ls -1 | wc took a while for the other bad one but eventually came up
> with exactly 7,949,911 files, so yes, millions.  The other one had
> millions too before I dropped it.  Something is clearly wrong.

No doubt, but it's impossible to tell what from just the bare statement
that you had a lot of unexpected files.

Did you happen to notice the timestamps on those files --- eg, were
there recent ones, or were they all old?  Do you use a lot of temporary
tables?  Have you had a lot of crashes in this database?

>> This is PG v 8.3.4 on linux.

It wouldn't be a bad idea to update to 8.3.something-recent.

            regards, tom lane

Re: DB files, sizes and cleanup

From

Merlin Moncure

Date:

18 December 2010, 14:29:23

On Fri, Dec 17, 2010 at 5:22 PM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> max_fsm_pages = 200000
> max_fsm_relations = 12000
>
> There are 12 DBs with roughly 30 tables+indexes each.
>
> There are apparently 2 "bad" DBs.  Both identical in terms of data models (clones with different data).  I've
pg_dummpedone of them to a file, dropped the DB (took a long time as millions of files were deleted) and recreated it.
 Itnow has 186 files. 
>
> ls -1 | wc took a while for the other bad one but eventually came up with exactly 7,949,911 files, so yes, millions.
 Theother one had millions too before I dropped it.  Something is clearly wrong.  But, since the DB recreate worked for
theother one, I'll do the same thing to fix this one too. 
>
> What I will need to know then is how to prevent this in the future.  It's very odd because the worst of the 2 bad DBs
wasa sister DB to one that's no problem at all.  Here's the picture... 
>
> I have a DB, call it "foo", that gets loaded with a ton of data at night.  The users query the thing readonly all
day. At midnight, an empty DB called "foo_standby", which is identical to "foo" in terms of data model is reloaded from
scratch. It takes hours.  But when it's done, I do a few rename databases to swap "foo" with "foo_standby" (really just
aname swap).  "foo_standby" serves as a live backup of yesterday's data.  Come the next midnight, I truncate all the
tablesand start the process all over again. 

maybe something in this process is leaking files.  if I was in your
shoes, I'd recreate the database from scratch, then watch the file
count carefully and look for unusual growth.  this is probably not the
case, but if it is in fact a backend bug it will turn up again right
away.

anything else interesting jump out about these files? for example, are
there a lot of 0 byte files?

merlin

Re: DB files, sizes and cleanup

From

Jim Nasby

Date:

19 December 2010, 14:07:39

On Dec 17, 2010, at 4:22 PM, Gauthier, Dave wrote:
> max_fsm_pages = 200000

Don't know that this is related, but that count is pretty low. If you run a vacuumdb -av and capture the output, the
veryend will tell you how many FSM pages you actually need; it's very possible 200,000 isn't enough. 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: DB files, sizes and cleanup

From

"Gauthier, Dave"

Date:

19 December 2010, 22:50:30

I recreated the 2 "bad" DBs (the ones with millions of files in them).  So far, they're behaving.

Thanks for the help.

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Saturday, December 18, 2010 1:29 PM
To: Gauthier, Dave
Cc: Bill Moran; pgsql-general@postgresql.org
Subject: Re: [GENERAL] DB files, sizes and cleanup

On Fri, Dec 17, 2010 at 5:22 PM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> max_fsm_pages = 200000
> max_fsm_relations = 12000
>
> There are 12 DBs with roughly 30 tables+indexes each.
>
> There are apparently 2 "bad" DBs.  Both identical in terms of data models (clones with different data).  I've
pg_dummpedone of them to a file, dropped the DB (took a long time as millions of files were deleted) and recreated it.
 Itnow has 186 files. 
>
> ls -1 | wc took a while for the other bad one but eventually came up with exactly 7,949,911 files, so yes, millions.
 Theother one had millions too before I dropped it.  Something is clearly wrong.  But, since the DB recreate worked for
theother one, I'll do the same thing to fix this one too. 
>
> What I will need to know then is how to prevent this in the future.  It's very odd because the worst of the 2 bad DBs
wasa sister DB to one that's no problem at all.  Here's the picture... 
>
> I have a DB, call it "foo", that gets loaded with a ton of data at night.  The users query the thing readonly all
day. At midnight, an empty DB called "foo_standby", which is identical to "foo" in terms of data model is reloaded from
scratch. It takes hours.  But when it's done, I do a few rename databases to swap "foo" with "foo_standby" (really just
aname swap).  "foo_standby" serves as a live backup of yesterday's data.  Come the next midnight, I truncate all the
tablesand start the process all over again. 

maybe something in this process is leaking files.  if I was in your
shoes, I'd recreate the database from scratch, then watch the file
count carefully and look for unusual growth.  this is probably not the
case, but if it is in fact a backend bug it will turn up again right
away.

anything else interesting jump out about these files? for example, are
there a lot of 0 byte files?

merlin