Thread: VACUUM not doing its job?

VACUUM not doing its job?

From: "Kristian Eide"
Hi, I have a PostgreSQL 7.2.1 database which normally (just after a
pg_restore) takes about 700-800MB of disk space. Now, the problem is that
the database grows quite quickly when in use, although we don't put very
much data in. Granted, quite a few records are deleted and inserted,
but the total data volume grows only slowly.

Three weeks ago we did a pg_dump/pg_restore, after which pgsql/data/ took about
750MB. Now it uses 2.4GB of disk space! If we do a new complete dump and
restore, the volume decreases to about 800MB.

We of course do a 'vacuumdb -a -z' every day, but unfortunately this does
not seem to help much. The database is in use 24/7, so a full vacuum is not
an option. What we do now is simply a full dump/restore about once a month,
because the database slows to a crawl as the data volume grows too large
(this seems to be because it loads large amounts of data from disk for each
query, probably because the data PostgreSQL uses no longer fits in the disk
cache).

Anyway, what could be causing this problem and what can we do about it? The
dump/restore option is not attractive in the long run for obvious reasons.


Regards,

Kristian



Re: VACUUM not doing its job?

From: Ken Corey
I've run into this myself.  Tom Lane helped me out.

In my case, the problem was that indexes don't release the space used by
entries for deleted rows.  So, if you have a table with a lot of
inserts/deletes, your indexes will grow incredibly fast.

The way to see what your biggest items are: 

select * from pg_class order by relpages desc;

If your tables are active with lots of inserts/deletes, the biggest
things will likely be indexes.
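
To look at just the indexes, something along these lines should work
(relkind 'i' marks an index in pg_class):

select relname, relpages from pg_class where relkind = 'i'
order by relpages desc;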

The only way that I know to recover this space is to drop the indexes
and recreate them.  Vacuum didn't touch them for me.
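
For example, with a purely hypothetical index (use your real names and
column list):

drop index orders_customer_idx;
create index orders_customer_idx on orders (customer_id);

Be aware that queries run without the index between the two commands, so
they will be slow until the rebuild finishes.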

-Ken

On Sat, 2002-08-03 at 21:53, Kristian Eide wrote:
> Three weeks ago we did a pg_dump/pg_restore, after which pgsql/data/ took about
> 750MB. Now it uses 2.4GB of disk space! If we do a new complete dump and
> restore, the volume decreases to about 800MB.





Re: VACUUM not doing its job?

From: "Kristian Eide"
> In my case, the problem was that indexes don't release the space used by
> entries for deleted rows.  So, if you have a table with a lot of
> inserts/deletes, your indexes will grow incredibly fast.
>
> The way to see what your biggest items are:
> select * from pg_class order by relpages desc;

Yes, I already suspected this could be at least part of the reason, and your
SQL query confirms it. However, dropping and re-creating my biggest indexes
only reclaims about 500MB; this still leaves about 1GB unaccounted for, and I
can't see how my remaining (small) indexes can be responsible for that (btw:
do you know how much disk space one 'relpage' uses?).

Given that I have lots of deletes/inserts, is there anything besides the
indexes which could use this much space?

> The only way that I know to recover this space is to drop the indexes
> and recreate them.  Vacuum didn't touch them for me.

This is not good as the database is in use 24/7, and without the indexes
everything comes to a screeching halt. This means I probably will have to
stop the programs using the database for the time it takes to re-create the
indexes; this is better than having to dump/restore everything however :)

Are there any plans to also vacuum the indexes in a future version of
Postgres (otherwise an excellent piece of software!)?


Regards,

Kristian



Re: VACUUM not doing its job?

From: Joe Conway
Kristian Eide wrote:
> This is not good as the database is in use 24/7, and without the indexes
> everything comes to a screeching halt. This means I probably will have to
> stop the programs using the database for the time it takes to re-create the
> indexes; this is better than having to dump/restore everything however :)

Try REINDEX. I don't think that precludes (at least) read access.
http://www.postgresql.org/idocs/index.php?sql-reindex.html
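
Something like this, with the table name only as an illustration:

REINDEX TABLE mytable;

which should rebuild all of that table's indexes in one go.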

You should also search the archives for threads on free space maps. You 
most likely need to increase yours. In particular, see:
http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php

HTH,

Joe



Re: VACUUM not doing its job?

From: "Kristian Eide"
> You should also search the archives for threads on free space maps. You
> most likely need to increase yours. In particular, see:
>    http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php

Thanks, very helpful, although there does not seem to be much description of
what the two free space map options in postgresql.conf actually do. Doing a
VACUUM ANALYZE VERBOSE on my largest table gives me:

NOTICE:  Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0,
UnUsed 1362341.

I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and
see if that helps.
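
If I understand it right, that is just this line in postgresql.conf (the
fsm settings appear to be read only at postmaster start, so a restart is
needed; I am leaving the other option, max_fsm_relations, at its default):

max_fsm_pages = 500000        # up from the default of 10000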


Regards,

Kristian



Re: VACUUM not doing its job?

From: Joe Conway
Kristian Eide wrote:
> Thanks, very helpful, although there does not seem to be much description of
> what the two free space map options in postgresql.conf actually do. Doing a
> VACUUM ANALYZE VERBOSE on my largest table gives me:
> 
> NOTICE:  Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0,
> UnUsed 1362341.
> 
> I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and
> see if that helps.
> 

Note that you'll need to do a vacuum full *first* to recover the lost 
space, since the free space map is populated as the tuples are actually 
freed, I believe. After that you can adjust 'max_fsm_pages' and your 
vacuum frequency to achieve an equilibrium.
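
For the big table that might look like this (table name is illustrative):

VACUUM FULL VERBOSE mytable;

with your regular plain vacuums continuing as before afterwards.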

Joe



Re: VACUUM not doing its job?

From: Robert Treat
On Sat, 2002-08-03 at 19:39, Kristian Eide wrote:
> > You should also search the archives for threads on free space maps. You
> > most likely need to increase yours. In particular, see:
> >    http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php
> 
> Thanks, very helpful, although there does not seem to be much description of
> what the two free space map options in postgresql.conf actually do. Doing a
> VACUUM ANALYZE VERBOSE on my largest table gives me:
> 
> NOTICE:  Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0,
> UnUsed 1362341.
> 
> I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and
> see if that helps.
> 
> 
> Regards,
> 
> Kristian
> 

Note the high UnUsed value in your output; it is a sure sign that you're
not vacuuming as often as you are filling up your free space map.
Remember that every update or delete leaves a dead tuple behind in the db.
While the f_s_m attempts to keep track of these, once
it runs out of space, those unused tuples have little chance of being
recovered.  I'm trying to piece together some documentation on this, but
for now I'd recommend increasing the frequency of your vacuuming, which
should have very little performance impact and should keep the overall
database size smaller than increasing the f_s_m alone would. Remember also
that if a smaller subset of tables generates most of your "tuple
turnover", you can run vacuum on those tables specifically rather
than on your whole db.
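
As a sketch (database and table names are made up):

vacuumdb -d mydb -t busytable

or equivalently from psql:

VACUUM ANALYZE busytable;

hth,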

Robert Treat




Re: VACUUM not doing its job?

From: "Sander Steffann"
> This is not good as the database is in use 24/7, and without the indexes
> everything comes to a screeching halt. This means I probably will have to
> stop the programs using the database for the time it takes to re-create the
> indexes; this is better than having to dump/restore everything however :)

You can try using REINDEX instead of dropping and re-creating the index. It
takes an exclusive lock on the table (instead of the write-lock taken during
CREATE INDEX), so your application will have to wait during the reindex, but
maybe it is faster... If you use the reindexdb script in the contrib dir you
can even put it in a crontab.
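
For example, a nightly crontab entry might look something like this (the
path is just a guess, and the script's exact options may differ per
installation, so check its usage first):

30 3 * * * /usr/local/pgsql/bin/reindexdb mydb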

Sander




Re: VACUUM not doing its job?

From: Jochem van Dieten
Kristian Eide wrote:
>>You should also search the archives for threads on free space maps. You
>>most likely need to increase yours. In particular, see:
>>   http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php
> 
> 
> Thanks, very helpful, although there does not seem to be much description of
> what the two free space map options in postgresql.conf actually do. Doing a
> VACUUM ANALYZE VERBOSE on my largest table gives me:
> 
> NOTICE:  Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0,
> UnUsed 1362341.
> 
> I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and
> see if that helps.

The default could indeed be a bit low for some installations. With
10,000 pages in the FSM and a page being 8 KB, this covers only 80 MB at
most (presuming the pages are completely free). So whenever you update more
than 80 MB between vacuums you are sure to have a disk space leak, and even
if you update less, you will still leak disk space when the tracked pages
are not completely empty.

So the first thing I would do is start running vacuum more often.
Since it is just a vacuum and not a vacuum full, it does not lock the
tables anyway, and the key appears to be whether the amount of updates
between vacuums exceeds max_fsm_pages * block size * free space per page.

Also, maybe 500000 is a bit over the top. 500,000 pages of 8 KB each
make 4 GB, and even after 3 weeks your database was 'only' 2.4 GB (of
which half a GB was indexes that aren't touched by vacuum anyway).


How about starting with an hourly vacuum, with a max_fsm_pages that you
calculate from the maximum amount of data you expect to be entered
within an hour (with some added safety margin for pages not being
completely empty, etc.)? Say you expect a maximum insert activity of 80 MB
an hour: that is 10,000 complete pages, and with a contingency for pages
being only 10% empty, that means a max_fsm_pages of 100,000.
(Yet on the other hand, 100,000 pages can hold a theoretical 800 MB, which
seems absurdly high compared to the total size of your database when it
has just had a vacuum full.)
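
The same arithmetic as a quick query (all the numbers are just this
example's assumptions):

select 80000 / 8 as pages_freed_per_hour,   -- ~80 MB/hour at 8 KB/page = 10,000
       80000 / 8 * 10 as fsm_pages_needed;  -- times 10 if pages are only 10% empty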

Also, since as you can see this is just a theoretical exercise (I am not a
developer, just somebody who reads mailing lists and hopes to someday
convert some Oracle applications to PostgreSQL) and everybody is desperate
for numbers, maybe you can post back the results of whatever you did in a
few weeks?

Jochem