Thread: VACUUM not doing its job?
Hi, I have a PostgreSQL 7.2.1 database which normally (just after a pg_restore) takes about 700-800MB of disk space. The problem is that the database grows quite quickly when in use, although we don't put very much data in. Granted, quite a few records are deleted and inserted, but the total data volume grows only slowly.

Three weeks ago we did a pg_dump/pg_restore, and afterwards pgsql/data/ took about 750MB. Now it uses 2.4GB of disk space! If we do a new complete dump and restore, the volume will decrease to about 800MB again. We of course do a 'vacuumdb -a -z' every day, but unfortunately this does not seem to help much. The database is in use 24/7, so a full vacuum is not an option.

What we do now is simply a full dump/restore about once a month, because the database slows to a crawl as the data volume grows too large (this seems to be because it loads large amounts of data from disk for each query, probably because the data PostgreSQL uses no longer fits in the disk cache). What could be causing this problem, and what can we do about it? The dump/restore option is not attractive in the long run for obvious reasons.

Regards,

Kristian
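A quick way to see where that 2.4GB is going, before resorting to another dump/restore, is to ask pg_class how much space the relations occupy according to the statistics VACUUM/ANALYZE maintain. A sketch, assuming the default 8 KB block size and keeping in mind that relpages is only refreshed by VACUUM/ANALYZE:

  -- Approximate total space accounted for by all tables and indexes,
  -- based on the relpages statistics in pg_class (default 8 KB blocks).
  SELECT SUM(relpages) * 8 / 1024 AS approx_total_mb
  FROM pg_class;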
On Sat, 2002-08-03 at 21:53, Kristian Eide wrote:
> Three weeks ago we did a pg_dump/pg_restore, and afterwards pgsql/data/ took
> about 750MB. Now it uses 2.4GB of disk space! If we do a new complete dump
> and restore, the volume will decrease to about 800MB again.

I've run into this myself; Tom Lane helped me out. In my case, the indexes were not releasing the space used by entries for deleted rows. So, if you have a table with a lot of inserts/deletes, its indexes will grow incredibly fast.

The way to see what your biggest items are:

  select * from pg_class order by relpages desc;

If your tables are active with lots of inserts/deletes, the biggest things will likely be indexes. The only way that I know of to recover this space is to drop the indexes and recreate them. Vacuum didn't touch them for me.

-Ken
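To make that check a bit more concrete, the same query can be extended to show an approximate size per relation, and the rebuild step Ken describes looks like the sketch below. This assumes the default 8 KB block size, and the table, index, and column names are only illustrative:

  -- Largest relations with an approximate on-disk size.
  -- relkind 'r' is a table, 'i' an index; assumes 8 KB blocks.
  SELECT relname, relkind, relpages, relpages * 8 / 1024 AS approx_mb
  FROM pg_class
  ORDER BY relpages DESC
  LIMIT 20;

  -- Rebuilding a bloated index the hard way (names are illustrative):
  DROP INDEX my_big_index;
  CREATE INDEX my_big_index ON my_big_table (some_column);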
> In my case, the indexes were not releasing the space used by entries for
> deleted rows. So, if you have a table with a lot of inserts/deletes, its
> indexes will grow incredibly fast.
>
> The way to see what your biggest items are:
> select * from pg_class order by relpages desc;

Yes, I already suspected this could be at least part of the reason, and your SQL query confirms it. However, dropping and re-creating my biggest indexes only reclaims about 500MB; this still leaves about 1GB unaccounted for, and I can't see how my remaining (small) indexes could be responsible for that. (By the way, do you know how much disk space one 'relpage' uses?) Given that I have lots of deletes/inserts, is there anything besides the indexes which could use this much space?

> The only way that I know of to recover this space is to drop the indexes
> and recreate them. Vacuum didn't touch them for me.

This is not good, as the database is in use 24/7 and without the indexes everything comes to a screeching halt. It means I will probably have to stop the programs using the database for the time it takes to re-create the indexes; still, that is better than having to dump/restore everything :) Are there any plans to also vacuum the indexes in a future version of PostgreSQL (otherwise an excellent piece of software!)?

Regards,

Kristian
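For what it's worth, one 'relpage' is one database block, which is 8 KB unless the server was built with a non-default block size. Under that assumption, a rough split of the space between tables and indexes can be had with something like:

  -- Space per relation type ('r' = table, 'i' = index, 't' = TOAST),
  -- assuming the default 8 KB block size.
  SELECT relkind, SUM(relpages) * 8 / 1024 AS approx_mb
  FROM pg_class
  GROUP BY relkind;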
Kristian Eide wrote: > This is not good as the database is in use 24/7, and without the indexes > everything comes to a screeching halt. This means I probably will have to > stop the programs using the database for the time it takes to re-create the > indexes; this is better than having to dump/restore everything however :) Try REINDEX. I don't think that precludes (at least) read access. http://www.postgresql.org/idocs/index.php?sql-reindex.html You should also search the archives for threads on free space maps. You most likely need to increase yours. In particular, see: http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php HTH, Joe
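A minimal sketch of what that would look like; the table and index names are only illustrative, and the linked docs describe the exact locking behaviour on 7.2:

  -- Rebuild every index on one table:
  REINDEX TABLE my_big_table;

  -- Or rebuild a single index:
  REINDEX INDEX my_big_index;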
> You should also search the archives for threads on free space maps. You > most likely need to increase yours. In particular, see: > http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php Thanks, very helpful, although there does not seem to be much description of what the two free space map options in postgresql.conf actually do. Doing a VACUUM ANALYZE VERBOSE on my largest table gives me: NOTICE: Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0, UnUsed 1362341. I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and see if that helps. Regards, Kristian
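For reference, the two free space map options live in postgresql.conf; as far as I know they size a shared-memory structure, so changes only take effect after a postmaster restart. The values below are only illustrative:

  # postgresql.conf -- free space map settings (illustrative values)
  max_fsm_relations = 1000     # how many relations the FSM keeps track of
  max_fsm_pages = 100000       # how many pages with free space it remembers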
Kristian Eide wrote: > Thanks, very helpful, although there does not seem to be much description of > what the two free space map options in postgresql.conf actually do. Doing a > VACUUM ANALYZE VERBOSE on my largest table gives me: > > NOTICE: Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0, > UnUsed 1362341. > > I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and > see if that helps. > Note that you'll need to do a vacuum full *first* to recover the lost space, since the free space map is populated as the tuples are actually freed, I believe. After that you can adjust 'max_fsm_pages' and your vacuum frequency to achieve an equilibrium. Joe
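In practice that would be something along these lines, scheduled for a quiet period since it takes an exclusive lock on the table (the table name is illustrative):

  -- One-off compaction to give back the space already lost; afterwards,
  -- regular plain VACUUMs plus a big enough free space map should keep
  -- the table size roughly stable.
  VACUUM FULL ANALYZE my_big_table;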
On Sat, 2002-08-03 at 19:39, Kristian Eide wrote:
> > You should also search the archives for threads on free space maps. You
> > most likely need to increase yours. In particular, see:
> > http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php
>
> Thanks, very helpful, although there does not seem to be much description
> of what the two free space map options in postgresql.conf actually do.
> Doing a VACUUM ANALYZE VERBOSE on my largest table gives me:
>
> NOTICE: Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0,
> UnUsed 1362341.
>
> I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and
> see if that helps.
>
> Regards,
>
> Kristian

Note the high UnUsed value: it is a sure sign that you are not vacuuming as often as you are filling up your free space map. Remember that every update or delete leaves behind a dead ("unused") tuple in the database. The free space map attempts to keep track of these, but once it runs out of space, those unused tuples have little chance of being recovered.

I'm trying to piece together some documentation on this, but for now I'd recommend increasing the frequency of your vacuuming. That should have very little performance impact and will keep the overall database size smaller than simply increasing the free space map. Remember also that if a small subset of tables generates most of your "tuple turnover", you can run vacuum on those tables specifically rather than on your whole db.

hth,

Robert Treat
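In practice that could mean running something like the following against just the high-turnover tables every hour or so (the table names are illustrative), while leaving the nightly 'vacuumdb -a -z' in place:

  -- Frequent, cheap maintenance for the tables with heavy insert/update/
  -- delete traffic; plain VACUUM does not lock out readers or writers.
  VACUUM ANALYZE orders;
  VACUUM ANALYZE sessions;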
> This is not good, as the database is in use 24/7 and without the indexes
> everything comes to a screeching halt. It means I will probably have to
> stop the programs using the database for the time it takes to re-create
> the indexes; still, that is better than having to dump/restore everything :)

You can try using REINDEX instead of dropping and creating the index. It takes an exclusive lock on the table (instead of the write-lock taken during CREATE INDEX), so your application will have to wait during the reindex, but maybe it is faster... If you use the reindexdb script in the contrib dir, you can even put it in a crontab.

Sander
Kristian Eide wrote:
>> You should also search the archives for threads on free space maps. You
>> most likely need to increase yours. In particular, see:
>> http://archives.postgresql.org/pgsql-general/2002-07/msg00972.php
>
> Thanks, very helpful, although there does not seem to be much description
> of what the two free space map options in postgresql.conf actually do.
> Doing a VACUUM ANALYZE VERBOSE on my largest table gives me:
>
> NOTICE: Pages 43681: Changed 208, Empty 0; Tup 1921387: Vac 61669, Keep 0,
> UnUsed 1362341.
>
> I will try to increase the 'max_fsm_pages' option from 10000 to 500000 and
> see if that helps.

The default could indeed be a bit low for some installations. With 10,000 pages in the FSM and a page being 8 KB, this is only 80 MB at most (presuming the pages are completely free). So whenever you update more than 80 MB between vacuums you are sure to have a disk space leak, and even if you update less but the pages are not completely empty, you will still leak disk space.

So the first thing I would do is start by running vacuum more often. Since it is just a vacuum and not a vacuum full, it does not lock the tables anyway, and the key appears to be whether the amount of updates between vacuums exceeds max_fsm_pages * block size * free space per page.

Also, maybe 500000 is a bit over the top. 500,000 pages of 8 KB each make 4 GB, and even after three weeks your database was 'only' 2.4 GB (of which half a GB was indexes that aren't touched by vacuum anyway). How about starting with an hourly vacuum and a max_fsm_pages calculated from the maximum amount of data you expect to be entered within an hour (with some added safety for pages not being completely empty, etc.)? Say you expect a maximum insert activity of 80 MB an hour; that is 10,000 complete pages, and with a contingency for pages being only 10% empty that means a max_fsm_pages of 100,000. (Then again, 100,000 pages can hold a theoretical 800 MB, which seems absurdly high compared to the total size of your database just after a vacuum full.)

Also, since as you can see this is just a theoretical exercise (I am not a developer, just somebody who reads mailing lists and hopes to someday convert some Oracle applications to PostgreSQL) and everybody is desperate for numbers, maybe you can post back the results of whatever you did in a few weeks?

Jochem
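To make that back-of-the-envelope arithmetic easy to re-check, the same numbers expressed as a quick query; all of the inputs (80 MB of churn per hour, pages only 10% empty, 8192-byte blocks) are assumptions from the post, not measurements:

  -- 10,000 FSM pages  * 8 KB -> at most ~78 MB of reusable free space
  -- 500,000 FSM pages * 8 KB -> ~3.9 GB, more than the whole bloated database
  -- 80 MB of churn per hour, pages only ~10% empty -> roughly 100,000 pages
  SELECT 10000 * 8 / 1024                 AS fsm_10000_pages_mb,
         500000 * 8 / 1024                AS fsm_500000_pages_mb,
         (80 * 1024 * 1024) / (8192 / 10) AS pages_for_80mb_at_10pct_free;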