Thread: VACUUM process running for a long time
hi

i've got a database (about 300G) and it's still growing. i'm inserting new data (about 2G/day) into the database (there is only one table in it), and i'm also deleting about 2G/day (data older than a month).

the documentation says one should run VACUUM if there are many changes in the database, but vacuumdb never finishes before the next batch of data has to be imported.

is there any technique that can solve this problem?

thanks
fous
Jan Krcmar wrote:
> i've got the database (about 300G) and it's still growing.
>
> i am inserting new data (about 2G/day) into the database (there is
> only one table there) and i'm also deleting about 2G/day (data older
> than month).
>
> is there any technique that can solve this problem?

your table is currently in a messy state, as it apparently has not been vacuumed effectively (what version of postgres is this? anything since 8.1 should have autovacuum running by default). in theory your table has about 60GB of live data in it; the fact that it's 300GB indicates there are a lot of 'dead' tuples. You might consider partitioning this table by date, either by day or by week, and instead of deleting old rows, dropping entire old partitions.
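something like this, roughly (untested sketch; "measurements" and "created_at" are made-up names, since we haven't seen your schema), using 8.4's inheritance-based partitioning:

  -- parent table; applications query this one
  CREATE TABLE measurements (
      created_at timestamptz NOT NULL,
      payload    text
  );

  -- one child per week; the CHECK constraint lets the planner
  -- (with constraint_exclusion enabled) skip irrelevant children
  CREATE TABLE measurements_2010_w15 (
      CHECK (created_at >= '2010-04-12' AND created_at < '2010-04-19')
  ) INHERITS (measurements);

  -- aim the daily bulk load directly at the current child
  INSERT INTO measurements_2010_w15 (created_at, payload)
  VALUES (now(), 'example row');

  -- expiring a month-old week is a near-instant catalog operation
  -- and leaves no dead tuples behind (assumes measurements_2010_w10
  -- was created the same way a few weeks earlier)
  DROP TABLE measurements_2010_w10;

dropping a partition is a metadata change, versus hours of DELETE + VACUUM churn on a 300GB table.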
2010/4/14 John R Pierce <pierce@hogranch.com>:
> your table is currently in a messy state, as it apparently has not been
> vacuumed (what version of postgres is this? anything since 8.1 should
> have autovacuum running by default).

i'm using postgresql-server-8.4.2-1PGDG.rhel5

autovacuum is running, but the used space keeps rising

> in theory your table has about 60GB of data in it; the fact that it's
> 300GB indicates there are a lot of 'dead' tuples.

the database was dumped & recreated & restored about 2 weeks ago (that removes the allocated "empty" space, doesn't it?). the dump was about 250G

i agree that there are probably some 'dead' tuples in there, but how do i free that space?

> You might consider partitioning this table by date, either by day or by
> week, and instead of deleting old rows, drop entire old partitions

this is not really a good workaround...
Jan Krcmar wrote:
>> You might consider partitioning this table by date, either by day or by
>> week, and instead of deleting old rows, drop entire old partitions
>
> this is not really a good workaround...

It is in fact the only good workaround for your problem, which you'll eventually come to realize if you struggle with this class of problem for long enough. You can continue to fight with autovacuum forever, but it's a battle you'll never quite win if you're deleting 2GB per day. Even if you get vacuum running often enough to clean up the space, you'll still have a constant struggle to keep your indexes working efficiently.

Or you can partition by day or week and make the entire problem go away. Dropping an old partition requires no vacuum cleanup and leaves behind no index issues. It really is the right solution here if you want to solve this problem once, rather than continuing to fight it a little every single day forever.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On Wednesday 14 April 2010, Jan Krcmar <honza801@gmail.com> wrote:
>> You might consider partitioning this table by date, either by day or by
>> week, and instead of deleting old rows, drop entire old partitions
>
> this is not really a good workaround...

Actually it's a very good workaround that a lot of people use for exactly this purpose. It's a lot less disk I/O than delete+vacuum even when you're not experiencing bloat.
Hi
>
> > You might consider partitioning this table by date, either by day or by
> > week, and instead of deleting old rows, drop entire old partitions
>
> this is not really good workaround...
As a first choice, this is a very good workaround for your present situation.

As a second choice, raising maintenance_work_mem will give a performance boost, though it can only be increased up to 2GB. This parameter sets the limit on the memory that autovacuum, manual VACUUM, bulk index builds, and other maintenance routines are permitted to use. Setting it to a moderately high value will increase the efficiency of vacuum and other operations.
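for example (the 1GB figure is only illustrative, and "measurements" is a placeholder table name):

  -- session-level, right before a manual run:
  SET maintenance_work_mem = '1GB';
  VACUUM VERBOSE measurements;

  -- or cluster-wide in postgresql.conf (takes effect on reload):
  -- maintenance_work_mem = 1GB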
But I'd go with the first choice.
Regards
Raghavendra
On Wednesday 14 April 2010 16.01:39 Jan Krcmar wrote:
> the documentation says, one should run VACUUM if there are many
> changes in the database, but the vacuumdb never finishes sooner than
> the new data should be imported.
>
> is there any technique that can solve this problem?

-> vacuum can run concurrently with other work, so it's not necessary to wait for it to finish.
-> in most cases, autovacuum should do the Right Thing(tm) automatically, so you should not need to call vacuum manually.

This is with a recent pg version. Are you using a (very) old version without autovacuum? Is your db server running hot, unable to keep up with inserting data once vacuum starts running?

Note that the pg documentation contains lots of useful information about tuning autovacuum. Without knowing what your table looks like and how your data entry happens (in peaks? or always at about the same rate?) we probably can't help you much more.

cheers
-- vbi

--
featured product: PostgreSQL - http://postgresql.org
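For reference, the autovacuum knobs the docs cover live in postgresql.conf; the values below are illustrative starting points, not recommendations from this thread:

  autovacuum = on                        # on by default since 8.3
  autovacuum_naptime = 1min              # how often workers look for work
  autovacuum_vacuum_scale_factor = 0.05  # vacuum once ~5% of a table is dead
  autovacuum_vacuum_threshold = 1000     # plus this many dead rows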
hi

2010/4/14 Adrian von Bidder <avbidder@fortytwo.ch>:
> Note that the pg documentation contains lots of useful information about
> tuning autovacuum. Without knowing what your table looks like and how your
> data entry happens (in peaks? or always at about the same rate?) we
> probably can't help you much more.

i'm doing one big insert per day, and one big delete per day

anyway, i've found this article:
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

could partitioning be helpful in this situation? i guess it does the same as the others advised: inserts go into a specific table, the obsolete table is dropped, and its allocated space is freed. am i right?

f.
On Thursday 15 April 2010 15.56:20 Jan Krcmar wrote:
> i'm doing one big insert per day, and one big delete per day
>
> anyway, i've found this article:
> http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html
>
> could partitioning be helpful in this situation?

Yes, I'm quite sure that partitioning will be a big help in this scenario.

cheers
-- vbi

--
this email is protected by a digital signature: http://fortytwo.ch/gpg
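One detail worth checking if you go that way (the query is just an illustration against the made-up "measurements" schema sketched earlier): for queries on the parent table to skip irrelevant children, constraint_exclusion has to be enabled, which it is by default on 8.4:

  SET constraint_exclusion = partition;  -- 8.4 default: apply CHECK
                                         -- constraints of inheritance children
  EXPLAIN SELECT count(*)
  FROM measurements
  WHERE created_at >= now() - interval '7 days';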
On Wed, Apr 14, 2010 at 8:01 AM, Jan Krcmar <honza801@gmail.com> wrote:
> i've got the database (about 300G) and it's still growing.
>
> i am inserting new data (about 2G/day) into the database (there is
> only one table there) and i'm also deleting about 2G/day (data older
> than month).
>
> is there any technique that can solve this problem?

Are you running autovacuum? Has it been tuned to be more aggressive than the default? I've got some large, heavily updated dbs for which I've had to turn autovacuum_vacuum_cost_delay down to 2 or 5 ms to get it to keep up. But I've got a pretty good IO subsystem that can handle the more aggressive autovacuum.

If you're doing one big insert and one big delete a day, then you should be able to just kick off a regular vacuum at the end of the delete, with a low cost_delay and a higher cost_limit; that might keep up. However, if you're on the edge with your IO subsystem, it isn't gonna help much, because it's gonna slow the system down too much.
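a sketch of that end-of-delete job (the table name and the numbers are placeholders, not tuned for your workload):

  -- nightly, immediately after the bulk DELETE:
  SET vacuum_cost_delay = 2;     -- ms slept per cost round; 0 disables throttling
  SET vacuum_cost_limit = 1000;  -- work accumulated before each sleep (default 200)
  VACUUM ANALYZE measurements;

on 8.4 you can also set the autovacuum equivalents per table, e.g. ALTER TABLE measurements SET (autovacuum_vacuum_cost_delay = 2);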