VACUUM process running for a long time

From: Jan Krcmar
hi

i've got a database (about 300G) and it's still growing.

i am inserting new data (about 2G/day) into the database (there is
only one table there) and i'm also deleting about 2G/day (data older
than a month).

the documentation says one should run VACUUM if there are many
changes in the database, but vacuumdb never finishes before the next
batch of data is due to be imported.

is there any technique that can solve this problem?

thanks
fous

Re: VACUUM process running for a long time

From: John R Pierce
Jan Krcmar wrote:
> hi
>
> i've got the database (about 300G) and it's still growing.
>
> i am inserting new data (about 2G/day) into the database (there is
> only one table there) and i'm also deleting about 2G/day (data older
> than month).
>
> the documentation says, one should run VACUUM if there are many
> changes in the database, but the vacuumdb never finishes sooner than
> the new data should be imported.
>
> is there any technique that can solve this problem?
>

your table is currently in a messy state, as it's apparently not been
vacuumed (what version of postgres is this? anything since 8.1 should
have autovacuum running by default). in theory your table has about
60GB of data in it; the fact that it's 300GB indicates there's a lot of
'dead' tuples.
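A quick way to check this (a sketch for psql; the statistics view columns below exist from 8.3 on) is to ask the statistics collector how many dead rows each table is carrying:

```sql
-- Tables with the most dead (deleted-but-unreclaimed) rows first
SELECT relname, n_live_tup, n_dead_tup,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

If n_dead_tup is a large fraction of n_live_tup, vacuum has not been keeping up.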

You might consider partitioning this table by date, either by day or by
week, and instead of deleting old rows, drop entire old partitions.
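On 8.x this is done with inheritance-based partitioning. A minimal sketch, with hypothetical table and column names (a trigger or application logic would also be needed to route inserts into the right child):

```sql
-- Parent table holds no rows itself; one child table per week
CREATE TABLE measurements (
    logged_at timestamptz NOT NULL,
    payload   text
);

CREATE TABLE measurements_2010_w15 (
    CHECK (logged_at >= '2010-04-12' AND logged_at < '2010-04-19')
) INHERITS (measurements);

-- Retiring a week of old data is then a near-instant metadata operation:
DROP TABLE measurements_2010_w10;
```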



Re: VACUUM process running for a long time

From: Jan Krcmar
2010/4/14 John R Pierce <pierce@hogranch.com>:
> Jan Krcmar wrote:
>>
>> hi
>>
>> i've got the database (about 300G) and it's still growing.
>>
>> i am inserting new data (about 2G/day) into the database (there is
>> only one table there) and i'm also deleting about 2G/day (data older
>> than month).
>>
>> the documentation says, one should run VACUUM if there are many
>> changes in the database, but the vacuumdb never finishes sooner than
>> the new data should be imported.
>>
>> is there any technique that can solve this problem?
>>
>
> your table is currently in a messy state, as its apparently not been
> vacuumed (what version of postgres is this, anything since 8.1 should have
i'm using postgresql-server-8.4.2-1PGDG.rhel5.
autovacuum is running, but the used space keeps rising.

> autovacuum running by default).    in theory your table has about 60GB of
> data in it, the fact that its 300GB indicates there's a lot of 'dead'
> tuples.
the database was dumped & recreated & restored about 2 weeks ago (this
removes the allocated "empty" space, doesn't it?). the dump was about 250G.

i agree that there should be some 'dead' tuples, but how can i
reclaim that space?

> You might consider partitioning this table by date, either by day or by
> week, and instead of deleting old rows, drop entire old partitions
this is not really a good workaround...

>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: VACUUM process running for a long time

From: Greg Smith
Jan Krcmar wrote:
>> You might consider partitioning this table by date, either by day or by
>> week, and instead of deleting old rows, drop entire old partitions
>>
> this is not really good workaround...
>

It is in fact the only good workaround for your problem, which you'll
eventually come to realize if you struggle with this class of problem
for long enough.  You can continue to fight with autovacuum forever, but
it's a battle you'll never quite win if you're deleting 2GB per day.
Even if you get vacuum running often enough to clean up the space,
you'll still have a constant struggle to keep your indexes working
efficiently.

Or you can partition by day or week and make the entire problem go
away.  Dropping an old partition requires no vacuum cleanup and leaves
behind no index issues.  It really is the right solution here if you
want to solve this problem once, rather than continuing to fight it a
little every single day forever.
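The contrast in I/O terms, with hypothetical table names (the DELETE form is what the current single-table setup requires):

```sql
-- Single table: marks ~2GB of rows dead, then needs vacuum to reclaim
-- the space, and the indexes stay bloated
DELETE FROM measurements WHERE logged_at < now() - interval '1 month';

-- Partitioned: removes the child table's files outright;
-- no dead tuples, no vacuum, no index cleanup
DROP TABLE measurements_2010_w10;
```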

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: VACUUM process running for a long time

From: Alan Hodgson
On Wednesday 14 April 2010, Jan Krcmar <honza801@gmail.com> wrote:
>
> > You might consider partitioning this table by date, either by day or by
> > week, and instead of deleting old rows, drop entire old partitions
>
> this is not really good workaround...

Actually it's a very good workaround, that a lot of people use for exactly
this purpose. It's a lot less disk I/O than delete+vacuum even when you're
not experiencing bloat.

Re: VACUUM process running for a long time

From: raghavendra t
Hi
 
>
> > You might consider partitioning this table by date, either by day or by
> > week, and instead of deleting old rows, drop entire old partitions
>
> this is not really good workaround...
 
As a first choice, this is a very good workaround for your present situation.

As a second choice, setting maintenance_work_mem will give a performance boost, but we can only increase that memory up to 2GB. This parameter sets the limit for the amount of memory that autovacuum, manual VACUUM, bulk index builds and other maintenance routines are permitted to use. Setting it to a moderately high value will increase the efficiency of vacuum and other operations.
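For reference, the second choice can be applied per-session before a manual vacuum (a sketch; the table name and the 1GB value are assumptions to adapt):

```sql
SET maintenance_work_mem = '1GB';  -- more room for the list of dead tuple IDs
VACUUM VERBOSE measurements;       -- VERBOSE reports how much was reclaimed
```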
 
But I'd go with the first choice.
 
Regards
Raghavendra
 
 
 
 

Re: VACUUM process running for a long time

From: Adrian von Bidder
On Wednesday 14 April 2010 16.01:39 Jan Krcmar wrote:

> the documentation says, one should run VACUUM if there are many
> changes in the database, but the vacuumdb never finishes sooner than
> the new data should be imported.
>
> is there any technique that can solve this problem?

 -> vacuum can run concurrently with other work, so it's not necessary to
wait for it to finish.
 -> in most cases, autovacuum should do the Right Thing(tm) automatically, so
you should not need to run vacuum manually.

This is with a recent pg version.  Are you using a (very) old version without
autovacuum?  Is your db server running hot and unable to keep up with
inserting data once vacuum starts running?

Note that the pg documentation contains lots of useful information about
tuning autovacuum.  Without knowing how your table looks and how your data
entry happens (in peaks?  or always at about the same rate?) we probably
can't help you much more.
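A starting point for that investigation (a sketch, assuming a table named measurements; the n_dead_tup column needs 8.3+):

```sql
SHOW autovacuum;  -- should be 'on'

-- When was the table last vacuumed, and how many dead rows remain?
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'measurements';
```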

cheers
-- vbi



>
> thanks
> fous

--
featured product: PostgreSQL - http://postgresql.org


Re: VACUUM process running for a long time

From: Jan Krcmar
hi

2010/4/14 Adrian von Bidder <avbidder@fortytwo.ch>:
>  -> vacuum can run concurrently to other stuff, so it's not necessary to
> wait before it finishes.
>  -> in most cases, autovacuum should do the Right Thing(tm) atomatically, so
> you should not need to call vacuum manually.
>
> This is with a recent pg version.  Do you use a (very) old version with
> autovacuum?  Is your db server running hot and can't really keep up with
> inserting data as soon as vacuum starts running?
>
> Note that the pg documentation contains lots of useful information about
> tuning autovacuum.  Without knowing how your table looks and how your data
> entry happens (in peaks?  or always at about the same rate?) we probably
> can't help you much more.
>
> cheers
> -- vbi
>

i'm doing one big insert per day, and one big delete per day

anyway, i've found this article:
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

could partitioning be helpful in this situation?
i guess it does what the others advised: inserts go into a specific
child table, and the obsolete table is simply dropped, which frees its
allocated space. am i right?

f.

Re: VACUUM process running for a long time

From: Adrian von Bidder
On Thursday 15 April 2010 15.56:20 Jan Krcmar wrote:
> i'm doing one big insert per day, and one big delete per day
>
> anyway, i've found, this article
> http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html
>
> could the partitioning be helpfull for this situation?

Yes, I'm quite sure that partitioning will be a big help in this scenario.

cheers
-- vbi

--
this email is protected by a digital signature: http://fortytwo.ch/gpg


Re: VACUUM process running for a long time

From: Scott Marlowe
On Wed, Apr 14, 2010 at 8:01 AM, Jan Krcmar <honza801@gmail.com> wrote:
> hi
>
> i've got the database (about 300G) and it's still growing.
>
> i am inserting new data (about 2G/day) into the database (there is
> only one table there) and i'm also deleting about 2G/day (data older
> than month).
>
> the documentation says, one should run VACUUM if there are many
> changes in the database, but the vacuumdb never finishes sooner than
> the new data should be imported.
>
> is there any technique that can solve this problem?

Are you running autovacuum?  Has it been tuned to be more aggressive
than the default?  I've got some large, heavily updated dbs for which
I've had to turn autovacuum_vacuum_cost_delay down to 2 or 5 ms to
get it to keep up.  But I've got a pretty good IO subsystem that can
handle the more aggressive autovacuum.

If you're doing one big insert and one big delete a day, then you
should be able to just kick off a regular vacuum at the end of the
delete, with a low cost_delay and higher cost_limit; that might keep up.
However, if you're on the edge of your IO subsystem's capacity then it
isn't gonna help much, because it's gonna slow down the system too much.
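Both knobs sketched (values are illustrative and the table name hypothetical; the per-table form needs 8.4):

```sql
-- Make autovacuum more aggressive for just this table
ALTER TABLE measurements SET (autovacuum_vacuum_cost_delay = 2);

-- Or run an unthrottled manual vacuum right after the daily delete
SET vacuum_cost_delay = 0;   -- 0 = no throttling (the default for manual VACUUM)
VACUUM ANALYZE measurements;
```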