Thread: Vacuum best practice: cronjob or autovacuum?
On our database of about 5GB we vacuum all of our 12 tables (only one is huge, all the others have about 100,000 rows or so) every hour or so. But we also have autovacuum enabled. Is this okay? Do the two vacuum processes contradict each other, or add unnecessary load to the system? The reason we introduced the cronjob was that we felt the autovacuum was not really doing its job.

I wonder if anyone can share some insight on whether these settings are good for a DB that is basically 24x7:

autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
autovacuum_naptime = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_scale_factor = 0.01

Thanks! I've read up some websites for this kind of tuning, but often the info is for older versions of DBs and some of the tweaks have changed since. I am on 8.2.9.
On Thu, Aug 28, 2008 at 07:53:17PM +0800, Phoenix Kiula wrote:
> On our database of about 5GB we vacuum all of our 12 tables (only one
> is huge, all others have about 100,000 rows or so) every hour or so.
>
> But we also have autovacuum enabled. Is this okay? Do the two vacuum
> processes contradict each other, or add unnecessary load to the
> system?

They do the latter. Especially vacuuming every hour, you're foiling the autovacuum and just introducing completely unnecessary load.

> The reason we introduced the cronjob was that we felt the autovacuum
> was not really doing its job.

What gave you this feeling? What's the "churn rate" on these tables (i.e. how fast do dead rows get created)?

My suspicion is that, using the 8.2 autovacuum, your "huge" table needs to be vacuumed manually, and the others just left to autovacuum. But that's just a guess at the moment.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
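PS: since you already have stats_row_level turned on, you can put a number on the churn rate. The row-level counters are cumulative, so sample them twice, an hour apart, and diff the results to get an hourly rate; roughly:

-- dead rows come from UPDATEs and DELETEs (every UPDATE leaves one behind);
-- run this twice and subtract to see dead-row creation per table per hour
SELECT relname, n_tup_upd + n_tup_del AS dead_row_sources
FROM pg_stat_user_tables
ORDER BY dead_row_sources DESC;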
On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> On our database of about 5GB we vacuum all of our 12 tables (only one
> is huge, all others have about 100,000 rows or so) every hour or so.

If you mean manual VACUUM or VACUUM FULL, every hour is probably too much. You should aim your VACUUM FULL at about once per week.

> But we also have autovacuum enabled. Is this okay? Do the two vacuum
> processes contradict each other, or add unnecessary load to the
> system?

Read the manuals at www.postgresql.org, specifically this section:

http://www.postgresql.org/docs/8.3/static/maintenance.html

You'll find that once in a while (start at once per week and adjust up or down from there) you can/should:

- vacuum full
- reindex your tables
- reindex your indexes

> The reason we introduced the cronjob was that we felt the autovacuum
> was not really doing its job.

How did you realise that? Turn off the cron job and wait a few days. In the meanwhile, monitor your disk space occupation (du -sh /var/lib?/pgsql....????/base/). If you see that size growing while the total row count (select count(*) from whatever) isn't growing, you need external vacuums.

> I wonder if anyone can share some
> insight on whether these settings are good for a DB that is basically
> 24x7:

Like someone said: it's not the 24x7. It's the "how many tuples get DELETEd or UPDATEd (for the case of autovacuum)" in one day, for example. If you find that your db updates/deletes many tuples per hour, then autovacuum's thresholds have to be tuned for that rate.

> autovacuum = on
> autovacuum_vacuum_cost_delay = 20
> vacuum_cost_delay = 20
> autovacuum_naptime = 10
> stats_start_collector = on
> stats_row_level = on
> autovacuum_vacuum_threshold = 75
> autovacuum_analyze_threshold = 25
> autovacuum_analyze_scale_factor = 0.02
> autovacuum_vacuum_scale_factor = 0.01

The two scale factors can be tricky: if your database is very big, then 2% of "very big" is "a lot of changes" before autovacuum even tries to vacuum. Read the documentation about these two. You might want to consider using scale factors of 0 and increasing both thresholds a bit, e.g.:

autovacuum_vacuum_threshold = 20000
autovacuum_analyze_threshold = 10000

Autovacuum is something that you adjust to your needs.

Another tip: edit your postgresql.conf and bring the debug levels up to:

log_min_messages = debug3
log_min_error_statement = debug3

then just follow the logfile (the location depends on the distro: /var/log/postgresql, /var/pgsql/data/, whatever).

IMPORTANT: don't leave debug3 on forever: it is very verbose and _will_ eat up your disk in no time. You might want to do something like this:

tail -f logfile | grep vac
or
tail -f logfile | grep threshold

Take your time :) Autovacuum requires a bit of dedication but, in the end, it works fine (preventing disk space growth).

Note also that VACUUM has nothing to do with REINDEXing, and REINDEXing also frees a considerable amount of disk space in certain cases.

Very important: read the docs:

http://www.postgresql.org/docs/8.3/static/maintenance.html

In my case I have autovacuum with scale factors of 0 and naptime 600; also a cron job for VACUUM FULL and a reindex of everything once a week (during the night). It's working fine on a db of about 2 Giga with an average of 10,000 DELETEs a day and well above 200,000 INSERTs/UPDATEs per day.

cheers
joao

PS: I'm saying all this because I'm just going through this process myself and I think I'm on the right track. Things are starting to work the way I want them to :)

> Thanks! I've read up some websites for this kind of tuning, but often
> the info is for older versions of DBs and some of the tweaks have
> changed since. I am on 8.2.9.
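PS2: instead of (or as well as) du, you can watch sizes from inside the database. A quick sketch using the size functions (they exist since 8.1, so your 8.2.9 is fine):

-- total on-disk size (heap + indexes + toast) of each user table
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r' AND relname NOT LIKE 'pg_%'
ORDER BY pg_total_relation_size(oid) DESC;

If those totals keep growing while your row counts stay flat, the vacuuming isn't keeping up.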
On Thu, Aug 28, 2008 at 9:47 AM, Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com> wrote:
>
> http://www.postgresql.org/docs/8.3/static/maintenance.html
>
> You'll find that once in a while (start at once per week and adjust up or
> down from there) you can/should:
>
> - vacuum full
> - reindex your tables
> - reindex your indexes

Actually, the manual doesn't recommend VACUUM FULL for routine maintenance anymore:

"""
For those not using autovacuum, one approach is to schedule a database-wide VACUUM once a day during a low-usage period, supplemented by more frequent vacuuming of heavily-updated tables if necessary. (Some installations with extremely high update rates vacuum their busiest tables as often as once every few minutes.) If you have multiple databases in a cluster, don't forget to VACUUM each one; the program vacuumdb might be helpful.

VACUUM FULL is recommended for cases where you know you have deleted the majority of rows in a table, so that the steady-state size of the table can be shrunk substantially with VACUUM FULL's more aggressive approach. Use plain VACUUM, not VACUUM FULL, for routine vacuuming for space recovery.
"""

--
regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. (593) 87171157
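PS: in cron terms, the manual's suggestion for the non-autovacuum case boils down to something like this (an illustrative schedule; adjust the time and connection options to your installation):

# nightly database-wide plain VACUUM plus ANALYZE, per the manual's advice
30 3 * * *  vacuumdb --all --analyze --quiet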
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
>
> On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> > On our database of about 5GB we vacuum all of our 12 tables (only one
> > is huge, all others have about 100,000 rows or so) every hour or so.
>
> If you mean manual VACUUM or VACUUM FULL, every hour is probably too
> much. You should aim your VACUUM FULL at about once per week.

This statement is ambiguous. Advice about VACUUM is _very_ different from advice about VACUUM FULL.

Running a normal VACUUM once an hour may be exactly what you need, but you didn't give enough evidence one way or the other. Look into the various system catalog tables to see how much bloat your tables and indexes have, and track that over time to see if autovac and/or your cron vacuum is keeping things in check.

> > But we also have autovacuum enabled. Is this okay? Do the two vacuum
> > processes contradict each other, or add unnecessary load to the
> > system?

If your cron jobs are vacuuming enough, then autovac will simply take a few CPU cycles to realize that nothing needs to be done.

> you'll find that once in a while (start at once per week and adjust up or
> down from there) you can/should:
>
> - vacuum full
> - reindex your tables
> - reindex your indexes

Whoa there. How on earth did you derive that from those documents? That's a pretty lousy maintenance plan with lots of unneeded overhead, unless your database has a very edge-case access pattern.

VACUUM FULL really only needs to be done in extreme cases where massive data bloat has occurred and is not expected to happen again. If massive data bloat is routine, you're probably better off letting ordinary VACUUM maintain a consistent level of free space on tables.

REINDEXing seems nice at times, but I've yet to see any evidence that it's necessary. There are probably some corner cases, but I've not seen them documented. I tried to document what I thought was a corner case once, and was unable to come up with anything conclusive.

I don't even know what you mean by "reindex your indexes".

> > The reason we introduced the cronjob was that we felt the autovacuum
> > was not really doing its job.
>
> How did you realise that? Turn off the cron job and wait a few days. In
> the meanwhile, monitor your disk space occupation (du
> -sh /var/lib?/pgsql....????/base/)

I agree with most of this, although it'll probably be easier to look into the PostgreSQL system tables to watch the actual table and index sizes:

select relname, relpages from pg_class
where relkind in ('r', 'i') and relname not like 'pg_%' order by relname;

You may find that autovacuum handles everything well except for one or two tables, in which case you can optimize your cron job to vacuum just those tables.

> > I wonder if anyone can share some
> > insight on whether these settings are good for a DB that is basically
> > 24x7:
>
> Like someone said: it's not the 24x7. It's the "how many tuples get
> DELETEd or UPDATEd (for the case of autovacuum)" in one day, for
> example.

The 24/7 aspect _is_ part of the equation. VACUUM FULL takes out an exclusive lock while doing its work, which is not appropriate for 24/7 operation. Routine VACUUM FULL is _never_ appropriate for a 24/7 operation (added to the fact that it's seldom appropriate at all).

[snip lots of information that is good and I have nothing to add to]

> In my case I have autovacuum with scale factors of 0 and naptime 600; also
> a cron job for VACUUM FULL and a reindex of everything once a week (during
> the night). It's working fine on a db of about 2 Giga with an average of
> 10,000 DELETEs a day and well above 200,000 INSERTs/UPDATEs per day.

You're implying, by this comment, that the DB has down time at night, which means you have different options for what you can do than the original poster, who does _not_ want to run VACUUM FULL at night if he wants to maintain 24/7 operation.

That being said, I still doubt that VACUUM FULL is necessary. Have you _tried_ running with normal VACUUM alone? Have you benchmarked the system to demonstrate that VACUUM FULL is really helping anything?

In any case, if you have a big outage window every week where you can afford to run whatever you want without interfering with other users, feel free to continue as you are. It won't cause you any problems.

The naptime at 600 is probably a bad idea. If you only have one user database on this system, then it only gets investigated by autovac once every 40 minutes (template0 ... template1 ... postgres ... yourdb). Consider that autovac uses very little in the way of resources when it determines that it has no work to do.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
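PS: to put a rough number on "bloat", compare page counts to row counts right after an ANALYZE. reltuples is only an estimate, so watch the trend rather than the absolute value; a crude sketch:

-- if pages-per-row climbs steadily on a table whose row count is stable,
-- vacuuming isn't keeping up with the churn on that table
SELECT relname, relpages, reltuples,
       relpages::float8 / (reltuples + 1) AS pages_per_row
FROM pg_class
WHERE relkind = 'r' AND relname NOT LIKE 'pg_%'
ORDER BY relpages DESC;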
For the record:

Bill Moran wrote:
> The naptime at 600 is probably a bad idea. If you only have one user
> database on this system, then it only gets investigated by autovac once
> every 40 minutes (template0 ... template1 ... postgres ... yourdb).
> Consider that autovac uses very little in the way of resources when it
> determines that it has no work to do.

Note that on 8.3, the meaning of naptime has changed -- it would be effectively "check each database once every 10 minutes" if set at 600. This was changed precisely because the previous semantics were difficult to explain/use.

The OP is using 8.2, though, so it doesn't apply here.

--
Alvaro Herrera
http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
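Spelled out as a config comment (the 40-minute figure assumes the usual four databases: template0, template1, postgres, and one user database):

autovacuum_naptime = 600
# 8.2: the single autovacuum process wakes every 600s and considers ONE
#      database per cycle, so each of 4 databases is visited every ~40 min
# 8.3: naptime is per-database, so each database is checked every ~10 min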
In response to Alvaro Herrera <alvherre@commandprompt.com>:

> For the record:
>
> Bill Moran wrote:
> > The naptime at 600 is probably a bad idea. If you only have one user
> > database on this system, then it only gets investigated by autovac once
> > every 40 minutes (template0 ... template1 ... postgres ... yourdb).
> > Consider that autovac uses very little in the way of resources when it
> > determines that it has no work to do.
>
> Note that on 8.3, the meaning of naptime has changed -- it would be
> effectively "check each database once every 10 minutes" if set at 600.
> This was changed precisely because the previous semantics were difficult
> to explain/use.

Ooo ... that's an exciting "gotcha"!

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Hello,

I'd like to apologise for my short knowledge of VACUUM FULL and REINDEX. I'm just stating what I do in my case; I do not know if it is a corner case or not. I've been dealing with this specific application, which is very demanding on Postgres, for about 2 years.

When autovacuum was introduced, I kept the weekly VACUUM FULL because it effectively brings disk occupation down, even though it grows back after a few hours. It's just reassuring to me to make sure that at least one of the vacuums is working when I see the weekly drop of disk occupation.

On Thu, 2008-08-28 at 11:27 -0400, Bill Moran wrote:
> In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
> >
> > On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> > > On our database of about 5GB we vacuum all of our 12 tables (only one
> > > is huge, all others have about 100,000 rows or so) every hour or so.
> >
> > If you mean manual VACUUM or VACUUM FULL, every hour is probably too
> > much. You should aim your VACUUM FULL at about once per week.
>
> This statement is ambiguous. Advice about VACUUM is _very_ different
> from advice about VACUUM FULL.
>
> Running a normal VACUUM once an hour may be exactly what you need, but
> you didn't give enough evidence one way or the other. Look into the
> various system catalog tables to see how much bloat your tables and
> indexes have, and track that over time to see if autovac and/or your
> cron vacuum is keeping things in check.
>
> > > But we also have autovacuum enabled. Is this okay? Do the two vacuum
> > > processes contradict each other, or add unnecessary load to the
> > > system?
>
> If your cron jobs are vacuuming enough, then autovac will simply
> take a few CPU cycles to realize that nothing needs to be done.
>
> > you'll find that once in a while (start at once per week and adjust up or
> > down from there) you can/should:
> >
> > - vacuum full
> > - reindex your tables
> > - reindex your indexes
>
> Whoa there. How on earth did you derive that from those documents?

I did not say I did. I just advised the original poster to read the docs. I'm not trying to say 'this is the way'. I'm trying to say 'read the docs; the docs are good'.

But, BTW, quoting http://www.postgresql.org/docs/8.3/static/routine-reindex.html:

---------------------------------------------
The potential for bloat in non-B-tree indexes has not been well
characterized.
---------------------------------------------

---------------------------------------------
It is a good idea to keep an eye on the index's physical size when using
any non-B-tree index type.
---------------------------------------------

---------------------------------------------
Also, for B-tree indexes a freshly-constructed index is somewhat faster
to access than one that has been updated many times, because logically
adjacent pages are usually also physically adjacent in a newly built
index. (This consideration does not currently apply to non-B-tree
indexes.) It might be worthwhile to reindex periodically just to improve
access speed.
---------------------------------------------

I did some tests (let's say about 50 INSERTs/UPDATEs per second and some thousands of DELETEs once in a while). Query execution time drops 10 times after REINDEXING and VACUUM.

I've seen a 600 Mega drop in disk size occupation just by reindexing, and an additional drop of about 500 Megas by vacuuming; this was on about 100 Megas of useful data.

> That's a pretty lousy maintenance plan with lots of unneeded overhead,
> unless your database has a very edge-case access pattern.

!? Many people mention these 'edge-case access patterns' and 'corner cases'. I don't know if mine is such. My application executes patterns of varying INSERTing and UPDATing, but it must survive stress tests with about 100 UPDATEs mixed with INSERTs per second. Additionally, it DELETEs up to 10,000 records once every night.

It's a 'near-realtime' log store which stores log information for 'you name it' logs in Linux-based systems. It also removes old data once a night.

> VACUUM FULL really only needs to be done in extreme cases where massive data
> bloat has occurred and is not expected to happen again. If massive data
> bloat is routine, you're probably better off letting ordinary VACUUM
> maintain a consistent level of free space on tables.

Well, I've seen massive data and index bloating in my application. In some cases REINDEXing and VACUUM FULLing were the only way to clear things up. But you're probably right. I'll give it a try. thx

> REINDEXing seems nice at times, but I've yet to see any evidence that
> it's necessary. There are probably some corner cases, but I've not seen
> them documented.

Neither have I. REINDEXING helps me keep query execution time low; otherwise it increases to levels I cannot accept. We use 'modest' processors and 'modest' storage, in highly dependable embedded systems.

> I tried to document what I thought was a corner case
> once, and was unable to come up with anything conclusive.
>
> I don't even know what you mean by "reindex your indexes"

I mean this: http://www.postgresql.org/docs/8.3/static/sql-reindex.html

This operation, on my table and my indexes, is helping me effectively keep query execution times down.

> > > The reason we introduced the cronjob was that we felt the autovacuum
> > > was not really doing its job.
> >
> > How did you realise that? Turn off the cron job and wait a few days. In
> > the meanwhile, monitor your disk space occupation (du
> > -sh /var/lib?/pgsql....????/base/)
>
> I agree with most of this, although it'll probably be easier to look into
> the PostgreSQL system tables to watch the actual table and index sizes:
>
> select relname, relpages from pg_class
> where relkind in ('r', 'i') and relname not like 'pg_%' order by relname;
>
> You may find that autovacuum handles everything well except for one or
> two tables, in which case you can optimize your cron job to vacuum just
> those tables.
>
> > > I wonder if anyone can share some
> > > insight on whether these settings are good for a DB that is basically
> > > 24x7:
> >
> > Like someone said: it's not the 24x7. It's the "how many tuples get
> > DELETEd or UPDATEd (for the case of autovacuum)" in one day, for
> > example.
>
> The 24/7 aspect _is_ part of the equation.

Yes, I agree. Sorry for the confusion. I did not mean that 24x7 was not important. I meant that it is not relevant to the effectiveness of any of the vacuum types. And yes, I agree that VACUUM FULL takes locks which will render the system unusable for a certain amount of time.
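To be concrete, my weekly night job amounts to something like this ('eventlog' here is just a stand-in for my real table name):

-- weekly, inside the night window; VACUUM FULL takes an exclusive lock,
-- so this is only viable because nothing else runs at that hour
VACUUM FULL ANALYZE eventlog;
-- rebuild all indexes on the table to undo index bloat
REINDEX TABLE eventlog;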
On Thu, Aug 28, 2008 at 05:15:06PM +0100, Joao Ferreira gmail wrote:

> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, even though it grows back after a
> few hours. It's just reassuring to me to make sure that at least one of the
> vacuums is working when I see the weekly drop of disk occupation.

It actually bloats your indexes and hurts you. Don't do that.

If the number of tuples in your various tables is staying about the same, your size on disk should stabilise. That's the thing you want to see.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
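PS: you can see this for yourself by measuring one index on its own just before and just after the weekly VACUUM FULL (substitute one of your real index names):

SELECT pg_size_pretty(pg_relation_size('some_index_name'));

If the index comes out bigger after the VACUUM FULL, that's the bloat I mean.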
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
>
> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, even though it grows back after a
> few hours. It's just reassuring to me to make sure that at least one of the
> vacuums is working when I see the weekly drop of disk occupation.

Note that the "it grows back after a few hours" is unneeded load on your system. You just spent a lot of I/O shrinking a file that PG is now going to spend a lot of I/O re-enlarging. Thus you made PG work harder for a while after the VACUUM FULL in order to get the table back to a stable size.

> quoting:
>
> http://www.postgresql.org/docs/8.3/static/routine-reindex.html
>
> ---------------------------------------------
> The potential for bloat in non-B-tree indexes has not been well
> characterized.
> ---------------------------------------------
>
> ---------------------------------------------
> It is a good idea to keep an eye on the index's physical size when using
> any non-B-tree index type.
> ---------------------------------------------

Both are special cases; B-trees are the default.

> ---------------------------------------------
> Also, for B-tree indexes a freshly-constructed index is somewhat faster
> to access than one that has been updated many times, because logically
> adjacent pages are usually also physically adjacent in a newly built
> index. (This consideration does not currently apply to non-B-tree
> indexes.) It might be worthwhile to reindex periodically just to improve
> access speed.
> ---------------------------------------------
>
> I did some tests (let's say about 50 INSERTs/UPDATEs per second and some
> thousands of DELETEs once in a while). Query execution time drops 10
> times after REINDEXING and VACUUM.

This would be interesting to this conversation if you had _only_ done the REINDEX. The addition of the VACUUM makes it ambiguous as to which command actually made the improvement.

> I've seen a 600 Mega drop in disk size occupation just by reindexing,

Is that consistent, or was it an extraordinary case? It's pretty amazing to imagine 600M of index bloat on a 100M database ... how many indexes do you have? Are you sure you don't have duplicate indexes or some other issue? (A quick way to eyeball that is at the end of this message.)

> and an additional drop of about 500 Megas by vacuuming; this was on about
> 100 Megas of useful data.

It's probably because you're only vacuuming once a week.

> > That's a pretty lousy maintenance plan with lots of unneeded overhead,
> > unless your database has a very edge-case access pattern.
>
> !? Many people mention these 'edge-case access patterns' and 'corner
> cases'. I don't know if mine is such. My application executes patterns
> of varying INSERTing and UPDATing, but it must survive stress tests with
> about 100 UPDATEs mixed with INSERTs per second. Additionally, it DELETEs
> up to 10,000 records once every night.
>
> It's a 'near-realtime' log store which stores log information for 'you
> name it' logs in Linux-based systems. It also removes old data once a
> night.

That is a fairly edge-case use, and one that's actually documented, in the docs you reference, as likely to cause abnormal amounts of bloat. However, it's just one index on one table (unless you have a bunch of indexes on that table that are all continually increasing?). Unless there's something I'm missing, it hardly justifies reindexing the entire database.

> > VACUUM FULL really only needs to be done in extreme cases where massive
> > data bloat has occurred and is not expected to happen again. If massive
> > data bloat is routine, you're probably better off letting ordinary VACUUM
> > maintain a consistent level of free space on tables.
>
> Well, I've seen massive data and index bloating in my application. In
> some cases REINDEXing and VACUUM FULLing were the only way to clear
> things up.

It's possible that the stable size for your tables has more free space than actual data. While that seems crazy, it's really no different from having a file server with 4x as much disk space as people normally use.

> > REINDEXing seems nice at times, but I've yet to see any evidence that
> > it's necessary. There are probably some corner cases, but I've not seen
> > them documented.
>
> Neither have I. REINDEXING helps me keep query execution time low;
> otherwise it increases to levels I cannot accept. We use 'modest'
> processors and 'modest' storage, in highly dependable embedded systems.

Oh ... well, this might change a lot. Depending on what you mean by "embedded", you may be using experience that's _very_ specialized to make generalizations from. The RAM/CPU/storage/etc constraints on anything I would call "embedded" are different from what just about anyone else would be dealing with when tuning a DB server.

If you're looking to tune PG to run well on an embedded platform, my advice would be to throw out everything you've heard and test extensively for yourself -- which sounds like what you've done.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
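PS: regarding the duplicate-index question above, a cheap first check is to list index definitions per table and look for near-identical entries (assumes your tables live in the public schema; adjust the filter otherwise):

-- duplicate or redundant indexes tend to show up as near-identical indexdef
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexdef;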