Re: Vaccuum best practice: cronjob or autovaccuum? - Mailing list pgsql-general

From Joao Ferreira gmail
Subject Re: Vaccuum best practice: cronjob or autovaccuum?
Date
Msg-id 1219934843.9846.24.camel@jmf-ubuntu
Whole thread Raw
In response to Vaccuum best practice: cronjob or autovaccuum?  ("Phoenix Kiula" <phoenix.kiula@gmail.com>)
Responses Re: Vaccuum best practice: cronjob or autovaccuum?
Re: Vaccuum best practice: cronjob or autovaccuum?
List pgsql-general
On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> On our database of about 5GB we vaccuum all of our 12 tables (only one
> is huge, all others have about 100,000 rows or so) every hour or so.

if you refer to manual VACUUM or VACUUM FULL every hour is probably too
much. You should aim your vacuum full for about 1ce per week.

>
> But we also have autovaccuum enabled. Is this okay? Do the two vaccuum
> processes contradict each other, or add unnecessary load to the
> system?

read the manuals in www.postgresql.org

specifically read this seciton:

http://www.postgresql.org/docs/8.3/static/maintenance.html

you'll find that once in a while (start at once/week and build up or
down from there) you can/should:

- vacuum full
- reindex your tables
- reindex your indexes

>
> The reason we introduced the cronjob we felt was that the autovaccuum
> was not really doing its job.

how did you realise that ? turn off the cron job, wait a few days. In
the meanwhile monitor your disk space ocupation (du
-sh /var/lib?/pgsql....????/base/)

if you see that size growing and the total row count (select count(*)
from whatever) isn't gorwing you need external vacuums

>  I wonder if anyone can share some
> insight on whether these settings are good for a DB that is basically
> 24x7:

like someone sayd: it's not the 24x7. it's the: how many tuples get
DELETEd or UPDTATEd (for the case of autovacuum) in one day, for
example.

If you find that your db updates/deletes many tuples per hour

>
> autovacuum                   = on
> autovacuum_vacuum_cost_delay = 20
> vacuum_cost_delay            = 20
> autovacuum_naptime           = 10
> stats_start_collector        = on
> stats_row_level              = on
> autovacuum_vacuum_threshold  = 75
> autovacuum_analyze_threshold = 25


> autovacuum_analyze_scale_factor  = 0.02
> autovacuum_vacuum_scale_factor   = 0.01

these two can be tricky: if your database is very big, then 2% of 'very
big' is 'a lot of changes' before autovacuum even tries to vacuum.

read the documentation about these two.

you might want to consider using scale factors of 0 and increase just a
bit both thresholds; p. ex

autovacuum_vacuum_threshold  = 20000
autovacuum_analyze_threshold = 10000

Autovacuum is something that you adjust to your needs;

Another tip: edit your postgresql.conf and bring the debug levels to:

log_min_messages = debug3
log_min_error_statement = debug3

then just follow the logfile (location depends on
distro; /var/log/postgresql;  /var/pgsql/data/; whatever)

IMPORTANT: don't leave the debug3 forever: it is very verbose and _will_
eat up your disc in no time;

you might want to do something like this:

tail -f logfile | grep vac

or

tail -f logfile | grep threshold


take your time :) autovacuum requires a bit of dedication but, in the
end it works fine (preventig disc space growth)

note also that VACUUM has nothing to do with REINDEXing and REINDEXing
also frees a considerable amount of disk space in certain cases.

very important: read the docs:

http://www.postgresql.org/docs/8.3/static/maintenance.html

In my case I have autovaccum with scale factors 0 and naptime 600; also
a cron job for vacuum full and reindex everything once a week (during
the night). its working fine on a db with about 2 Giga and average 10000
deletes a day and well above 200000 INSERTs/UPDATEs per day.

cheers
joao

PS: I'm saying all this because I'm just going through this process
myself and I think I'm on the right track. things are starting to work
the way I want them too :)

>
> Thanks! I've read up some websites for this kind of tuning, but often
> the info is for older versions of DBs and some of the tweaks have
> changed since. I am on 8.2.9.
>


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Feature Request: additional extension to UPDATE
Next
From: "John T. Dow"
Date:
Subject: WAL file questions - how to relocate on Windows, how to replay after total loss, etc