Re: Vacuums on large busy databases - Mailing list pgsql-performance

From Jeff Davis
Subject Re: Vacuums on large busy databases
Date
Msg-id 1158280502.29889.183.camel@dogma.v10.wvs
In response to Re: Vacuums on large busy databases  (Francisco Reyes <lists@stringsutils.com>)
List pgsql-performance
On Thu, 2006-09-14 at 19:30 -0400, Francisco Reyes wrote:
> Will have to talk to the developers. In particular for every insert there
> are updates. I know they have at least one table that gets updated to have
> summarized totals.
>

If the table being updated is small, you have no problem at all: VACUUM
that table frequently, and the big tables rarely. If the big tables see
only INSERTs and SELECTs, the only reason to VACUUM them at all is to
avoid xid wraparound. See:

<http://www.postgresql.org/docs/8.1/static/maintenance.html>
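
For example, you can watch how close each database is getting to
wraparound with a query along these lines (age() and
pg_database.datfrozenxid are standard catalog fare in 8.1):

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
-- VACUUM each database well before its age reaches 2 billion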

Work out which tables need VACUUM, and how often. Check the statistics
first to see whether a VACUUM will actually gain you anything before
you run it.
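
As a sketch of that check (this reads the pg_stat_user_tables view; in
8.1 the per-row counters are only collected when stats_row_level is on):

SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_upd + n_tup_del DESC;
-- the tables at the top accumulate dead tuples fastest, so they are
-- the ones worth VACUUMing frequently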

> One of the reasons I was doing the vacuumdb of the entire DB was to get the
> number of shared-buffers. Now that I have an idea of how much I need I will
> likely do something along the lines of what you suggest. One full for
> everything at night and during the days perhaps do the tables that get more
> updated. I also set more aggressive values on autovacuum so that should help
> some too.

Why VACUUM FULL? That is generally not needed, and it takes an
exclusive lock on the table while it runs. Re-evaluate whether all
these VACUUMs are actually gaining you anything.
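
A plain VACUUM reclaims dead space for reuse without that exclusive
lock, so a per-table schedule might look like this (table names
invented for illustration):

-- frequent and cheap: the small summary table that takes the updates
VACUUM ANALYZE summary_totals;
-- rare: the big insert-only tables, mainly for xid wraparound
VACUUM big_archive_table;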

> > You can run ANALYZE more frequently on all the
> > tables, because it does not have to read the entire table and doesn't
> > interfere with the rest of the operations.
>
> On a related question. Right now I have my autovacuums set as:
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 100000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.1
>
> Based on what you described above, I could set my analyze values to the
> same as the vacuum ones to have something like
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 50000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
>
> For DBs with hundreds of GBs would it be better to set
> autovacuum_analyze_scale_factor to even 0.01? The permanent DB is over 200GB
> and growing.. the 100GB ones are staging.. By the time we have finished
> migrating all the data from the old system it will be at least 300GB. 0.01
> is still 3GB.. pretty sizable.

Just test how long an ANALYZE takes, and compare that to how quickly
your statistics get out of date. As long as postgres is choosing correct
plans, you are ANALYZE-ing often enough.
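
In psql, the \timing switch makes that test easy (table name invented):

\timing
ANALYZE some_big_table;
-- psql prints the elapsed time once the command completes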

ANALYZE reads only a statistical sample of the table rather than the
whole thing, so in my experience it is not a major drag on performance.

> Do the thresholds take precedence over the scale factors? Is it basically
> if either one of them gets hit that the action will take place?

u = number of tuples UPDATE-ed or DELETE-ed (i.e. dead tuples)
r = the (estimated) number of total live tuples in the relation

In a loop, autovacuum checks whether

    u > r * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold

and if so, it runs VACUUM; if not, it sleeps. It works the same way for
ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a small
table, the threshold is the dominant term. But both are taken into
account.
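
With your settings (autovacuum_vacuum_scale_factor = 0.05,
autovacuum_vacuum_threshold = 50000), for example:

  10,000,000-row table: 10,000,000 * 0.05 + 50,000 = 550,000 dead tuples
       1,000-row table:      1,000 * 0.05 + 50,000 =  50,050 dead tuples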

Regards,
    Jeff Davis

