Re: Vacuums on large busy databases - Mailing list pgsql-performance

From Jeff Davis
Subject Re: Vacuums on large busy databases
Date
Msg-id 1158280502.29889.183.camel@dogma.v10.wvs
In response to Re: Vacuums on large busy databases  (Francisco Reyes <lists@stringsutils.com>)
List pgsql-performance
On Thu, 2006-09-14 at 19:30 -0400, Francisco Reyes wrote:
> Will have to talk to the developers. In particular for every insert there
> are updates. I know they have at least one table that gets updated to have
> summarized totals.
>

If the table being updated is small, you have no problem at all: VACUUM
that table frequently, and the big tables rarely. If the big tables see
only INSERTs and SELECTs, the only reason to VACUUM them at all is to
avoid xid wraparound. See:

<http://www.postgresql.org/docs/8.1/static/maintenance.html>
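
For example, you can watch how close each database is getting to
wraparound with a query along these lines (age() and
pg_database.datfrozenxid are standard catalog fare in 8.1):

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
-- VACUUM each database well before its age reaches 2 billion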

Work out which tables need VACUUM, and how often. Check the statistics
first to see whether a VACUUM will actually gain you anything before
you run it.
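
As a sketch of that check (this reads the pg_stat_user_tables view; in
8.1 the per-row counters are only collected when stats_row_level is on):

SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_upd + n_tup_del DESC;
-- the tables at the top accumulate dead tuples fastest, so they are
-- the ones worth VACUUMing frequently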

> One of the reasons I was doing the vacuumdb of the entire DB was to get the
> number of shared-buffers. Now that I have an idea of how much I need I will
> likely do something along the lines of what you suggest. One full for
> everything at night and during the days perhaps do the tables that get more
> updated. I also set more aggressive values on autovacuum so that should help
> some too.

Why VACUUM FULL? That is generally not needed, and it takes an
exclusive lock on the table while it runs. Re-evaluate whether all
these VACUUMs are actually gaining you anything.
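
A plain VACUUM reclaims dead space for reuse without that exclusive
lock, so a per-table schedule might look like this (table names
invented for illustration):

-- frequent and cheap: the small summary table that takes the updates
VACUUM ANALYZE summary_totals;
-- rare: the big insert-only tables, mainly for xid wraparound
VACUUM big_archive_table;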

> > You can run ANALYZE more frequently on all the
> > tables, because it does not have to read the entire table and doesn't
> > interfere with the rest of the operations.
>
> On a related question. Right now I have my autovacuums set as:
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 100000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.1
>
> Based on what you described above, I could set my analyze values to the
> same as the vacuum ones to have something like
> autovacuum_vacuum_threshold = 50000
> autovacuum_analyze_threshold = 50000
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
>
> For DBs with hundreds of GBs would it be better to set
> autovacuum_analyze_scale_factor to even 0.01? The permanent DB is over 200GB
> and growing.. the 100GB ones are staging.. By the time we have finished
> migrating all the data from the old system it will be at least 300GB. 0.01
> is still 3GB.. pretty sizable.

Just test how long an ANALYZE takes, and compare that to how quickly
your statistics get out of date. As long as postgres is choosing correct
plans, you are ANALYZE-ing often enough.
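
In psql, the \timing switch makes that test easy (table name invented):

\timing
ANALYZE some_big_table;
-- psql prints the elapsed time once the command completes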

ANALYZE reads only a statistical sample of the table rather than the
whole thing, so in my experience it is not a major drag on performance.

> Do the thresholds take precedence over the scale factors? Is it basically
> if either one of them gets hit that the action will take place?

u = number of tuples UPDATE-ed or DELETE-ed (i.e. dead tuples)
r = the (estimated) number of total live tuples in the relation

In a loop, autovacuum checks whether

    u > r * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold

and if so, it runs VACUUM; if not, it sleeps. It works the same way for
ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a small
table, the threshold is the dominant term. But both are taken into
account.
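
With your settings (autovacuum_vacuum_scale_factor = 0.05,
autovacuum_vacuum_threshold = 50000), for example:

  10,000,000-row table: 10,000,000 * 0.05 + 50,000 = 550,000 dead tuples
       1,000-row table:      1,000 * 0.05 + 50,000 =  50,050 dead tuples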

Regards,
    Jeff Davis

