Re: autovacuum suggestions for 500,000,000+ row tables? - Mailing list pgsql-performance

From Josh Berkus
Subject Re: autovacuum suggestions for 500,000,000+ row tables?
Date
Msg-id 200506201046.42062.josh@agliodbs.com
Whole thread Raw
In response to autovacuum suggestions for 500,000,000+ row tables?  (Alex Stapleton <alexs@advfn.com>)
Responses Re: autovacuum suggestions for 500,000,000+ row tables?
List pgsql-performance
Alex,

> Hi, i'm trying to optimise our autovacuum configuration so that it
> vacuums / analyzes some of our larger tables better. It has been set
> to the default settings for quite some time. We never delete
> anything  (well not often, and not much) from the tables, so I am not
> so worried about the VACUUM status, but I am wary of XID wraparound
> nuking us at some point if we don't sort vacuuming out so we VACUUM
> at least once every year ;)

I personally don't use autovaccuum on very large databases.   For DW,
vacuuming is far better tied to ETL operations or a clock schedule of
downtime.

XID wraparound may be further away than you think.   Try checking
pg_controldata, which will give you the current XID, and you can calculate
how long you are away from wraparound.  I just tested a 200G data warehouse
and figured out that we are 800 months away from wraparound, despite hourly
ETL.

> However not running ANALYZE for such huge
> periods of time is probably impacting the statistics accuracy
> somewhat, and I have seen some unusually slow queries at times.
> Anyway, does anyone think we might benefit from a more aggressive
> autovacuum configuration?

Hmmm, good point, you could use autovacuum for ANALYZE only.  Just set the
VACUUM settings preposterously high (like 10x) so it never runs.   Then it'll
run ANALYZE only.   I generally threshold 200, multiple 0.1x for analyze;
that is, re-analyze after 200+10% of rows have changed.

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: John Arbash Meinel
Date:
Subject: Re: autovacuum suggestions for 500,000,000+ row tables?
Next
From: Dan Harris
Date:
Subject: investigating slow queries through pg_stat_activity