Re: tuning autovacuum - Mailing list pgsql-hackers

From Greg Smith
Subject Re: tuning autovacuum
Date
Msg-id 4DF0EC72.5000003@2ndquadrant.com
Whole thread Raw
In response to Re: tuning autovacuum  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: tuning autovacuum
List pgsql-hackers
Robert Haas wrote:
> Well, if there are more tables that need vacuuming than there are
> workers available at any given time, there will be a delay.  We
> probably don't keep track of that delay at present, but we could.
>   

There are at least four interesting numbers to collect each time 
autovacuum runs:

1) This one, when was the threshold crossed.  I believe one of the AV 
workers would have to pause periodically to update these if they're all 
busy doing work.
2) What time did the last autovacuum start at
3) How many dead rows were there at the point when it started
4) When did the last autovacuum end (currently the only value stored)

There may be a 5th piece of state I haven't looked at yet worth 
exposing/saving, something related to how much work was skipped by the 
partial vacuum logic introduced in 8.4.  I haven't looked at that code 
enough to know which is the right metric to measure its effectiveness 
by, but I have tis gut feel it's eventually going to be critical for 
distinguishing between the various common types of vacuum-heavy 
workloads that show up.

All of these need to be stored in a system table/view, so that an admin 
can run a query to answer questions like:

-What is AV doing right now?
-How far behind is AV on tables it needs to clean but hasn't even 
started on?
-How long is the average AV taking on my big tables?
-As I change the AV parameters, what does it do to the runtimes against 
my big tables?

As someone who is found by a lot of people whose problems revolve around 
databases with heavy writes or update churn, limitations in the current 
state of tracking what autovacuum does have been moving way up my 
priority list the last year.  I now have someone who is always running 
autovacuum on the same table, 24x7.  It finishes every two days, and 
when it does the 20% threshold is already crossed for it to start 
again.  The "wait until a worker was available" problem isn't there, but 
I need a good wasy to track all of the other three things to have a hope 
of improving their situation.  Right now getting the data I could use 
takes parsing log file output and periodic dumps of pg_stat_user_tables, 
then stitching the whole mess together.

You can't run a heavily updated database in the TB+ range and make sense 
of what autovacuum is doing without a large effort matching output from 
log_autovacuum_min_duration and the stats that are visible in 
pg_stat_user_tables.  It must get easier than that to support the sort 
of bigger tables it's possible to build now.  And if this data starts 
getting tracked, we can start to move toward AV parameters that are 
actually aiming at real-world units, too.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




pgsql-hackers by date:

Previous
From: Greg Smith
Date:
Subject: Re: Core Extensions relocation
Next
From: Tom Lane
Date:
Subject: Re: Core Extensions relocation