Re: scheduling autovacuum at lean hours only. - Mailing list pgsql-performance

From Chris Browne
Subject Re: scheduling autovacuum at lean hours only.
Date
Msg-id 87wsbu8cha.fsf@dba2.int.libertyrms.com
Whole thread Raw
In response to scheduling autovacuum at lean hours only.  (Rajesh Kumar Mallah <mallah.rajesh@gmail.com>)
Responses Re: scheduling autovacuum at lean hours only.  (Rajesh Kumar Mallah <mallah.rajesh@gmail.com>)
List pgsql-performance
mallah.rajesh@gmail.com (Rajesh Kumar Mallah) writes:
> why is it not a good idea to give end users control over when they
> want to run it ?

It's not a particularly good idea to give end users things that they
are likely then to *immediately* use to shoot themselves in the foot.

Turning off vacuuming "all day" is the sort of thing that is indeed
pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables
with certain sorts of (pretty common!) update patterns are certain to
"bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on
them.

In effect, the practical effect of "autovacuum at lean hours only" is
more reasonably described as "cancel autovacuum and revert to the
elder policy of requiring users to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became
  "official"), it was pretty primitive.

  Autovac was a single process, where you had three controls over
  behaviour:

   - You could run autovac, or not;

   - You could exclude specific tables from being processed by autovac

   - There is a capability to vacuum less aggressively by using
     delays to reduce autovac I/O usage

- In 8.3, it was enhanced to add the notion of having multiple vacuum
  workers

  There was discussion about having one of those workers restrict
  itself to small tables, so that you'd never have the scenario where
  the workers were all busy and a small table that needed vacuuming
  was left unvacuumed for a long time.  It appears that didn't happen,
  which seems unfortunate, but that's life...

You should look at all the "knobs" that *are* offered before deciding
a policy that may be harmful to performance.  As things stand now,
there are a couple of ways I could see tuning this:

 - You might check on the GUC variables autovacuum_vacuum_cost_delay
   and autovacuum_vacuum_cost_limit, which would allow you to restrict
   the I/O cost.

   This might allow you to run autovacuum all the time without
   adversely affecting performance.

 - You might come up with a list of the *LARGE* tables that you don't
   want vacuumed during the day, and set up a cron job that adds/drops
   them from the pg_autovacuum table at the appropriate times.

   This is definitely going to be more finicky, and requires a great
   deal more awareness of the tables being updated by your
   applications.  It makes "autovacuum" a whole lot less "automatic."

There are known enhancements coming up:

 - In 8.4, there is a capability for VACUUM to only process the
   portions of the table known to have been altered.

   That ought to be a better answer than *any* of the fiddling
   suggested, to date.  Right now, a VACUUM on "public.my_huge_table",
   a table 18GB in size, will walk through the entire table, even
   though there were only a handful of pages where tuples were
   invalidated.

   This is almost certainly the single best improvement possible to
   resolve your issue; it seems likely to *directly* address the
   problem, and has the considerable merit of not requiring much if
   any configuration/reconfiguration/scheduling.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://linuxdatabases.info/info/
"what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
Ommon Lisp." -- #Erik

pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: I/O increase after upgrading to 8.3.5
Next
From: Rajesh Kumar Mallah
Date:
Subject: Re: scheduling autovacuum at lean hours only.