Re: scheduling autovacuum at lean hours only. - Mailing list pgsql-performance

From Rajesh Kumar Mallah
Subject Re: scheduling autovacuum at lean hours only.
Date
Msg-id a97c77030902140358o3e723ea4h14ccdcff747d0da5@mail.gmail.com
Whole thread Raw
In response to Re: scheduling autovacuum at lean hours only.  (Chris Browne <cbbrowne@acm.org>)
List pgsql-performance
Its nice to know the evolution of autovacuum and i  understand that
the suggestion/requirement  of "autovacuum at lean hours only"
was defeating the whole idea.

regds
--rajesh kumar mallah.

On Fri, Feb 13, 2009 at 11:07 PM, Chris Browne <cbbrowne@acm.org> wrote:
> mallah.rajesh@gmail.com (Rajesh Kumar Mallah) writes:
>> why is it not a good idea to give end users control over when they
>> want to run it ?
>
> It's not a particularly good idea to give end users things that they
> are likely then to *immediately* use to shoot themselves in the foot.
>
> Turning off vacuuming "all day" is the sort of thing that is indeed
> pretty certain to hurt you when you imagined it was going to help you.
>
> In particular, if you shut off autovac all day, heavily updated tables
> with certain sorts of (pretty common!) update patterns are certain to
> "bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on
> them.
>
> In effect, the practical effect of "autovacuum at lean hours only" is
> more reasonably described as "cancel autovacuum and revert to the
> elder policy of requiring users to do manual vacuuming."
>
> It's worth looking at how autovacuum has been evolving over time...
>
> - When it was introduced, 8.0-ish (maybe it was 8.1 when it became
>  "official"), it was pretty primitive.
>
>  Autovac was a single process, where you had three controls over
>  behaviour:
>
>   - You could run autovac, or not;
>
>   - You could exclude specific tables from being processed by autovac
>
>   - There is a capability to vacuum less aggressively by using
>     delays to reduce autovac I/O usage
>
> - In 8.3, it was enhanced to add the notion of having multiple vacuum
>  workers
>
>  There was discussion about having one of those workers restrict
>  itself to small tables, so that you'd never have the scenario where
>  the workers were all busy and a small table that needed vacuuming
>  was left unvacuumed for a long time.  It appears that didn't happen,
>  which seems unfortunate, but that's life...
>
> You should look at all the "knobs" that *are* offered before deciding
> a policy that may be harmful to performance.  As things stand now,
> there are a couple of ways I could see tuning this:
>
>  - You might check on the GUC variables autovacuum_vacuum_cost_delay
>   and autovacuum_vacuum_cost_limit, which would allow you to restrict
>   the I/O cost.
>
>   This might allow you to run autovacuum all the time without
>   adversely affecting performance.
>
>  - You might come up with a list of the *LARGE* tables that you don't
>   want vacuumed during the day, and set up a cron job that adds/drops
>   them from the pg_autovacuum table at the appropriate times.
>
>   This is definitely going to be more finicky, and requires a great
>   deal more awareness of the tables being updated by your
>   applications.  It makes "autovacuum" a whole lot less "automatic."
>
> There are known enhancements coming up:
>
>  - In 8.4, there is a capability for VACUUM to only process the
>   portions of the table known to have been altered.
>
>   That ought to be a better answer than *any* of the fiddling
>   suggested, to date.  Right now, a VACUUM on "public.my_huge_table",
>   a table 18GB in size, will walk through the entire table, even
>   though there were only a handful of pages where tuples were
>   invalidated.
>
>   This is almost certainly the single best improvement possible to
>   resolve your issue; it seems likely to *directly* address the
>   problem, and has the considerable merit of not requiring much if
>   any configuration/reconfiguration/scheduling.
> --
> (reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
> http://linuxdatabases.info/info/
> "what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
> Ommon Lisp." -- #Erik
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

pgsql-performance by date:

Previous
From: Chris Browne
Date:
Subject: Re: scheduling autovacuum at lean hours only.
Next
From: "Peter G."
Date:
Subject: Retrieving data from PostgreSQL to .NET application – performance test – surprising results