Thread: scheduling autovacuum at lean hours only.

scheduling autovacuum at lean hours only.

From
Rajesh Kumar Mallah
Date:
Hi,

Is it possible to configure autovacuum to run only
during certain hours ? We are forced to keep
it off because it pops up during the peak
query hours.

Regds
rajesh kumar mallah.

Re: scheduling autovacuum at lean hours only.

From
Glyn Astill
Date:
> From: Rajesh Kumar Mallah <mallah.rajesh@gmail.com>

> Is it possible to configure autovacuum to run only
> during certain hours ? We are forced to keep
> it off because it pops up during the peak
> query hours.

AFAIK not directly within the conf.

However you could probably set up a shell script to turn it on and off as and when required via cron; just change the
settingto off in the conf file and reload. 




Re: scheduling autovacuum at lean hours only.

From
Guillaume Cottenceau
Date:
Rajesh Kumar Mallah <mallah.rajesh 'at' gmail.com> writes:

> Hi,
>
> Is it possible to configure autovacuum to run only
> during certain hours ? We are forced to keep
> it off because it pops up during the peak
> query hours.

You'd rather configure the delaying process to not alter too much
performance. Autovacuum is really not targeted at running once a
day - I think it is partly because the old vacuuming was too weak
(because too seldom in many cases) that autovaccum was added.

A delaying configuration that works nicely for us without
impacting performance much (tested at the time of 8.2 to cause a
+40% response time during autovacuuming, compared to +300% with
more default values):

vacuum_cost_delay = 150
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 1000
autovacuum_vacuum_cost_delay = 300

(Disclaimer: IIRC, Alvaro doesn't like these figures at all)

Of course, these are good for us (bloat is very, very low and
performance impact is not experienced in production), not
necessarily for you. You should conduct your own tests.

Be sure to also consider http://developer.postgresql.org/~wieck/vacuum_cost/

--
Guillaume Cottenceau

Re: scheduling autovacuum at lean hours only.

From
Rajesh Kumar Mallah
Date:
On Wed, Feb 11, 2009 at 7:11 PM, Guillaume Cottenceau <gc@mnc.ch> wrote:
> Rajesh Kumar Mallah <mallah.rajesh 'at' gmail.com> writes:
>
>> Hi,
>>
>> Is it possible to configure autovacuum to run only
>> during certain hours ? We are forced to keep
>> it off because it pops up during the peak
>> query hours.
>
> You'd rather configure the delaying process to not alter too much
> performance. Autovacuum is really not targeted at running once a
> day - I think it is partly because the old vacuuming was too weak
> (because too seldom in many cases) that autovaccum was added.
>
> A delaying configuration that works nicely for us without
> impacting performance much (tested at the time of 8.2 to cause a
> +40% response time during autovacuuming, compared to +300% with
> more default values):
>
> vacuum_cost_delay = 150
> vacuum_cost_page_hit = 1
> vacuum_cost_page_miss = 10
> vacuum_cost_page_dirty = 20
> vacuum_cost_limit = 1000
> autovacuum_vacuum_cost_delay = 300

why is it not a good idea to give end users control over when they
want to run it ?


>
> (Disclaimer: IIRC, Alvaro doesn't like these figures at all)
>
> Of course, these are good for us (bloat is very, very low and
> performance impact is not experienced in production), not
> necessarily for you. You should conduct your own tests.
>
> Be sure to also consider http://developer.postgresql.org/~wieck/vacuum_cost/
>
> --
> Guillaume Cottenceau
>

Re: scheduling autovacuum at lean hours only.

From
Alvaro Herrera
Date:
Rajesh Kumar Mallah escribió:

> why is it not a good idea to give end users control over when they
> want to run it ?

It has never been said that we don't want to give the users control.
It's a matter of writing the code.  If you want to propose a patch to
add the feature, feel free.

Re: scheduling autovacuum at lean hours only.

From
Grzegorz Jaśkiewicz
Date:
On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
<mallah.rajesh@gmail.com> wrote:

>> vacuum_cost_delay = 150
>> vacuum_cost_page_hit = 1
>> vacuum_cost_page_miss = 10
>> vacuum_cost_page_dirty = 20
>> vacuum_cost_limit = 1000
>> autovacuum_vacuum_cost_delay = 300
>
> why is it not a good idea to give end users control over when they
> want to run it ?

Effectively, you have control over autovacuum via these params.
You have to remember, that autovacuum doesn't cost much, and it makes
planner know more about data.
It's not there to clean up databases, as you might imagine - it is
there to update stats, and mark pages as free.

So make sure you tweak that config fist, because I have a funny
feeling that you just think that vacuuming bogs down your machine, and
_can_ be turned off without any bad consequences, which is simply not
true.


--
GJ

Re: scheduling autovacuum at lean hours only.

From
Ben
Date:
On Feb 11, 2009, at 6:57 AM, Rajesh Kumar Mallah wrote:

> why is it not a good idea to give end users control over when they
> want to run it ?

There's nothing stopping you from just turning off autovacuum and
running vacuum manually. The point of autovacuum is to vacuum "as
needed."


Re: scheduling autovacuum at lean hours only.

From
Scott Carey
Date:
On a large partitioned database, ordinary vacuum is a very very difficult option.

Most of the time on such a system, most tables are dormant with respect to writes and never need to be vacuumed.  A
'dailyvacuum' is likely to take a full day to run on larger systems. Since ordinary vacuum can't be run on subsets of
tableswithout explicitly naming them one at a time (can't just vacuum a schema, tablespace, or use a regex to match
tablenames), good luck using it effectively if you have a few thousand tables in partitions.  You'll have to have
applicationcode or a script with knowledge of all the partition names and which are in need of an analyze/vacuum. 

Autovacuum is good enough in recent versions to be tuned to have very low impact though.  If you have to, rather than
stopand start it, just turn the delay or cost settings up and down during different parts of the day.  More than likely
however,it will be able to keep up with a single set of settings. 
In particular, rather than making the delay longer, make the appropriate cost larger -- page miss or page dirty affect
howmuch I/O it will do, and page hit will mostly affect how much CPU it uses. 

Perhaps a feature request is to have a version of the manual vacuum command that doesn't bother running on tables that
autovacuumwouldn't touch due to insufficient data churn.  This way, at lean hours one can manually vacuum to help an
autovacuumthat was tuned for very low impact 'catch up'. 
Also, if there was some way to make vacuum not stink so badly on tables that were just loaded with pg_load, where it
causeshuge write activity for tables that clearly have no bloat (I believe this is a hint bits thing?). 
________________________________________
From: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org] On Behalf Of Ben
[bench@silentmedia.com]
Sent: Wednesday, February 11, 2009 8:46 AM
To: Rajesh Kumar Mallah
Cc: PostgreSQL Performance
Subject: Re: [PERFORM] scheduling autovacuum at lean hours only.

On Feb 11, 2009, at 6:57 AM, Rajesh Kumar Mallah wrote:

> why is it not a good idea to give end users control over when they
> want to run it ?

There's nothing stopping you from just turning off autovacuum and
running vacuum manually. The point of autovacuum is to vacuum "as
needed."


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: scheduling autovacuum at lean hours only.

From
Rajesh Kumar Mallah
Date:
On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
> <mallah.rajesh@gmail.com> wrote:
>
>>> vacuum_cost_delay = 150
>>> vacuum_cost_page_hit = 1
>>> vacuum_cost_page_miss = 10
>>> vacuum_cost_page_dirty = 20
>>> vacuum_cost_limit = 1000
>>> autovacuum_vacuum_cost_delay = 300
>>
>> why is it not a good idea to give end users control over when they
>> want to run it ?
>
> Effectively, you have control over autovacuum via these params.
> You have to remember, that autovacuum doesn't cost much, and it makes
> planner know more about data.
> It's not there to clean up databases, as you might imagine - it is
> there to update stats, and mark pages as free.
>
> So make sure you tweak that config fist, because I have a funny
> feeling that you just think that vacuuming bogs down your machine, and
> _can_ be turned off without any bad consequences, which is simply not
> true.

our usage pattern is such that peak activity (indicated by load average)
during day time is 10 times during night hours. Autovacuum just puts
more pressure to the system. If less stressing version is used then
it shall take longer to complete one cycle,  which would mean  less
performance for longer time . Less performance queues up queries
and encourages people to re submit their queries which again
adds to bogging up the system.

In our case i feel the hardware is bit underscaled as compared to
load thats why i think running in lean hours is best of both worlds
no performance sacrifices and intelligent vacuuming.

regds
-- mallah.


>
>
> --
> GJ
>

Re: scheduling autovacuum at lean hours only.

From
Ben Chobot
Date:
On Wed, 11 Feb 2009, Scott Carey wrote:

> On a large partitioned database, ordinary vacuum is a very very difficult option.
>
> Most of the time on such a system, most tables are dormant with respect to writes and never need to be vacuumed.  A
'dailyvacuum' is likely to take a full day to run on larger systems. Since ordinary vacuum can't be run on subsets of
tableswithout explicitly naming them one at a time (can't just vacuum a schema, tablespace, or use a regex to match
tablenames), good luck using it effectively if you have a few thousand tables in partitions.  You'll have to have
applicationcode or a script with knowledge of all the partition names and which are in need of an analyze/vacuum. 
>
> Autovacuum is good enough in recent versions to be tuned to have very low impact though.  If you have to, rather than
stopand start it, just turn the delay or cost settings up and down during different parts of the day.  More than likely
however,it will be able to keep up with a single set of settings. 
> In particular, rather than making the delay longer, make the appropriate cost larger -- page miss or page dirty
affecthow much I/O it will do, and page hit will mostly affect how much CPU it uses. 
>
> Perhaps a feature request is to have a version of the manual vacuum command that doesn't bother running on tables
thatautovacuum wouldn't touch due to insufficient data churn.  This way, at lean hours one can manually vacuum to help
anautovacuum that was tuned for very low impact 'catch up'. 
> Also, if there was some way to make vacuum not stink so badly on tables that were just loaded with pg_load, where it
causeshuge write activity for tables that clearly have no bloat (I believe this is a hint bits thing?). 

Oh, I agree with everything you say. I'm just pointing out that if you
really do want control over when things get vacuumed (e.g. you have a
mostly-read database 20 hours a day and then 4 hours of heavy churn at
night) then you can still do that if you want.

Re: scheduling autovacuum at lean hours only.

From
Brad Nicholson
Date:
On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote:
> On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> > On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
> > <mallah.rajesh@gmail.com> wrote:
> >
> >>> vacuum_cost_delay = 150
> >>> vacuum_cost_page_hit = 1
> >>> vacuum_cost_page_miss = 10
> >>> vacuum_cost_page_dirty = 20
> >>> vacuum_cost_limit = 1000
> >>> autovacuum_vacuum_cost_delay = 300
> >>
> >> why is it not a good idea to give end users control over when they
> >> want to run it ?
> >
> > Effectively, you have control over autovacuum via these params.
> > You have to remember, that autovacuum doesn't cost much, and it makes
> > planner know more about data.
> > It's not there to clean up databases, as you might imagine - it is
> > there to update stats, and mark pages as free.
> >
> > So make sure you tweak that config fist, because I have a funny
> > feeling that you just think that vacuuming bogs down your machine, and
> > _can_ be turned off without any bad consequences, which is simply not
> > true.
>
> our usage pattern is such that peak activity (indicated by load average)
> during day time is 10 times during night hours. Autovacuum just puts
> more pressure to the system. If less stressing version is used then
> it shall take longer to complete one cycle,  which would mean  less
> performance for longer time . Less performance queues up queries
> and encourages people to re submit their queries which again
> adds to bogging up the system.

That's not exactly how it works in practise, if tuned properly.  It may
take longer, but it is less intensive while running.

We had one system that had spikes happening due to the exact case you
described - there were noticeably high IO wait times while certain
tables were being vacuumed.  We set the cost delay and the wait times
dropped to the point where it was non-issue.  Vacuums take twice as
long, but there is no measurable impact to the performance.

> In our case i feel the hardware is bit underscaled as compared to
> load thats why i think running in lean hours is best of both worlds
> no performance sacrifices and intelligent vacuuming.

That is a different issue altogether.

Not vacuuming a running system at all during peak hours is not
considered intelligent vacuuming IMHO.  There are plenty of use cases
where small, frequent vacuums keep tables under control at a very low
cost.  Letting them go for extended periods of time without vacuuming
causes bloat and eventual slowdowns to table access which manifest in
higher IO usage across the board.

If you really are dead set on vacuuming only at night, you may want to
do a careful analysis of which tables need to be vacuumed and when, and
trigger manual vacuums from cron.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: scheduling autovacuum at lean hours only.

From
Rajesh Kumar Mallah
Date:
On Wed, Feb 11, 2009 at 11:30 PM, Brad Nicholson
<bnichols@ca.afilias.info> wrote:
> On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote:
>> On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
>> > On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
>> > <mallah.rajesh@gmail.com> wrote:
>> >
>> >>> vacuum_cost_delay = 150
>> >>> vacuum_cost_page_hit = 1
>> >>> vacuum_cost_page_miss = 10
>> >>> vacuum_cost_page_dirty = 20
>> >>> vacuum_cost_limit = 1000
>> >>> autovacuum_vacuum_cost_delay = 300
>> >>
>> >> why is it not a good idea to give end users control over when they
>> >> want to run it ?
>> >
>> > Effectively, you have control over autovacuum via these params.
>> > You have to remember, that autovacuum doesn't cost much, and it makes
>> > planner know more about data.
>> > It's not there to clean up databases, as you might imagine - it is
>> > there to update stats, and mark pages as free.
>> >
>> > So make sure you tweak that config fist, because I have a funny
>> > feeling that you just think that vacuuming bogs down your machine, and
>> > _can_ be turned off without any bad consequences, which is simply not
>> > true.
>>
>> our usage pattern is such that peak activity (indicated by load average)
>> during day time is 10 times during night hours. Autovacuum just puts
>> more pressure to the system. If less stressing version is used then
>> it shall take longer to complete one cycle,  which would mean  less
>> performance for longer time . Less performance queues up queries
>> and encourages people to re submit their queries which again
>> adds to bogging up the system.
>
> That's not exactly how it works in practise, if tuned properly.  It may
> take longer, but it is less intensive while running.
>
> We had one system that had spikes happening due to the exact case you
> described - there were noticeably high IO wait times while certain
> tables were being vacuumed.  We set the cost delay and the wait times
> dropped to the point where it was non-issue.


I think i can take this route and monitor the io activity
during vacuums. thanks everyone for their suggestions.

-- mallah.


>Vacuums take twice as
> long, but there is no measurable impact to the performance.
>
>> In our case i feel the hardware is bit underscaled as compared to
>> load thats why i think running in lean hours is best of both worlds
>> no performance sacrifices and intelligent vacuuming.
>
> That is a different issue altogether.
>
> Not vacuuming a running system at all during peak hours is not
> considered intelligent vacuuming IMHO.  There are plenty of use cases
> where small, frequent vacuums keep tables under control at a very low
> cost.  Letting them go for extended periods of time without vacuuming
> causes bloat and eventual slowdowns to table access which manifest in
> higher IO usage across the board.
>
> If you really are dead set on vacuuming only at night, you may want to
> do a careful analysis of which tables need to be vacuumed and when, and
> trigger manual vacuums from cron.
>
> --
> Brad Nicholson  416-673-4106
> Database Administrator, Afilias Canada Corp.
>
>

Re: scheduling autovacuum at lean hours only.

From
Guillaume Cottenceau
Date:
Rajesh Kumar Mallah <mallah.rajesh 'at' gmail.com> writes:

> our usage pattern is such that peak activity (indicated by load average)
> during day time is 10 times during night hours. Autovacuum just puts
> more pressure to the system. If less stressing version is used then

Yet it may allow a more stable performance in the mid/long term.
Not vacuuming enough, even during peak activity, may lead to
bloat in your DB and poor peak performance even without any
vacuuming, only because the DB will have to scan through a lot of
dead tuples. Unless you're doing full vacuums overnight?

> it shall take longer to complete one cycle,  which would mean  less
> performance for longer time . Less performance queues up queries
> and encourages people to re submit their queries which again
> adds to bogging up the system.

I think this user "problem" should be handled at the application
level. You need to add some "still working" pages/icons etc, or
get sure one running query from a single user prevents another
query from the same user to be run concurrently.

> In our case i feel the hardware is bit underscaled as compared to
> load thats why i think running in lean hours is best of both worlds
> no performance sacrifices and intelligent vacuuming.

IMHO you should still measure how much bloat you produce with
such a strategy. I can talk first-hand, because with only nightly
vacuuming and not so much traffic, we had ever growing bloat
which in the end led to minuscule performance (the culprit was
shared with untuned FSM and friends).

--
Guillaume Cottenceau

Re: scheduling autovacuum at lean hours only.

From
Chris Browne
Date:
mallah.rajesh@gmail.com (Rajesh Kumar Mallah) writes:
> why is it not a good idea to give end users control over when they
> want to run it ?

It's not a particularly good idea to give end users things that they
are likely then to *immediately* use to shoot themselves in the foot.

Turning off vacuuming "all day" is the sort of thing that is indeed
pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables
with certain sorts of (pretty common!) update patterns are certain to
"bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on
them.

In effect, the practical effect of "autovacuum at lean hours only" is
more reasonably described as "cancel autovacuum and revert to the
elder policy of requiring users to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became
  "official"), it was pretty primitive.

  Autovac was a single process, where you had three controls over
  behaviour:

   - You could run autovac, or not;

   - You could exclude specific tables from being processed by autovac

   - There is a capability to vacuum less aggressively by using
     delays to reduce autovac I/O usage

- In 8.3, it was enhanced to add the notion of having multiple vacuum
  workers

  There was discussion about having one of those workers restrict
  itself to small tables, so that you'd never have the scenario where
  the workers were all busy and a small table that needed vacuuming
  was left unvacuumed for a long time.  It appears that didn't happen,
  which seems unfortunate, but that's life...

You should look at all the "knobs" that *are* offered before deciding
a policy that may be harmful to performance.  As things stand now,
there are a couple of ways I could see tuning this:

 - You might check on the GUC variables autovacuum_vacuum_cost_delay
   and autovacuum_vacuum_cost_limit, which would allow you to restrict
   the I/O cost.

   This might allow you to run autovacuum all the time without
   adversely affecting performance.

 - You might come up with a list of the *LARGE* tables that you don't
   want vacuumed during the day, and set up a cron job that adds/drops
   them from the pg_autovacuum table at the appropriate times.

   This is definitely going to be more finicky, and requires a great
   deal more awareness of the tables being updated by your
   applications.  It makes "autovacuum" a whole lot less "automatic."

There are known enhancements coming up:

 - In 8.4, there is a capability for VACUUM to only process the
   portions of the table known to have been altered.

   That ought to be a better answer than *any* of the fiddling
   suggested, to date.  Right now, a VACUUM on "public.my_huge_table",
   a table 18GB in size, will walk through the entire table, even
   though there were only a handful of pages where tuples were
   invalidated.

   This is almost certainly the single best improvement possible to
   resolve your issue; it seems likely to *directly* address the
   problem, and has the considerable merit of not requiring much if
   any configuration/reconfiguration/scheduling.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://linuxdatabases.info/info/
"what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
Ommon Lisp." -- #Erik

Re: scheduling autovacuum at lean hours only.

From
Rajesh Kumar Mallah
Date:
Its nice to know the evolution of autovacuum and i  understand that
the suggestion/requirement  of "autovacuum at lean hours only"
was defeating the whole idea.

regds
--rajesh kumar mallah.

On Fri, Feb 13, 2009 at 11:07 PM, Chris Browne <cbbrowne@acm.org> wrote:
> mallah.rajesh@gmail.com (Rajesh Kumar Mallah) writes:
>> why is it not a good idea to give end users control over when they
>> want to run it ?
>
> It's not a particularly good idea to give end users things that they
> are likely then to *immediately* use to shoot themselves in the foot.
>
> Turning off vacuuming "all day" is the sort of thing that is indeed
> pretty certain to hurt you when you imagined it was going to help you.
>
> In particular, if you shut off autovac all day, heavily updated tables
> with certain sorts of (pretty common!) update patterns are certain to
> "bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on
> them.
>
> In effect, the practical effect of "autovacuum at lean hours only" is
> more reasonably described as "cancel autovacuum and revert to the
> elder policy of requiring users to do manual vacuuming."
>
> It's worth looking at how autovacuum has been evolving over time...
>
> - When it was introduced, 8.0-ish (maybe it was 8.1 when it became
>  "official"), it was pretty primitive.
>
>  Autovac was a single process, where you had three controls over
>  behaviour:
>
>   - You could run autovac, or not;
>
>   - You could exclude specific tables from being processed by autovac
>
>   - There is a capability to vacuum less aggressively by using
>     delays to reduce autovac I/O usage
>
> - In 8.3, it was enhanced to add the notion of having multiple vacuum
>  workers
>
>  There was discussion about having one of those workers restrict
>  itself to small tables, so that you'd never have the scenario where
>  the workers were all busy and a small table that needed vacuuming
>  was left unvacuumed for a long time.  It appears that didn't happen,
>  which seems unfortunate, but that's life...
>
> You should look at all the "knobs" that *are* offered before deciding
> a policy that may be harmful to performance.  As things stand now,
> there are a couple of ways I could see tuning this:
>
>  - You might check on the GUC variables autovacuum_vacuum_cost_delay
>   and autovacuum_vacuum_cost_limit, which would allow you to restrict
>   the I/O cost.
>
>   This might allow you to run autovacuum all the time without
>   adversely affecting performance.
>
>  - You might come up with a list of the *LARGE* tables that you don't
>   want vacuumed during the day, and set up a cron job that adds/drops
>   them from the pg_autovacuum table at the appropriate times.
>
>   This is definitely going to be more finicky, and requires a great
>   deal more awareness of the tables being updated by your
>   applications.  It makes "autovacuum" a whole lot less "automatic."
>
> There are known enhancements coming up:
>
>  - In 8.4, there is a capability for VACUUM to only process the
>   portions of the table known to have been altered.
>
>   That ought to be a better answer than *any* of the fiddling
>   suggested, to date.  Right now, a VACUUM on "public.my_huge_table",
>   a table 18GB in size, will walk through the entire table, even
>   though there were only a handful of pages where tuples were
>   invalidated.
>
>   This is almost certainly the single best improvement possible to
>   resolve your issue; it seems likely to *directly* address the
>   problem, and has the considerable merit of not requiring much if
>   any configuration/reconfiguration/scheduling.
> --
> (reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
> http://linuxdatabases.info/info/
> "what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
> Ommon Lisp." -- #Erik
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>