Thread: scheduling autovacuum at lean hours only.
Hi,

Is it possible to configure autovacuum to run only during certain hours? We are forced to keep it off because it kicks in during the peak query hours.

Regds
Rajesh Kumar Mallah
> From: Rajesh Kumar Mallah <mallah.rajesh@gmail.com>
> Is it possible to configure autovacuum to run only
> during certain hours ? We are forced to keep
> it off because it pops up during the peak
> query hours.

AFAIK not directly within the conf. However, you could probably set up a shell script to turn it on and off as and when required via cron; just change the setting to off in the conf file and reload.
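For reference, a minimal sketch of the reload half of that approach, assuming the cron job has already rewritten "autovacuum = on" or "autovacuum = off" in postgresql.conf (the file edit itself has to happen outside SQL, and pg_reload_conf() needs superuser):

  -- run from the cron job after it edits postgresql.conf
  SELECT pg_reload_conf();   -- ask the postmaster to re-read its configuration files
  SHOW autovacuum;           -- confirm the new value is in effect

Having the cron job call "pg_ctl reload" instead of going through SQL would work just as well.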
Rajesh Kumar Mallah <mallah.rajesh 'at' gmail.com> writes:

> Hi,
>
> Is it possible to configure autovacuum to run only
> during certain hours ? We are forced to keep
> it off because it pops up during the peak
> query hours.

You would do better to configure the cost-based delay so that autovacuum does not hurt performance too much. Autovacuum is really not targeted at running once a day - I think it is partly because the old vacuuming was too weak (because too seldom in many cases) that autovacuum was added.

A delaying configuration that works nicely for us without impacting performance much (tested at the time of 8.2 to cause a +40% response time during autovacuuming, compared to +300% with settings closer to the defaults):

vacuum_cost_delay = 150
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 1000
autovacuum_vacuum_cost_delay = 300

(Disclaimer: IIRC, Alvaro doesn't like these figures at all)

Of course, these are good for us (bloat is very, very low and performance impact is not experienced in production), not necessarily for you. You should conduct your own tests.

Be sure to also consider http://developer.postgresql.org/~wieck/vacuum_cost/

--
Guillaume Cottenceau
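If you want to try numbers like these before committing them to postgresql.conf, a sketch: the plain vacuum_cost_* settings can be set per session and tested against a manual, off-peak vacuum (autovacuum_vacuum_cost_delay itself can only be changed in the configuration file; the table name below is just a placeholder):

  SET vacuum_cost_delay = 150;
  SET vacuum_cost_limit = 1000;
  VACUUM VERBOSE ANALYZE public.some_big_table;  -- placeholder table; watch I/O wait while it runs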
On Wed, Feb 11, 2009 at 7:11 PM, Guillaume Cottenceau <gc@mnc.ch> wrote:
> Rajesh Kumar Mallah <mallah.rajesh 'at' gmail.com> writes:
>
>> Hi,
>>
>> Is it possible to configure autovacuum to run only
>> during certain hours ? We are forced to keep
>> it off because it pops up during the peak
>> query hours.
>
> You would do better to configure the cost-based delay so that
> autovacuum does not hurt performance too much. Autovacuum is really
> not targeted at running once a day - I think it is partly because the
> old vacuuming was too weak (because too seldom in many cases) that
> autovacuum was added.
>
> A delaying configuration that works nicely for us without
> impacting performance much (tested at the time of 8.2 to cause a
> +40% response time during autovacuuming, compared to +300% with
> settings closer to the defaults):
>
> vacuum_cost_delay = 150
> vacuum_cost_page_hit = 1
> vacuum_cost_page_miss = 10
> vacuum_cost_page_dirty = 20
> vacuum_cost_limit = 1000
> autovacuum_vacuum_cost_delay = 300

why is it not a good idea to give end users control over when they want to run it?
Rajesh Kumar Mallah wrote:

> why is it not a good idea to give end users control over when they
> want to run it ?

It has never been said that we don't want to give users control. It's a matter of writing the code. If you want to propose a patch to add the feature, feel free.
On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah <mallah.rajesh@gmail.com> wrote:

>> vacuum_cost_delay = 150
>> vacuum_cost_page_hit = 1
>> vacuum_cost_page_miss = 10
>> vacuum_cost_page_dirty = 20
>> vacuum_cost_limit = 1000
>> autovacuum_vacuum_cost_delay = 300
>
> why is it not a good idea to give end users control over when they
> want to run it ?

Effectively, you have control over autovacuum via these params. You have to remember that autovacuum doesn't cost much, and it helps the planner know more about the data. It's not there to clean up databases in the way you might imagine - it is there to update stats and mark pages as free.

So make sure you tweak that config first, because I have a funny feeling that you just think that vacuuming bogs down your machine, and _can_ be turned off without any bad consequences, which is simply not true.

--
GJ
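A quick way to check that autovacuum really is keeping statistics and dead-row cleanup current (a sketch; the last_autovacuum/last_autoanalyze columns assume 8.3 or later):

  SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  ORDER BY relname;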
On Feb 11, 2009, at 6:57 AM, Rajesh Kumar Mallah wrote:

> why is it not a good idea to give end users control over when they
> want to run it ?

There's nothing stopping you from just turning off autovacuum and running vacuum manually. The point of autovacuum is to vacuum "as needed."
On a large partitioned database, ordinary vacuum is a very, very difficult option.

Most of the time on such a system, most tables are dormant with respect to writes and never need to be vacuumed. A 'daily vacuum' is likely to take a full day to run on larger systems. Since ordinary vacuum can't be run on subsets of tables without explicitly naming them one at a time (you can't just vacuum a schema, a tablespace, or use a regex to match table names), good luck using it effectively if you have a few thousand tables in partitions. You'll have to have application code or a script with knowledge of all the partition names and which ones are in need of an analyze/vacuum.

Autovacuum is good enough in recent versions to be tuned to have very low impact though. If you have to, rather than stop and start it, just turn the delay or cost settings up and down during different parts of the day. More than likely, however, it will be able to keep up with a single set of settings.

In particular, rather than making the delay longer, make the appropriate cost larger -- page miss or page dirty affect how much I/O it will do, and page hit will mostly affect how much CPU it uses.

Perhaps a feature request is to have a version of the manual vacuum command that doesn't bother running on tables that autovacuum wouldn't touch due to insufficient data churn. This way, at lean hours one can manually vacuum to help an autovacuum that was tuned for very low impact 'catch up'.

Also, it would be nice if there were some way to make vacuum not stink so badly on tables that were just loaded with pg_load, where it causes huge write activity for tables that clearly have no bloat (I believe this is a hint bits thing?).
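Building on that feature request, a sketch of how a nightly cron job could restrict manual vacuuming to tables that actually have churn (assuming 8.3 or later, where pg_stat_user_tables exposes n_dead_tup; the threshold is purely illustrative):

  -- list only tables with a meaningful number of dead rows
  SELECT schemaname, relname, n_dead_tup
  FROM pg_stat_user_tables
  WHERE n_dead_tup > 1000   -- illustrative threshold
  ORDER BY n_dead_tup DESC;

The script would then issue VACUUM ANALYZE for just the tables returned, instead of a database-wide vacuum.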
On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah
> <mallah.rajesh@gmail.com> wrote:
>
>>> vacuum_cost_delay = 150
>>> vacuum_cost_page_hit = 1
>>> vacuum_cost_page_miss = 10
>>> vacuum_cost_page_dirty = 20
>>> vacuum_cost_limit = 1000
>>> autovacuum_vacuum_cost_delay = 300
>>
>> why is it not a good idea to give end users control over when they
>> want to run it ?
>
> Effectively, you have control over autovacuum via these params. You
> have to remember that autovacuum doesn't cost much, and it helps the
> planner know more about the data. It's not there to clean up databases
> in the way you might imagine - it is there to update stats and mark
> pages as free.
>
> So make sure you tweak that config first, because I have a funny
> feeling that you just think that vacuuming bogs down your machine, and
> _can_ be turned off without any bad consequences, which is simply not
> true.

Our usage pattern is such that peak activity (indicated by load average) during daytime is 10 times that of the night hours. Autovacuum just puts more pressure on the system. If a less aggressive configuration is used, it will take longer to complete one cycle, which would mean lower performance for a longer time. Lower performance queues up queries and encourages people to resubmit them, which again adds to bogging down the system.

In our case I feel the hardware is a bit underscaled compared to the load; that's why I think running autovacuum in lean hours is the best of both worlds: no performance sacrifice and intelligent vacuuming.

regds
--
mallah.
On Wed, 11 Feb 2009, Scott Carey wrote:

> On a large partitioned database, ordinary vacuum is a very, very difficult option.
>
> Most of the time on such a system, most tables are dormant with respect to writes and never need to be vacuumed. A 'daily vacuum' is likely to take a full day to run on larger systems. Since ordinary vacuum can't be run on subsets of tables without explicitly naming them one at a time (you can't just vacuum a schema, a tablespace, or use a regex to match table names), good luck using it effectively if you have a few thousand tables in partitions. You'll have to have application code or a script with knowledge of all the partition names and which ones are in need of an analyze/vacuum.
>
> Autovacuum is good enough in recent versions to be tuned to have very low impact though. If you have to, rather than stop and start it, just turn the delay or cost settings up and down during different parts of the day. More than likely, however, it will be able to keep up with a single set of settings.
>
> In particular, rather than making the delay longer, make the appropriate cost larger -- page miss or page dirty affect how much I/O it will do, and page hit will mostly affect how much CPU it uses.
>
> Perhaps a feature request is to have a version of the manual vacuum command that doesn't bother running on tables that autovacuum wouldn't touch due to insufficient data churn. This way, at lean hours one can manually vacuum to help an autovacuum that was tuned for very low impact 'catch up'.
>
> Also, it would be nice if there were some way to make vacuum not stink so badly on tables that were just loaded with pg_load, where it causes huge write activity for tables that clearly have no bloat (I believe this is a hint bits thing?).

Oh, I agree with everything you say. I'm just pointing out that if you really do want control over when things get vacuumed (e.g. you have a mostly-read database 20 hours a day and then 4 hours of heavy churn at night) then you can still do that if you want.
On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote:

> Our usage pattern is such that peak activity (indicated by load
> average) during daytime is 10 times that of the night hours.
> Autovacuum just puts more pressure on the system. If a less aggressive
> configuration is used, it will take longer to complete one cycle,
> which would mean lower performance for a longer time. Lower
> performance queues up queries and encourages people to resubmit them,
> which again adds to bogging down the system.

That's not exactly how it works in practise, if tuned properly. It may take longer, but it is less intensive while running.

We had one system that had spikes happening due to the exact case you described - there were noticeably high IO wait times while certain tables were being vacuumed. We set the cost delay and the wait times dropped to the point where it was a non-issue. Vacuums take twice as long, but there is no measurable impact on performance.

> In our case I feel the hardware is a bit underscaled compared to the
> load; that's why I think running autovacuum in lean hours is the best
> of both worlds: no performance sacrifice and intelligent vacuuming.

That is a different issue altogether.

Not vacuuming a running system at all during peak hours is not considered intelligent vacuuming IMHO. There are plenty of use cases where small, frequent vacuums keep tables under control at a very low cost. Letting them go for extended periods of time without vacuuming causes bloat and eventual slowdowns to table access, which manifest in higher IO usage across the board.

If you really are dead set on vacuuming only at night, you may want to do a careful analysis of which tables need to be vacuumed and when, and trigger manual vacuums from cron.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
On Wed, Feb 11, 2009 at 11:30 PM, Brad Nicholson <bnichols@ca.afilias.info> wrote:

> That's not exactly how it works in practise, if tuned properly. It may
> take longer, but it is less intensive while running.
>
> We had one system that had spikes happening due to the exact case you
> described - there were noticeably high IO wait times while certain
> tables were being vacuumed. We set the cost delay and the wait times
> dropped to the point where it was a non-issue.

I think I can take this route and monitor the IO activity during vacuums. Thanks everyone for their suggestions.

--
mallah.
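One way to do part of that monitoring from inside the database (a sketch, assuming 8.3, where autovacuum workers show up in pg_stat_activity with a query text beginning with 'autovacuum:'); pair it with iostat/vmstat on the OS side to see the corresponding I/O wait:

  SELECT procpid, datname, query_start, current_query
  FROM pg_stat_activity
  WHERE current_query LIKE 'autovacuum:%';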
Rajesh Kumar Mallah <mallah.rajesh 'at' gmail.com> writes:

> Our usage pattern is such that peak activity (indicated by load
> average) during daytime is 10 times that of the night hours.
> Autovacuum just puts more pressure on the system.

Yet it may allow more stable performance in the mid/long term. Not vacuuming enough, even during peak activity, may lead to bloat in your DB and poor peak performance even without any vacuuming running, simply because the DB will have to scan through a lot of dead tuples. Unless you're doing full vacuums overnight?

> If a less aggressive configuration is used, it will take longer to
> complete one cycle, which would mean lower performance for a longer
> time. Lower performance queues up queries and encourages people to
> resubmit them, which again adds to bogging down the system.

I think this user "problem" should be handled at the application level. You need to add some "still working" pages/icons etc., or make sure that one running query from a single user prevents another query from the same user from being run concurrently.

> In our case I feel the hardware is a bit underscaled compared to the
> load; that's why I think running autovacuum in lean hours is the best
> of both worlds: no performance sacrifice and intelligent vacuuming.

IMHO you should still measure how much bloat you produce with such a strategy. I can speak first-hand, because with only nightly vacuuming and not so much traffic, we had ever-growing bloat which in the end led to dismal performance (the culprit was shared with an untuned FSM and friends).

--
Guillaume Cottenceau
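To check whether the free space map is part of the problem (a sketch for 8.2/8.3, where the FSM is still sized by hand through GUCs; these settings were removed in 8.4):

  SHOW max_fsm_pages;
  SHOW max_fsm_relations;
  -- a database-wide VACUUM VERBOSE ends with a summary of how much free space
  -- the FSM is tracking, and warns if max_fsm_pages is too low
  VACUUM VERBOSE;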
mallah.rajesh@gmail.com (Rajesh Kumar Mallah) writes:

> why is it not a good idea to give end users control over when they
> want to run it ?

It's not a particularly good idea to give end users things that they are likely then to *immediately* use to shoot themselves in the foot.

Turning off vacuuming "all day" is the sort of thing that is indeed pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables with certain sorts of (pretty common!) update patterns are certain to "bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on them.

In effect, the practical effect of "autovacuum at lean hours only" is more reasonably described as "cancel autovacuum and revert to the elder policy of requiring users to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became "official"), it was pretty primitive.

  Autovac was a single process, where you had three controls over behaviour:

  - You could run autovac, or not;
  - You could exclude specific tables from being processed by autovac
  - There is a capability to vacuum less aggressively by using delays to reduce autovac I/O usage

- In 8.3, it was enhanced to add the notion of having multiple vacuum workers

  There was discussion about having one of those workers restrict itself to small tables, so that you'd never have the scenario where the workers were all busy and a small table that needed vacuuming was left unvacuumed for a long time. It appears that didn't happen, which seems unfortunate, but that's life...

You should look at all the "knobs" that *are* offered before deciding on a policy that may be harmful to performance. As things stand now, there are a couple of ways I could see tuning this:

- You might check on the GUC variables autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit, which would allow you to restrict the I/O cost.

  This might allow you to run autovacuum all the time without adversely affecting performance.

- You might come up with a list of the *LARGE* tables that you don't want vacuumed during the day, and set up a cron job that adds/drops them from the pg_autovacuum table at the appropriate times.

  This is definitely going to be more finicky, and requires a great deal more awareness of the tables being updated by your applications. It makes "autovacuum" a whole lot less "automatic."

There are known enhancements coming up:

- In 8.4, there is a capability for VACUUM to only process the portions of the table known to have been altered.

  That ought to be a better answer than *any* of the fiddling suggested to date. Right now, a VACUUM on "public.my_huge_table", a table 18GB in size, will walk through the entire table, even though there were only a handful of pages where tuples were invalidated.

  This is almost certainly the single best improvement possible to resolve your issue; it seems likely to *directly* address the problem, and has the considerable merit of not requiring much if any configuration/reconfiguration/scheduling.

--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc")) http://linuxdatabases.info/info/
"what would we do without C? we would have PASAL, BASI, OBOL, and Ommon Lisp." -- #Erik
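To make the second option concrete, a sketch of what such a cron-driven job could execute (assuming 8.2/8.3, where per-table overrides live in the pg_autovacuum system catalog; it was replaced by per-table storage parameters in 8.4; "public.my_huge_table" is just the example table named above, and -1 means "use the system default"):

  -- morning: disable autovacuum for the big table during the day
  INSERT INTO pg_autovacuum
         (vacrelid, enabled,
          vac_base_thresh, vac_scale_factor, anl_base_thresh, anl_scale_factor,
          vac_cost_delay, vac_cost_limit, freeze_min_age, freeze_max_age)
  VALUES ('public.my_huge_table'::regclass, false, -1, -1, -1, -1, -1, -1, -1, -1);

  -- evening: remove the override so autovacuum handles the table again overnight
  DELETE FROM pg_autovacuum WHERE vacrelid = 'public.my_huge_table'::regclass;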
It's nice to know the evolution of autovacuum, and I understand that the suggestion/requirement of "autovacuum at lean hours only" would defeat the whole idea.

regds
--
Rajesh Kumar Mallah