Thread: autovacuum disk IO
Hi.
I've noticed that an autovacuum process ran for more than 10 minutes, and during this time Zabbix logged more than 90% IO disk utilization on the db volume....

===========>29237 2016-03-02 15:17:23 EET 00000 [24-1] LOG: automatic vacuum of table "lb_upr.public._reference32": index scans: 1
	pages: 0 removed, 263307 remain
	tuples: 298 removed, 1944753 remain, 0 are dead but not yet removable
	buffer usage: 67814 hits, 265465 misses, 15647 dirtied
	avg read rate: 3.183 MB/s, avg write rate: 0.188 MB/s
	system usage: CPU 5.34s/6.27u sec elapsed 651.57 sec

Is it possible to log the autovacuum IO impact during its execution?
Is there any way to limit or "nice" the autovacuum process?
Thanks to all for any help.
Hi
2016-03-02 16:25 GMT+01:00 Artem Tomyuk <admin@leboutique.com>:
> Is it possible to log the autovacuum IO impact during its execution?
> Is there any way to limit or "nice" the autovacuum process?
Maybe off topic, but there is a known problem with Zabbix: any limits imposed on vacuum are usually the road to hell. More often, partitioning is what helps with Zabbix:
https://www.zabbix.org/wiki/Higher_performant_partitioning_in_PostgreSQL
Regards
Pavel
On Wed, Mar 2, 2016 at 8:25 AM, Artem Tomyuk <admin@leboutique.com> wrote:
> Is it possible to log the autovacuum IO impact during its execution?
> Is there any way to limit or "nice" the autovacuum process?

I'll assume you're running a fairly recent version of PostgreSQL.

There are a few settings that adjust how hard autovacuum works when it's working.

autovacuum_max_workers tells autovacuum how many worker processes to vacuum with. Lowering this will limit the impact of autovacuum, but generally the default setting of 3 is reasonable on most machines.

autovacuum_vacuum_cost_delay sets how long to wait between internal rounds of work. Raising it makes autovacuum take bigger pauses internally. The default of 20ms is usually large enough to keep you out of trouble, but feel free to raise it and see if your IO utilization drops.

autovacuum_vacuum_cost_limit sets a limit on how much work to do between the pauses set by the cost delay above. Lowering it will cause autovacuum to do less work between pauses.

Most of the time when I'm adjusting these I'm making vacuum more aggressive, not less aggressive, because vacuum falling behind is a problem on the large, fast production systems I work on. In your case you want to watch for when autovacuum IS running and, using a tool like vmstat, iostat, or iotop, watch the % utilization. You can then adjust the cost delay and cost limit to make autovacuum less aggressive and see if your IO utilization goes down.

Note though that 90% utilization isn't 100%, so it's not likely flooding the IO. But if you raise the cost delay from, say, 20 to 40ms, it might drop to 75% or so. The primary goal here is to arrive at numbers that let autovacuum keep up with reclaiming the discarded tuples in the database without getting in the way of the workload. If your workload isn't slowing down, or isn't slowing down very much, during autovacuum, then you're OK.
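For illustration, a minimal sketch of turning those knobs (the values are examples to experiment with, not recommendations; ALTER SYSTEM requires PostgreSQL 9.4 or later, and the per-table form assumes you only want to slow autovacuum on the table from the log):

    -- Cluster-wide: pause longer between rounds and do less work per round.
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '40ms';
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 100;
    SELECT pg_reload_conf();

    -- Or per table, via storage parameters (the delay is in milliseconds):
    ALTER TABLE public._reference32
        SET (autovacuum_vacuum_cost_delay = 40, autovacuum_vacuum_cost_limit = 100);

Both settings take effect on reload; no restart is needed.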
On Wed, Mar 2, 2016 at 8:45 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> The primary goal here is to arrive at numbers that let autovacuum keep
> up with reclaiming the discarded tuples in the database without
> getting in the way of the workload.

Just to add a point here. If your machine can't keep up with the production load AND the job of vacuuming, then your IO subsystem is too slow and needs upgrading. The difference between a pair of spinning 7200 RPM drives and a pair of enterprise-class SSDs (always with power-loss-safe writes etc.; consumer SSDs can eat your data on power off) can be truly huge. I've seen improvements from a few hundred transactions per second to thousands of transactions per second from a simple upgrade like that.
On 02/03/2016 16:49, Scott Marlowe wrote:
> Just to add a point here. If your machine can't keep up with the
> production load AND the job of vacuuming, then your IO subsystem is
> too slow and needs upgrading.

... or maybe add some more RAM to get more disk caching (if you're on *nix)... this worked for me in the past... even if IMHO it's more a temporary "patch" while waiting for an upgrade (if it can't be done in a hurry) than a real solution...

Cheers
Moreno.
On Wed, Mar 2, 2016 at 9:11 AM, Moreno Andreo <moreno.andreo@evolu-s.it> wrote:
> ... or maybe add some more RAM to get more disk caching (if you're on
> *nix)... this worked for me in the past... even if IMHO it's more a
> temporary "patch" while waiting for an upgrade (if it can't be done in
> a hurry) than a real solution...

Oh yeah, definitely worth looking at. But RAM can't speed up writes, just reads, so it's very workload dependent. If your IO subsystem is maxing out on writes, you need faster drives / IO; if it's maxing out on reads, more memory helps. But if your dataset is much bigger than memory (say 64GB of RAM and a 1TB data store), then more RAM isn't going to be the answer.

So as usual, to help out the OP we might want to know more about his system. There are a lot of helpful tips for reporting slow queries / performance issues here:
https://wiki.postgresql.org/wiki/SlowQueryQuestions
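To see which side you're bound on, a hedged example using iostat from the sysstat package (the device name sdb is an assumption; substitute your db volume):

    # Extended per-device stats every 5 seconds: compare r/s and rkB/s (reads)
    # against w/s and wkB/s (writes), and watch %util, while autovacuum runs.
    iostat -x sdb 5

On 9.2 and later you can check whether an autovacuum worker is active at the same time with:

    SELECT pid, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';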
Scott Marlowe wrote:
> But RAM can't speed up writes, just reads, so it's very workload
> dependent.

In the particular case of autovacuum, it may be helpful to create a "ramdisk" and put the stats temp file in it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
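A minimal sketch of that setup on Linux (the mount point and size are assumptions; the directory must be writable by the postgres OS user):

    # As root: create a small tmpfs for the statistics collector's temp files.
    mkdir -p /var/run/pg_stats_tmp
    mount -t tmpfs -o size=64M,uid=postgres,gid=postgres tmpfs /var/run/pg_stats_tmp

Then point PostgreSQL at it in postgresql.conf and reload (the setting only needs a reload, not a restart):

    stats_temp_directory = '/var/run/pg_stats_tmp'

The stats files are rewritten frequently, so moving them off disk can noticeably cut background write IO.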
On 02/03/2016 19:40, Alvaro Herrera wrote:
> In the particular case of autovacuum, it may be helpful to create a
> "ramdisk" and put the stats temp file in it.

Definitely. On my new server (as I've been taught here :-) ) I'm going to put the stats in a ramdisk and pg_xlog on another partition.
On 2 March 2016 16:25:10 GMT+01:00, Artem Tomyuk <admin@leboutique.com> wrote:
> avg read rate: 3.183 MB/s, avg write rate: 0.188 MB/s

According to what I am reading here, your autovacuum's average read activity was less than 4 MB/s... and 188 kB/s for writes. I seriously doubt these numbers are high enough to saturate your disks' bandwidth.

> system usage: CPU 5.34s/6.27u sec elapsed 651.57 sec

This tells us the autovacuum took 651s to complete... but consumed only 5s of CPU in kernel space and 6s in userland in total. Autovacuum is slowed down by the autovacuum_vacuum_cost_delay parameter, which explains the 10-minute run and such low disk activity.

> Is it possible to log the autovacuum IO impact during its execution?

I would bet the problem is elsewhere...

/Ioguix
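A back-of-envelope check of that claim, assuming the default cost settings (vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, an effective cost limit of 200 since autovacuum_vacuum_cost_limit defaults to -1, and autovacuum_vacuum_cost_delay = 20ms), using the buffer counts from the log above:

    total cost = 67814*1 + 265465*10 + 15647*20 = 3035404
    sleeps     = 3035404 / 200                  ≈ 15177
    sleep time = 15177 * 20 ms                  ≈ 304 s

So under default settings roughly 300 of the 651 elapsed seconds would have been spent sleeping in cost-based delays, which fits the low average read/write rates reported in the log.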