Thread: bgwriter tunables vs pg_stat_bgwriter
Since getting on 8.4 I've been monitoring things fairly closely. I whipped up a quick script to monitor pg_stat_bgwriter and save deltas every minute so I can ensure my bgwriter is beating out the backends for writes (as it is supposed to do).

Now, the odd thing I'm running into is this:

bgwriter_delay is 100ms (ie 10 times a second, give or take)
bgwriter_lru_maxpages is 500 (~5000 pages / second)
bgwriter_lru_multiplier is 4

Now, assuming I understand these values right the following is what should typically happen:

while(true)
{
    if buffers_written > bgwriter_lru_maxpages
       or buffers_written > anticipated_pages_needed * bgwriter_lru_multiplier
    {
        sleep(bgwriter_delay ms)
        continue;
    }
    ...
}

so I should not be able to have more than ~5000 bgwriter_clean pages per minute. (this assumes writing takes 0ms, which of course is inaccurate)

However, I see this in my stats (they are deltas), and I'm reasonably sure it is not a bug in the code:

(timestamp, buffers clean, buffers_checkpoint, buffers backend)
2010-02-17 08:23:51.184018 |     1 |  1686 |    5
2010-02-17 08:22:51.170863 | 15289 | 12676 |  207
2010-02-17 08:21:51.155793 | 38467 |  8993 | 4277
2010-02-17 08:20:51.139199 | 35582 |     0 | 9437
2010-02-17 08:19:51.125025 |     8 |     0 |    3
2010-02-17 08:18:51.111184 |  1140 |  1464 |    6
2010-02-17 08:17:51.098422 |     0 |  1682 |  228
2010-02-17 08:16:51.082804 |    50 |     0 |    6
2010-02-17 08:15:51.067886 |   789 |     0 |    1

perhaps some stats buffering occurring or something or some general misunderstanding of some of these tunables?

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
On Wed, 2010-02-17 at 08:30 -0500, Jeff wrote:
> Since getting on 8.4 I've been monitoring things fairly closely.
> I whipped up a quick script to monitor pg_stat_bgwriter and save
> deltas every minute so I can ensure my bgwriter is beating out the
> backends for writes (as it is supposed to do).
>
> Now, the odd thing I'm running into is this:
>
> bgwriter_delay is 100ms (ie 10 times a second, give or take)
> bgwriter_lru_maxpages is 500 (~5000 pages / second)
> bgwriter_lru_multiplier is 4
>
> Now, assuming I understand these values right the following is what
> should typically happen:
>
> while(true)
> {
>     if buffers_written > bgwriter_lru_maxpages
>        or buffers_written > anticipated_pages_needed * bgwriter_lru_multiplier
>     {
>         sleep(bgwriter_delay ms)
>         continue;
>     }
>     ...
> }

Correct.

> so I should not be able to have more than ~5000 bgwriter_clean pages
> per minute. (this assumes writing takes 0ms, which of course is
> inaccurate)

That works out to 5000/second - 300,000/minute.

> However, I see this in my stats (they are deltas), and I'm reasonably
> sure it is not a bug in the code:
>
> (timestamp, buffers clean, buffers_checkpoint, buffers backend)
> 2010-02-17 08:23:51.184018 |     1 |  1686 |    5
> 2010-02-17 08:22:51.170863 | 15289 | 12676 |  207
> 2010-02-17 08:21:51.155793 | 38467 |  8993 | 4277
> 2010-02-17 08:20:51.139199 | 35582 |     0 | 9437
> 2010-02-17 08:19:51.125025 |     8 |     0 |    3
> 2010-02-17 08:18:51.111184 |  1140 |  1464 |    6
> 2010-02-17 08:17:51.098422 |     0 |  1682 |  228
> 2010-02-17 08:16:51.082804 |    50 |     0 |    6
> 2010-02-17 08:15:51.067886 |   789 |     0 |    1
>
> perhaps some stats buffering occurring or something or some general
> misunderstanding of some of these tunables?
>
> --
> Jeff Trout <jeff@jefftrout.com>
> http://www.stuarthamm.net/
> http://www.dellsmartexitin.com/

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
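
The per-second versus per-minute arithmetic in the reply above can be checked with a few lines of Python. This is only a sketch of the ceiling calculation: the function name `lru_write_ceiling` is made up for illustration, and it keeps the original poster's simplifying assumption that a write takes 0 ms.

```python
def lru_write_ceiling(bgwriter_delay_ms, bgwriter_lru_maxpages):
    """Upper bound on LRU-cleaning writes, assuming each write takes 0 ms.

    Returns (pages_per_second, pages_per_minute).
    """
    rounds_per_second = 1000 / bgwriter_delay_ms
    pages_per_second = rounds_per_second * bgwriter_lru_maxpages
    return pages_per_second, pages_per_second * 60


# Settings quoted in the thread.
per_second, per_minute = lru_write_ceiling(bgwriter_delay_ms=100,
                                           bgwriter_lru_maxpages=500)

# Largest one-minute buffers_clean delta in the posted stats.
peak_observed = 38467

print(f"ceiling: {per_second:.0f} pages/s, {per_minute:.0f} pages/min")
print(f"peak minute used {peak_observed / per_minute:.1%} of that ceiling")
```

With these settings the ceiling is 5000 pages per second, so the busiest minute in the posted stats sits well inside it.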
Jeff wrote:
> while(true)
> {
>     if buffers_written > bgwriter_lru_maxpages
>        or buffers_written > anticipated_pages_needed * bgwriter_lru_multiplier
>     {
>         sleep(bgwriter_delay ms)
>         continue;
>     }
>     ...
> }
>
> so I should not be able to have more than ~5000 bgwriter_clean pages
> per minute. (this assumes writing takes 0ms, which of course is
> inaccurate)

That's not how the loop is structured. It's actually more like:

-Compute anticipated_pages_needed * bgwriter_lru_multiplier
-Enter a cleaning loop until that many are confirmed free -or- bgwriter_lru_maxpages is reached
-sleep(bgwriter_delay ms)

> perhaps some stats buffering occurring or something or some general
> misunderstanding of some of these tunables?

With bgwriter_lru_maxpages=500 and bgwriter_delay=100ms, you can get up to 5000 pages/second which makes for 300,000 pages/minute. So none of your numbers look funny just via their scale. This is why the defaults are so low--the maximum output of the background writer is quite big even before you adjust it upwards.

There are however two bits of stats buffering involved. Stats updates don't become visible instantly, they're buffered and only get their updates pushed out periodically to where clients can see them to reduce overhead. Also, the checkpoint write update happens in one update at the end--not incrementally as the checkpoint progresses. The idea is that you should be able to tell if a checkpoint happened or not during a period of monitoring time. You look to be having checkpoints as often as once per minute right now, so something isn't right--probably checkpoint_segments is too low for your workload.

By the way, your monitoring code should be saving maxwritten_clean and buffers_allocated, too. While you may not be doing something with them yet, the former will shed some light on what you're running into now, and the latter is useful later down the road you're walking.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
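
The three-step loop described in the message above can be sketched in Python as follows. This is a simplified model, not the server's code: `cleaning_round` and its arguments are hypothetical names, every scanned buffer is assumed to be reusable (pinned or recently used buffers are ignored), and "writing" a page is just a counter increment.

```python
def cleaning_round(dirty_flags, anticipated_pages_needed,
                   lru_multiplier, lru_maxpages):
    """Model one background-writer cleaning round.

    dirty_flags: iterable of booleans in scan order, True meaning the
    buffer is dirty and must be written before it can be reused.
    Returns (confirmed_free, written, hit_maxpages).
    """
    # Step 1: compute the target for this round.
    target = anticipated_pages_needed * lru_multiplier

    confirmed_free = 0
    written = 0
    hit_maxpages = False

    # Step 2: clean until the target is confirmed free, or the write cap hits.
    for is_dirty in dirty_flags:
        if confirmed_free >= target:
            break
        if is_dirty:
            written += 1  # stands in for flushing the page to disk
        confirmed_free += 1
        if is_dirty and written >= lru_maxpages:
            hit_maxpages = True  # the case maxwritten_clean counts
            break

    # Step 3 (not modelled here): the caller sleeps for bgwriter_delay ms.
    return confirmed_free, written, hit_maxpages


# Heavy demand, every buffer dirty: the round stops at the 500-page cap.
print(cleaning_round([True] * 10000, 200, 4, 500))

# Light demand: the round stops once the target of 200 is confirmed free.
print(cleaning_round([True] * 10000, 50, 4, 500))
```

Ten such rounds per second, each capped at 500 writes, give the 5000 pages/second ceiling mentioned in the thread. The `hit_maxpages` flag corresponds to the situation that the maxwritten_clean counter records, which is why saving that column helps show whether the cap is being reached.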
On Feb 17, 2010, at 6:23 PM, Greg Smith wrote:

> With bgwriter_lru_maxpages=500 and bgwriter_delay=100ms, you can
> get up to 5000 pages/second which makes for 300,000 pages/minute.
> So none of your numbers look funny just via their scale. This is
> why the defaults are so low--the maximum output of the background
> writer is quite big even before you adjust it upwards.

d'oh! that would be the reason. Sorry folks, nothing to see here :)

> There are however two bits of stats buffering involved. Stats
> updates don't become visible instantly, they're buffered and only
> get their updates pushed out periodically to where clients can see
> them to reduce overhead. Also, the checkpoint write update happens
> in one update at the end--not incrementally as the checkpoint
> progresses. The idea is that you should be able to tell if a
> checkpoint happened or not during a period of monitoring time. You
> look to be having checkpoints as often as once per minute right now,
> so something isn't right--probably checkpoint_segments is too low
> for your workload.

checkpoint_segments is currently 32. maybe I'll bump it up - this db does a LOT of writes

> By the way, your monitoring code should be saving maxwritten_clean
> and buffers_allocated, too. While you may not be doing something
> with them yet, the former will shed some light on what you're
> running into now, and the latter is useful later down the road
> you're walking.

It is, I just didn't include them in the mail.

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
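
The monitoring script itself was never posted, so the following is only a guess at its core step: turning two cumulative pg_stat_bgwriter snapshots into per-interval deltas. The function names are hypothetical and the snapshots are plain dictionaries rather than live query results. In the view the allocation counter is named buffers_alloc. The cumulative sample values are invented so that three of the deltas match the 08:21:51 row from the first message; the maxwritten_clean and buffers_alloc figures are made up.

```python
COUNTERS = (
    "buffers_checkpoint",
    "buffers_clean",
    "maxwritten_clean",
    "buffers_backend",
    "buffers_alloc",
)


def snapshot_delta(previous, current):
    """Per-interval change for each cumulative pg_stat_bgwriter counter."""
    return {name: current[name] - previous[name] for name in COUNTERS}


def backend_write_share(delta):
    """Fraction of buffer writes in the interval done by backends."""
    total = (delta["buffers_checkpoint"]
             + delta["buffers_clean"]
             + delta["buffers_backend"])
    return delta["buffers_backend"] / total if total else 0.0


# Invented cumulative readings taken one minute apart (illustrative only).
previous = {"buffers_checkpoint": 100000, "buffers_clean": 500000,
            "maxwritten_clean": 40, "buffers_backend": 20000,
            "buffers_alloc": 900000}
current = {"buffers_checkpoint": 108993, "buffers_clean": 538467,
           "maxwritten_clean": 52, "buffers_backend": 24277,
           "buffers_alloc": 960000}

delta = snapshot_delta(previous, current)
print(delta)
print(f"backend share of writes: {backend_write_share(delta):.1%}")
```

A low backend share means the background writer and checkpoints are doing most of the writing, which is what the original script set out to confirm.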