Thread: lowering impact of checkpoints
hi, our system is handling between 600 and 2000 transactions per second. all of them are very small, very fast. typical query runs in under 1ms. yes - sometimes we get queries that take longer than then should get. simple check shows that we have a very visible pattern of every-5-minutes peak. in the minute that there is checkpoint - we get usually 15-20 times more queries "over 500 ms" than in other minutes. we are using 8.2.4 (upgrade will be soon), with these settings: # select name, setting, unit from pg_settings where name ~* 'bgwriter|wal|checkpoint'; name | setting | unit -----------------------+-----------+-------- bgwriter_all_maxpages | 5 | bgwriter_all_percent | 0.333 | [null] bgwriter_delay | 200 | ms bgwriter_lru_maxpages | 5 | bgwriter_lru_percent | 1 | [null] checkpoint_segments | 32 | checkpoint_timeout | 300 | s checkpoint_warning | 30 | s wal_buffers | 128 | 8kB wal_sync_method | fdatasync | [null] (10 rows) is there anything i can change to make it "smoother"? depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On Tue, 2007-09-25 at 11:58 +0200, hubert depesz lubaczewski wrote: > hi, > our system is handling between 600 and 2000 transactions per second. all > of them are very small, very fast. typical query runs in under 1ms. > yes - sometimes we get queries that take longer than then should get. > simple check shows that we have a very visible pattern of > every-5-minutes peak. > in the minute that there is checkpoint - we get usually 15-20 times more > queries "over 500 ms" than in other minutes. > > we are using 8.2.4 (upgrade will be soon), with these settings: > # select name, setting, unit from pg_settings where name ~* 'bgwriter|wal|checkpoint'; > name | setting | unit > -----------------------+-----------+-------- > bgwriter_all_maxpages | 5 | > bgwriter_all_percent | 0.333 | [null] > bgwriter_delay | 200 | ms > bgwriter_lru_maxpages | 5 | > bgwriter_lru_percent | 1 | [null] > checkpoint_segments | 32 | > checkpoint_timeout | 300 | s > checkpoint_warning | 30 | s > wal_buffers | 128 | 8kB > wal_sync_method | fdatasync | [null] > (10 rows) > > is there anything i can change to make it "smoother"? Sounds like bgwriter is not flushing dirty pages quickly enough, so there is still a lot of work to do at checkpoint time. You probably need to tune it. This can be a tough thing to do properly though. There are no magic values to suggest, as what will work is highly dependent on your hardware and your applications pattern of use. If possible, up the settings for bgwriter_all_percent a *little* and perhaps bgwriter_all_maxpages and see if it helps. You can change these with a reload. If you are doing this on a production system as opposed to a test system, keep a close eye on what is going on, as it is possible that you can make things worse. I would start with something like 2% for bgwriter_all_maxpages and see if that helps things out. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
On Tue, 25 Sep 2007, hubert depesz lubaczewski wrote: > name | setting | unit > -----------------------+-----------+-------- > bgwriter_all_maxpages | 5 | > bgwriter_all_percent | 0.333 | [null] > bgwriter_delay | 200 | ms > bgwriter_lru_maxpages | 5 | > bgwriter_lru_percent | 1 | [null] The background writer can help smooth out checkpoints a bit in 8.2.4, but these settings are barely doing anything; they aren't even in the right ballpark for a system that's doing 600-2000 TPS. You need to be careful here because making the background writer run too often can result in things running slower all the time, and in some cases it can even make the checkpoints worse. I would suggest changing these parameters one at a time, in the following order, and see what happens to the number of slow queries at checkpoint time after each change: bgwriter_lru_maxpages: increase to 500 bgwriter_lru_percent: increase to 5 bgwriter_all_maxpage: increase to 250 bgwriter_all_percent: increase to 2 If these are all positive changes, you might even want to increase these further; potentially you could double all of the above and still not have the settings high enough, and if that's the case you may have to adjust bgwriter_delay downward. Here is a message I'd suggest reading carefully from someone who went through the process you're starting now and ended up with a much more aggressive set of settings even than these: http://archives.postgresql.org/pgsql-hackers/2006-12/msg00383.php As Kevin suggests there, one thing that varies a bit based on the exact work you're doing is the ratio between how heavily you balance the all vs. lru weighting. His final settings use the all writer a bit more heavily than I'd normally recommend, but with his particular system that worked out well. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
"Greg Smith" <gsmith@gregsmith.com> writes: > On Tue, 25 Sep 2007, hubert depesz lubaczewski wrote: > >> name | setting | unit >> -----------------------+-----------+-------- >> bgwriter_all_maxpages | 5 | >> bgwriter_all_percent | 0.333 | [null] >> bgwriter_delay | 200 | ms >> bgwriter_lru_maxpages | 5 | >> bgwriter_lru_percent | 1 | [null] > > The background writer can help smooth out checkpoints a bit in 8.2.4, Not to disagree with anything Greg says here but you should keep in mind that all of these parameters will have to be retuned from scratch with 8.3 which has been optimized somewhat for just this problem. > If these are all positive changes, you might even want to increase these > further; potentially you could double all of the above and still not have the > settings high enough, and if that's the case you may have to adjust > bgwriter_delay downward. I'm surprised you don't start by suggesting lowering bgwriter_delay for a busy dedicated system. Does it cause too much wasted cpu work in the "all" cycle in 8.2? I also wonder if it doesn't make more sense in 8.2 if your goal is to avoid drop-outs to just give up on the lru cycle entirely and set the delay to something like 60s and the all_percent to 100. Effectively saying to flush all dirty buffers once a minute to smooth the checkpoint. I haven't tried doing anything like that though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Tue, 25 Sep 2007, Gregory Stark wrote: > I'm surprised you don't start by suggesting lowering bgwriter_delay for a busy > dedicated system. Does it cause too much wasted cpu work in the "all" cycle in > 8.2? I've just found it easier to sort through this class of problem by getting the maxpages parameters into at least the 200-500 range before even thinking about lowering the delay. There may very well be a different way to approach this problem by making the delay more of a primary tunable. Certainly there's potentially an advantage to lowering the delay in that it gets writes trickling out to disk more regularly. > I also wonder if it doesn't make more sense in 8.2 if your goal is to avoid > drop-outs to just give up on the lru cycle entirely and set the delay to > something like 60s and the all_percent to 100. There are some workloads where flushing the buffers that haven't been used recently in the lru cycle is more useful than what the all scan does; it's hard to figure out whether your system is such a case or not in 8.2 though. In addition, the main problem with using a longer cycle/higher percentage is that the way some operating systems buffer writes favors writing small blocks more frequently. In the Linux case there are situations where writes sit there for a full 30 seconds so getting the physical disk started earlier is a benefit. I'd be concerned that all_percent=100 would end up generating something close to a checkpoint I/O spike every cycle, and that the background writer waiting for that big write to finish might delay checkpoint requests from processing in a timely fashion. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD