Re: checkpoints/bgwriter tuning verification - Mailing list pgsql-general
From | Brad Nicholson |
---|---|
Subject | Re: checkpoints/bgwriter tuning verification |
Date | |
Msg-id | 1256846337.11161.1417.camel@bnicholson-desktop Whole thread Raw |
In response to | checkpoints/bgwriter tuning verification (Vick Khera <vivek@khera.org>) |
List | pgsql-general |
On Thu, 2009-10-29 at 14:46 -0400, Vick Khera wrote: > On my primary DB I'm observing random slowness which just doesn't make > sense to me. The I/O system can easily do 40MB/sec writes, but I'm > only seeing a sustained 5MB/sec, even as the application is stalling > waiting on the DB. > > My only guess is that I'm getting hit by checkpoints too often, and > this is causing some delays. I did a lot of reading and came across > this most excelelent article by Greg Smith > (http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm). > Reading that and the Pg manual and looking at the statistics, here is > what I think I need to do. (Greg, do you have a performance tuning > book? If so, I really want to buy it! Your articles are awesome.) > > current settings for checkpoints: > checkpoint_segments = 32 > checkpoint_timeout = 900 > checkpoint_completion_target = 0.9 > log_checkpoints = on > > In my logs I see this pretty consistently during times of slowness: > > Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG: checkpoint starting: xlog > Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG: checkpoint > complete: wrote 52828 buffers (24.3%); 0 transaction log file(s) > added, 0 removed, 32 recycled; write=265.881 > Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG: checkpoint starting: xlog > Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG: checkpoint > complete: wrote 52773 buffers (24.3%); 0 transaction log file(s) > added, 0 removed, 32 recycled; write=234.846 > Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG: checkpoint starting: xlog > Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG: checkpoint > complete: wrote 53807 buffers (24.7%); 0 transaction log file(s) > added, 0 removed, 32 recycled; write=299.170 > > > basically, the next checkpoint starts within a few seconds of the > prior one completing. That's most likely due to having checkpoint_completion target set near the maximum. You are telling it to stretch the checkpoint out as long as possible. > > The stats show this: > > # select * from pg_stat_bgwriter; > -[ RECORD 1 ]------+----------- > checkpoints_timed | 3236 > checkpoints_req | 83044 > buffers_checkpoint | 1376460896 > buffers_clean | 59124159 > maxwritten_clean | 304410 > buffers_backend | 285595787 > buffers_alloc | 6643047623 > > Based on Greg's article and the above number showing that most > checkpoints are triggered by running out of WAL segments, I should > increase my checkpoint_buffers. Also, based on the article, I should > increase the bgwriter_lru_maxpages (currently at default 100). > > My plan is to bump checkpoint_segments to 48 and reduce > checkpoint_completion_target to 0.7, That's what I would do. I think you getting hit by checkpointing too frequently or stretching the checkpoints out too long (or a combo of both). I'd also keep bumping checkpoint_segements up until you are checkpointing on the timeout. This will give you a longer window between checkpoints - which gives more time to smooth the the checkpoint out. > and bump the > bgwriter_lru_maxpages to 500. I would probably start by increasing the number of WAL segments and decreasing the checkpoint_completion_timeout and see if that solves the problem. I wouldn't change the bgwriter_lru_maxpages at the same time, and wouldn't necessarily increase it by a factor of 5 when doing so. It does look like you need to increase it though. > Can the checkpoint operation actually cause the DB to stop responding > for a few seconds at a time? That seems to be what I observe. > Sometimes for 5 or more seconds one transaction will just stall. Absolutely. The smoothing is supposed to alleviate that, but I'm not sure how well it works with aggressive activity driven checkpoints like you have. Keep up posted, I'm curious. > Thanks for any ideas. > -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
pgsql-general by date: