checkpoints/bgwriter tuning verification - Mailing list pgsql-general

From Vick Khera
Subject checkpoints/bgwriter tuning verification
Date
Msg-id 2968dfd60910291146o6dfa5992i3582ef9a48234be7@mail.gmail.com
Whole thread Raw
Responses Re: checkpoints/bgwriter tuning verification
Re: checkpoints/bgwriter tuning verification
Re: checkpoints/bgwriter tuning verification
List pgsql-general
On my primary DB I'm observing random slowness which just doesn't make
sense to me.  The I/O system can easily do 40MB/sec writes, but I'm
only seeing a sustained 5MB/sec, even as the application is stalling
waiting on the DB.

My only guess is that I'm getting hit by checkpoints too often, and
this is causing some delays.  I did a lot of reading and came across
this most excelelent article by Greg Smith
(http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm).
Reading that and the Pg manual and looking at the statistics, here is
what I think I need to do.  (Greg, do you have a performance tuning
book?  If so, I really want to buy it!  Your articles are awesome.)

current settings for checkpoints:
checkpoint_segments = 32
checkpoint_timeout = 900
checkpoint_completion_target = 0.9
log_checkpoints = on

In my logs I see this pretty consistently during times of slowness:

Oct 29 14:17:38 d04 postgres[54902]: [10990-1] LOG:  checkpoint starting: xlog
Oct 29 14:22:04 d04 postgres[54902]: [10991-1] LOG:  checkpoint
complete: wrote 52828 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=265.881
Oct 29 14:22:57 d04 postgres[54902]: [10992-1] LOG:  checkpoint starting: xlog
Oct 29 14:26:56 d04 postgres[54902]: [10993-1] LOG:  checkpoint
complete: wrote 52773 buffers (24.3%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=234.846
Oct 29 14:28:32 d04 postgres[54902]: [10994-1] LOG:  checkpoint starting: xlog
Oct 29 14:33:32 d04 postgres[54902]: [10995-1] LOG:  checkpoint
complete: wrote 53807 buffers (24.7%); 0 transaction log file(s)
added, 0 removed, 32 recycled; write=299.170


basically, the next checkpoint starts within a few seconds of the
prior one completing.


The stats show this:

# select * from pg_stat_bgwriter;
-[ RECORD 1 ]------+-----------
checkpoints_timed  | 3236
checkpoints_req    | 83044
buffers_checkpoint | 1376460896
buffers_clean      | 59124159
maxwritten_clean   | 304410
buffers_backend    | 285595787
buffers_alloc      | 6643047623

Based on Greg's article and the above number showing that most
checkpoints are triggered by running out of WAL segments, I should
increase my checkpoint_buffers.  Also, based on the article, I should
increase the bgwriter_lru_maxpages (currently at default 100).

My plan is to bump checkpoint_segments to 48 and reduce
checkpoint_completion_target to 0.7, and bump the
bgwriter_lru_maxpages to 500.

Can the checkpoint operation actually cause the DB to stop responding
for a few seconds at a time?  That seems to be what I observe.
Sometimes for 5 or more seconds one transaction will just stall.

Thanks for any ideas.

pgsql-general by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Can't connect
Next
From: Brad Nicholson
Date:
Subject: Re: checkpoints/bgwriter tuning verification