Re: Checkpoint spikes - Mailing list pgsql-performance

From Greg Smith
Subject Re: Checkpoint spikes
Date
Msg-id 4B1F05F4.6020305@2ndquadrant.com
Whole thread Raw
In response to Re: Checkpoint spikes  (Richard Neill <rn214@cam.ac.uk>)
Responses Re: Checkpoint spikes  (Andres Freund <andres@anarazel.de>)
List pgsql-performance
Richard Neill wrote:
> (does the advice for 8.3 apply unchanged to 8.4?)
Yes; no changes in this area for 8.4.  The main things performance
related that changed between 8.3 and 8.4 are:
1) VACUUM free space management reimplemented so that the max_fsm_*
parameters aren't needed anymore
2) default_statistics_target now starts at 100 instead of 10

> So far, I've set checkpoint_segments to 128, timeout to 10min, and
> completion_target to 0.8. This helps, but not as much as I'd hoped.
Good, if the problem is moving in the right direction you're making
progress.

> But I haven't touched any of the other WAL or BG Writer settings.
> Where should I look next?
>   Should I be looking at the BG Writer settings,
>   or should I look at the Linux VM configuration?
>     (eg changing /proc/sys/vm/dirty_background_ratio from 5 to 1)
I would start by reducing dirty_background_ratio; as RAM sizes climb,
this keeps becoming a bigger issue.  The whole disk flushing code
finally got a major overhaul in the 2.6.32 Linux kernel, I'm hoping this
whole class of problem was improved from the changes made.

Changes to the background writer behavior will probably not work as
you'd expect.  The first thing I'd try it in your situation turning it
off altogether; it can be slightly counterproductive for reducing
checkpoint issues if they're really bad, which yours are.  If that goes
in the wrong direction, experimenting with increasing the maximum pages
and the multiplier might be useful, I wouldn't bet on it helping through.

As Kevin already mentioned, reducing the size of the buffer cache can
help too.  That's worth trying if you're exhausted the other obvious
possibilities.

>   Or would it be most useful to try to move the WAL to a different disk?
On Linux having the WAL on a separate disk can improve things much more
than you might expect, simply because of how brain-dead the filesystem
fsync implementation is.  Reducing the seeks for WAL traffic can help a
lot too.

If you've lowered Linux's caching, tried some BGW tweaks, and moved the
WAL to somewhere else, if latency is still high you may be facing a
hardware upgrade to improve things.  Sometimes these problems just
require more burst write throughput (regardless of how good average
performance looks) and nothing else will substitute.  Hopefully you'll
find a tuning solution before that though.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Optimizing Bitmap Heap Scan.
Next
From: Andres Freund
Date:
Subject: Re: Checkpoint spikes