Re: checkpoints/bgwriter tuning verification - Mailing list pgsql-general

From Greg Smith
Subject Re: checkpoints/bgwriter tuning verification
Date
Msg-id alpine.GSO.2.01.0910292208480.23558@westnet.com
In response to checkpoints/bgwriter tuning verification  (Vick Khera <vivek@khera.org>)
Responses Re: checkpoints/bgwriter tuning verification
Re: checkpoints/bgwriter tuning verification
List pgsql-general
On Thu, 29 Oct 2009, Vick Khera wrote:

> Greg, do you have a performance tuning book?  If so, I really want to
> buy it!  Your articles are awesome.

Give me a few more months...

> basically, the next checkpoint starts within a few seconds of the prior
> one completing.

That's the expected behavior with a high completion target setting: each
checkpoint finishes just before the next one needs to start.  That's the
most you can spread them out, and more spreading generally means better
behavior.  Usually, anyway; there are always exceptions.
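For reference, the spreading behavior comes from a pair of postgresql.conf
settings; a sketch like this (values illustrative, matching the numbers
discussed below, not a universal recommendation) gives close to the maximum
spread:

```ini
# postgresql.conf sketch (illustrative values)
checkpoint_segments = 64              # WAL segments before a checkpoint is forced
checkpoint_completion_target = 0.9    # spread writes over 90% of the interval
```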

> Based on Greg's article and the above number showing that most
> checkpoints are triggered by running out of WAL segments, I should
> increase my checkpoint_segments.  Also, based on the article, I should
> increase the bgwriter_lru_maxpages (currently at default 100).

You might actually want to decrease bgwriter_lru_maxpages to 0.  The
current background writer design presumes you have some spare I/O
capacity and want to trade off some write overhead for possibly lower
latency in client backends.  If you're at the point where the disks aren't
keeping up with the checkpoint disk I/O, this can turn
counterproductive.
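In postgresql.conf terms, turning that feature off is a one-line change:

```ini
bgwriter_lru_maxpages = 0    # disables the LRU background writer entirely
```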

> My plan is to bump checkpoint_segments to 48 and reduce
> checkpoint_completion_target to 0.7, and bump the
> bgwriter_lru_maxpages to 500.

In your situation, I would increase checkpoint_segments to 64, keep the
completion target at 0.9, and decrease bgwriter_lru_maxpages to 0 so that
feature is turned off altogether.  If you're still not happy, you can try
dropping the completion target too; I've heard one report of that logic
not working out so well in practice, where lower settings actually
resulted in less of an I/O spike.  I would worry about the OS tuning first
though, which brings us to:

> Can the checkpoint operation actually cause the DB to stop responding
> for a few seconds at a time?  That seems to be what I observe.
> Sometimes for 5 or more seconds one transaction will just stall.

There are two ways that writes can hang:

1) You've gotten to the point in the checkpoint cycle where it's calling
fsync to flush everything out of the filesystem.  At this point you could
potentially have a big chunk of data that needs to go out to disk, and any
other client that needs to write something (or needs a resource locked by
a writer) is stuck behind that.

2) The entire OS write cache is filled.  Once that happens, if a client
tries to write something else, the OS will force it to nap until there's
space again to hold that write.
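On Linux, a quick way to see how much data is sitting in state (2) is to
check the write-cache counters in /proc/meminfo; a minimal sketch:

```shell
# Linux only: show how much dirty data is waiting in the OS write cache.
# "Dirty" = modified pages not yet queued for disk; "Writeback" = pages
# actively being flushed.  Watch these climb during a checkpoint.
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

Running that in a loop while a checkpoint fires makes the accumulation and
flush cycle easy to see.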

How that all works really depends on the OS.  I know that under Linux, the
way this is all handled is really bad, and can easily lead to multi-second
hangs.  Generally you can watch the writes accumulate by looking at
/proc/meminfo.  I wrote an example showing how the problem appears and
what I did to improve long hangs on one server at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html and
it refers to a long theory page on Linux's pdflush on my web page.
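The usual knobs in that area are the kernel's dirty-page thresholds; a
sketch of the kind of /etc/sysctl.conf change involved (the exact values
are system-dependent assumptions, not universal advice):

```ini
# Start background writeback sooner and cap pending dirty data lower,
# trading steadier write activity for shorter worst-case stalls.
vm.dirty_background_ratio = 1
vm.dirty_ratio = 10
```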

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
