Re: Occasional Slow Commit - Mailing list pgsql-performance

From Greg Smith
Subject Re: Occasional Slow Commit
Date
Msg-id Pine.GSO.4.64.0810290906150.19233@westnet.com
Whole thread Raw
In response to Occasional Slow Commit  ("David Rees" <drees76@gmail.com>)
Responses Re: Occasional Slow Commit  ("David Rees" <drees76@gmail.com>)
List pgsql-performance
On Mon, 27 Oct 2008, David Rees wrote:

> Software: CentOS 4.7, PostgreSQL 8.3.4, Slony-I 1.2.15 (the database
> in question is replicated using slony)
> Hardware: 2x Xeon 5130, 4GB RAM, 6-disk RAID10 15k RPM, BBU on the controller

The CentOS 4.7 kernel will happily buffer about 1.6GB of writes with that
much RAM, and the whole thing can get slammed onto disk during the final
fsync portion of the checkpoint.  What you should do first is confirm
whether or not the slow commits line up with the end of the checkpoint,
which is easy to see if you turn on log_checkpoints.  That gives you
timings for the write and fsync phases of the checkpoint which can also be
informative.

> Reading this page[2] indicates that I may want to increase my
> checkpoint_segments, checkpoint_timeout and bgwriter settings, but
> looking at pg_stat_bgwriter seems to indicate that my current settings
> are probably OK?
>
> # select * from pg_stat_bgwriter;
> checkpoints_timed | checkpoints_req | buffers_checkpoint |
>              3834 |             105 |            3,091,905 |
> buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
> 25876 |              110 |         2,247,576 |       2,889,873

I reformatted the above to show what's happening a bit better.  Most of
your checkpoints are the timed ones, which are unlikely to cause
interference from a slow commit period (the writes are spread out over 5
minutes in those cases).  It's quite possible the slow periods are coming
only from the occasional requested checkpoints, which normally show up
because checkpoint_segments is too low and you chew through segments too
fast.  If you problems line up with checkpoint time, you would likely gain
some benefit from increasing checkpoint_segments to spread out the
checkpoint writes further; the 10 you're using now is on the low side for
your hardware.

If the problems don't go away after that, you may be suffering from
excessive Linux kernel buffering instead.  I've got a blog entry showing
how I tracked down a similar long pause on a Linux server at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html you
may find helpful for determining if your issue is this one (which is
pretty common on RHEL systems having relatively large amounts of RAM) or
if it's something else, like the locking you mentioned.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


pgsql-performance by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Occasional Slow Commit
Next
From: "David Rees"
Date:
Subject: Re: Occasional Slow Commit