Thread: Occasional Slow Commit

Occasional Slow Commit

From
"David Rees"
Date:
Hi,

I've got an OLTP application which occasionally suffers from slow
commit time. The process in question does something like this:

1. Do work
2. begin transaction
3. insert record
4. commit transaction
5. Do more work
6. begin transaction
7. update record
8. commit transaction
9. Do more work

The vast majority of the time, everything runs very quickly. The
median processing time for the whole thing is 7ms.

However, occasionally, processing time will jump up significantly -
the average processing time is around 20ms with the maximum processing
time taking 2-4 seconds for a small percentage of transactions. Ouch!

Turning on statement logging and analyzing the logs of the application
itself shows that step #4 is the culprit of the vast majority of the
slow transactions.

Software: CentOS 4.7, PostgreSQL 8.3.4, Slony-I 1.2.15 (the database
in question is replicated using slony)

Hardware: 2x Xeon 5130, 4GB RAM, 6-disk RAID10 15k RPM, BBU on the controller

Notable configuration changes:
shared_buffers = 800MB
temp_buffers = 200MB
work_mem = 16M
maintenance_work_mem = 800MB
vacuum_cost_delay = 10
checkpoint_segments = 10
effective_cache_size = 2500MB

I found this post[1] from a while back which was informative:

Both situations affect me in that I have Slony which I believe
executes triggers upon commit, and looking at the disk IO stats, there
is an elevated level of IO activity during this time, but it doesn't
appear to be heavy enough to cause the type of delays I am seeing.

Reading this page[2] indicates that I may want to increase my
checkpoint_segments, checkpoint_timeout and bgwriter settings, but
looking at pg_stat_bgwriter seems to indicate that my current settings
are probably OK?

# select * from pg_stat_bgwriter;
 checkpoints_timed | checkpoints_req | buffers_checkpoint |
buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc

-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
              3834 |             105 |            3091905 |
25876 |              110 |         2247576 |       2889873

Any suggestions on how to proceed and debug the problem from here?

My only other guess is that there is some sort of locking issues going
on which is slowing things down and that it may also be slony related,
as I also see a high number of slony related queries taking longer
than 1 second...

Thanks

-Dave

[1] http://archives.postgresql.org/pgsql-performance/2008-01/msg00005.php
[2] http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

Re: Occasional Slow Commit

From
"Merlin Moncure"
Date:
On Mon, Oct 27, 2008 at 8:23 PM, David Rees <drees76@gmail.com> wrote:
> Hi,
>
> I've got an OLTP application which occasionally suffers from slow
> commit time. The process in question does something like this:
>
> 1. Do work
> 2. begin transaction
> 3. insert record
> 4. commit transaction
> 5. Do more work
> 6. begin transaction
> 7. update record
> 8. commit transaction
> 9. Do more work
>
> The vast majority of the time, everything runs very quickly. The
> median processing time for the whole thing is 7ms.
>
> However, occasionally, processing time will jump up significantly -
> the average processing time is around 20ms with the maximum processing
> time taking 2-4 seconds for a small percentage of transactions. Ouch!
>
> Turning on statement logging and analyzing the logs of the application
> itself shows that step #4 is the culprit of the vast majority of the
> slow transactions.
>
> Software: CentOS 4.7, PostgreSQL 8.3.4, Slony-I 1.2.15 (the database
> in question is replicated using slony)
>
> Hardware: 2x Xeon 5130, 4GB RAM, 6-disk RAID10 15k RPM, BBU on the controller
>
> Notable configuration changes:
> shared_buffers = 800MB
> temp_buffers = 200MB
> work_mem = 16M
> maintenance_work_mem = 800MB
> vacuum_cost_delay = 10
> checkpoint_segments = 10
> effective_cache_size = 2500MB
>
> I found this post[1] from a while back which was informative:
>
> Both situations affect me in that I have Slony which I believe
> executes triggers upon commit, and looking at the disk IO stats, there
> is an elevated level of IO activity during this time, but it doesn't
> appear to be heavy enough to cause the type of delays I am seeing.
>
> Reading this page[2] indicates that I may want to increase my
> checkpoint_segments, checkpoint_timeout and bgwriter settings, but
> looking at pg_stat_bgwriter seems to indicate that my current settings
> are probably OK?
>
> # select * from pg_stat_bgwriter;
>  checkpoints_timed | checkpoints_req | buffers_checkpoint |
> buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
>
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
>              3834 |             105 |            3091905 |
> 25876 |              110 |         2247576 |       2889873
>
> Any suggestions on how to proceed and debug the problem from here?
>
> My only other guess is that there is some sort of locking issues going
> on which is slowing things down and that it may also be slony related,
> as I also see a high number of slony related queries taking longer
> than 1 second...

I bet your problem is disk syncing.  Your xlogs are on the data volume
so any type of burst activity can push back commit times.  If this is
the case, you have basically three solutions to this problem:
*) buy more disks (i's start with pushing the xlogs out to dedicated volume)
*) disable fsync (very unsafe) or synchronous commit (somewhat less unsafe)
*) checkpoint/bgwriter tuning: can provide incremental gains.  This is
not magic...at best you can smooth out bursty checkpoints.  If your
problems are really serious (yours don't seem to be), you have to look
at the previous options.

Have you temporarily disabling slony to see if the problem goes away?
(My guess is it's not slony).

merlin

Re: Occasional Slow Commit

From
Andrew Sullivan
Date:
On Mon, Oct 27, 2008 at 05:23:37PM -0700, David Rees wrote:

> However, occasionally, processing time will jump up significantly -
> the average processing time is around 20ms with the maximum processing
> time taking 2-4 seconds for a small percentage of transactions. Ouch!
>
> Turning on statement logging and analyzing the logs of the application
> itself shows that step #4 is the culprit of the vast majority of the
> slow transactions.

My bet is that you're waiting on checkpoints.  Given that you're on
8.3, start fiddling with the checkpoint_completion_target parameter.
0.7 might help.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Occasional Slow Commit

From
Greg Smith
Date:
On Mon, 27 Oct 2008, David Rees wrote:

> Software: CentOS 4.7, PostgreSQL 8.3.4, Slony-I 1.2.15 (the database
> in question is replicated using slony)
> Hardware: 2x Xeon 5130, 4GB RAM, 6-disk RAID10 15k RPM, BBU on the controller

The CentOS 4.7 kernel will happily buffer about 1.6GB of writes with that
much RAM, and the whole thing can get slammed onto disk during the final
fsync portion of the checkpoint.  What you should do first is confirm
whether or not the slow commits line up with the end of the checkpoint,
which is easy to see if you turn on log_checkpoints.  That gives you
timings for the write and fsync phases of the checkpoint which can also be
informative.

> Reading this page[2] indicates that I may want to increase my
> checkpoint_segments, checkpoint_timeout and bgwriter settings, but
> looking at pg_stat_bgwriter seems to indicate that my current settings
> are probably OK?
>
> # select * from pg_stat_bgwriter;
> checkpoints_timed | checkpoints_req | buffers_checkpoint |
>              3834 |             105 |            3,091,905 |
> buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
> 25876 |              110 |         2,247,576 |       2,889,873

I reformatted the above to show what's happening a bit better.  Most of
your checkpoints are the timed ones, which are unlikely to cause
interference from a slow commit period (the writes are spread out over 5
minutes in those cases).  It's quite possible the slow periods are coming
only from the occasional requested checkpoints, which normally show up
because checkpoint_segments is too low and you chew through segments too
fast.  If you problems line up with checkpoint time, you would likely gain
some benefit from increasing checkpoint_segments to spread out the
checkpoint writes further; the 10 you're using now is on the low side for
your hardware.

If the problems don't go away after that, you may be suffering from
excessive Linux kernel buffering instead.  I've got a blog entry showing
how I tracked down a similar long pause on a Linux server at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html you
may find helpful for determining if your issue is this one (which is
pretty common on RHEL systems having relatively large amounts of RAM) or
if it's something else, like the locking you mentioned.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Occasional Slow Commit

From
"David Rees"
Date:
On Wed, Oct 29, 2008 at 6:26 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> The CentOS 4.7 kernel will happily buffer about 1.6GB of writes with that
> much RAM, and the whole thing can get slammed onto disk during the final
> fsync portion of the checkpoint.  What you should do first is confirm
> whether or not the slow commits line up with the end of the checkpoint,
> which is easy to see if you turn on log_checkpoints.  That gives you timings
> for the write and fsync phases of the checkpoint which can also be
> informative.

OK, log_checkpoints is turned on to see if any delays correspond to
checkpoint activity...

>> Reading this page[2] indicates that I may want to increase my
>> checkpoint_segments, checkpoint_timeout and bgwriter settings, but
>> looking at pg_stat_bgwriter seems to indicate that my current settings
>> are probably OK?
>>
>> # select * from pg_stat_bgwriter;
>> checkpoints_timed | checkpoints_req | buffers_checkpoint |
>>             3834 |             105 |            3,091,905 |
>> buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
>> 25876 |              110 |         2,247,576 |       2,889,873
>
> I reformatted the above to show what's happening a bit better.

Sorry, gmail killed the formatting.

> Most of your
> checkpoints are the timed ones, which are unlikely to cause interference
> from a slow commit period (the writes are spread out over 5 minutes in those
> cases).  It's quite possible the slow periods are coming only from the
> occasional requested checkpoints, which normally show up because
> checkpoint_segments is too low and you chew through segments too fast.  If
> you problems line up with checkpoint time, you would likely gain some
> benefit from increasing checkpoint_segments to spread out the checkpoint
> writes further; the 10 you're using now is on the low side for your
> hardware.

OK, I've also bumped up checkpoint_segments to 20 and
checkpoint_completion_target to 0.7 in an effort to reduce the effect
of checkpoints.

> If the problems don't go away after that, you may be suffering from
> excessive Linux kernel buffering instead.  I've got a blog entry showing how
> I tracked down a similar long pause on a Linux server at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html you
> may find helpful for determining if your issue is this one (which is pretty
> common on RHEL systems having relatively large amounts of RAM) or if it's
> something else, like the locking you mentioned.

Ah, interesting. I've also turned down the dirty_ratio and
dirty_background_ratio as suggested, but I don't think this would be
affecting things here. The rate of IO on this server is very low
compared to what it's capable of.

Thanks for the suggestions, I'll report back with results.

-Dave

Re: Occasional Slow Commit

From
"David Rees"
Date:
(Resending this, the first one got bounced by mail.postgresql.org)

On Wed, Oct 29, 2008 at 3:30 PM, David Rees <drees76@gmail.com> wrote:
> On Wed, Oct 29, 2008 at 6:26 AM, Greg Smith <gsmith@gregsmith.com> wrote:
>> What you should do first is confirm
>> whether or not the slow commits line up with the end of the checkpoint,
>> which is easy to see if you turn on log_checkpoints.  That gives you timings
>> for the write and fsync phases of the checkpoint which can also be
>> informative.
>
> OK, log_checkpoints is turned on to see if any delays correspond to
> checkpoint activity...

Well, I'm pretty sure the delays are not checkpoint related. None of
the slow commits line up at all with the end of checkpoints.

The period of high delays occur during the same period of time each
week, but it's not during a particularly high load period on the
systems.

It really seems like there must be something running in the background
that is not showing up on the system activity logs, like a background
RAID scrub or something.

Here are a couple representative checkpoint complete messages from the logs:

2008-10-31 20:12:03 UTC: : : LOG:  checkpoint complete: wrote 285
buffers (0.3%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=57.933 s, sync=0.053 s, total=57.990 s
2008-10-31 20:17:33 UTC: : : LOG:  checkpoint complete: wrote 437
buffers (0.4%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=87.891 s, sync=0.528 s, total=88.444 s
2008-10-31 20:22:05 UTC: : : LOG:  checkpoint complete: wrote 301
buffers (0.3%); 0 transaction log file(s) added, 0 removed, 1
recycled; write=60.774 s, sync=0.033 s, total=60.827 s
2008-10-31 20:27:46 UTC: : : LOG:  checkpoint complete: wrote 504
buffers (0.5%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=101.037 s, sync=0.049 s, total=101.122 s

During this period of time there was probably 100 different queries
writing/commiting data that took longer than a second (3-4 seconds
seems to be the worst).

The RAID controller on this machine is some sort of MegaRAID
controller. I'll have to see if there is some sort of scheduled scan
running during this period of time.

One of the replicate nodes is an identical machine which I just
noticed has the same slony commit slow downs logged even though it's
only receiving data from slony from the primary node. There are two
other nodes listening in on the same subscription, but these two nodes
don't show the same slony commit slow downs, but these machines use a
slightly different raid controller and are about 9 months newer than
the other two.

I'm still hoping that the checkpoint tuning has reduced commit latency
during busy periods, I should have more data after the weekend.

-Dave

Re: Occasional Slow Commit

From
"David Rees"
Date:
On Fri, Oct 31, 2008 at 4:14 PM, David Rees <drees76@gmail.com> wrote:
> Well, I'm pretty sure the delays are not checkpoint related. None of
> the slow commits line up at all with the end of checkpoints.
>
> The period of high delays occur during the same period of time each
> week, but it's not during a particularly high load period on the
> systems.
>
> It really seems like there must be something running in the background
> that is not showing up on the system activity logs, like a background
> RAID scrub or something.

OK, I finally had a chance to dig at this problem some more, and after
futzing around with the MegaCli tools (major PITA, btw), I was able to
confirm that there is a feature called "Patrol Read" on this LSI
Megaraid SAS card which runs a weekly background read scan of the
disks looking for errors. It is during this time period that I get
lots of slow commits and transactions.

FWIW, the card identifies itself from lspci as this:

LSI Logic / Symbios Logic MegaRAID SAS
Subsystem: Intel Corporation SROMBSAS18E RAID Controller

I also found that my write cache was set to WriteThrough instead of
WriteBack, defeating the purpose of having a BBU and that my secondary
server apparently doesn't have a BBU on it. :-(

Anyway, has anyone done any benchmarking of MegaRAID SAS controllers?
I am configuring my arrays to use these settings:

Read Policy: Normal (Normal, Read ahead & Adaptive read head)
Write Policy: Writeback (Writeback, Writethrough)
Disable Writeback if bad BBU
IO Policy: Direct (Direct, Cached)
Disk Cache: Enable (Enable, Disable, Unchanged)

The only setting I'm really concerned about is the Disk Cache setting
- is it safe to assume that the controller will do the right thing
with regards to flushing the disk cache when appropriate to avoid data
loss? LSI RAID cards seem to be pretty well respected, so I'd have to
guess yes.

Thanks

Dave

Re: Occasional Slow Commit

From
Peter Schuller
Date:
> I also found that my write cache was set to WriteThrough instead of
> WriteBack, defeating the purpose of having a BBU and that my secondary
> server apparently doesn't have a BBU on it. :-(

Note also that several RAID controllers will periodically drop the
write-back mode during battery capacity tests. If you care about
consistently/deterministically having full performance (with
white-backed battery protected caching), you probably want to confirm
your controller behavior here.

(I've seen this on at least LSI based controllers in Dell 2950:s, and
also on some 3wares.)

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org


Attachment

Re: Occasional Slow Commit

From
"David Rees"
Date:
On Thu, Nov 6, 2008 at 2:21 AM, Peter Schuller
<peter.schuller@infidyne.com> wrote:
>> I also found that my write cache was set to WriteThrough instead of
>> WriteBack, defeating the purpose of having a BBU and that my secondary
>> server apparently doesn't have a BBU on it. :-(
>
> Note also that several RAID controllers will periodically drop the
> write-back mode during battery capacity tests. If you care about
> consistently/deterministically having full performance (with
> white-backed battery protected caching), you probably want to confirm
> your controller behavior here.
>
> (I've seen this on at least LSI based controllers in Dell 2950:s, and
> also on some 3wares.)

I can confirm that this is the case by reviewing the logs stored on
the MegaRAID controller that have a BBU and had WriteBack configured.
The controller also lets you know (using the MegaCli utility) what
setting is configured, and what setting is in effect.

In the case of the machines without a BBU on them, they are configured
to be in WriteBack, but are actually running in WriteThrough.

-Dave

Re: Occasional Slow Commit

From
"Scott Marlowe"
Date:
On Thu, Nov 6, 2008 at 8:47 AM, David Rees <drees76@gmail.com> wrote:
>
> In the case of the machines without a BBU on them, they are configured
> to be in WriteBack, but are actually running in WriteThrough.

I'm pretty sure the LSIs will refuse to actually run in writeback without a BBU.

Re: Occasional Slow Commit

From
"David Rees"
Date:
On Thu, Nov 6, 2008 at 8:07 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Nov 6, 2008 at 8:47 AM, David Rees <drees76@gmail.com> wrote:
>>
>> In the case of the machines without a BBU on them, they are configured
>> to be in WriteBack, but are actually running in WriteThrough.
>
> I'm pretty sure the LSIs will refuse to actually run in writeback without a BBU.

It's configurable (at least on the MegaRAID cards I'm using). There's
an option you can turn on that lets you run in writeback when the BBU
is offline.

-Dave