Re: Slow Query / Check Point Segments - Mailing list pgsql-general

From Greg Smith
Subject Re: Slow Query / Check Point Segments
Date
Msg-id 4B5A5AC6.1090302@2ndquadrant.com
In response to Re: Slow Query / Check Point Segments  (John R Pierce <pierce@hogranch.com>)
Responses Re: Slow Query / Check Point Segments  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
John R Pierce wrote:
> we're having a similar problem with a very update intensive database
> that is part of a 24/7 manufacturing operation (no breathing time
> unless there's an unusual line down situation)

Your problem is actually a bit different.

> 2010-01-23 01:08:13 MYTLOG:  checkpoint complete: wrote 7535 buffers
> (2.9%); 0 transaction log file(s) added, 0 removed, 20 recycled;
> write=86.510 s, sync=0.260 s, total=86.791 s
> 2010-01-23 01:13:22 MYTLOG:  checkpoint complete: wrote 12786 buffers
> (4.9%); 0 transaction log file(s) added, 0 removed, 17 recycled;
> write=95.028 s, sync=0.135 s, total=95.182 s
> 2010-01-23 01:16:22 MYTLOG:  checkpoint complete: wrote 11720 buffers
> (4.5%); 0 transaction log file(s) added, 0 removed, 19 recycled;
> write=41.854 s, sync=0.640 s, total=42.518 s
> 2010-01-23 01:19:11 MYTLOG:  checkpoint complete: wrote 10436 buffers
> (4.0%); 0 transaction log file(s) added, 0 removed, 20 recycled;
> write=50.330 s, sync=0.510 s, total=50.861 s
>

Your interval between checkpoints isn't too bad--3 minutes on the worst
of these.  And the amount written per checkpoint isn't that high either:
12786 buffers = 100MB.  Your problem is that the sync on that 100MB
write is taking an awfully long time.  Spacing your checkpoints out
further would help, sure, but I wouldn't be surprised to find that
doesn't do much for you.  Worth a shot, and unlikely to hurt anything;
I'm just not optimistic about it being the right thing the way I was on
the other message.
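For reference, the buffers-to-megabytes conversion above is just the
buffer count from the checkpoint log multiplied by PostgreSQL's default
8kB block size.  A quick sketch of the arithmetic:

```python
# PostgreSQL's default block size is 8 kB (BLCKSZ = 8192 bytes).
BLOCK_SIZE = 8192

def buffers_to_mb(buffers):
    """Convert a buffer count from a 'checkpoint complete' log line to MB."""
    return buffers * BLOCK_SIZE / (1024 * 1024)

# The four checkpoints from the log excerpt above:
for buffers in (7535, 12786, 11720, 10436):
    print(f"{buffers} buffers = {buffers_to_mb(buffers):.1f} MB")
```

The largest one, 12786 buffers, works out to just under 100MB--a small
enough write burst that the sync portion really shouldn't take over half
a second on SAN-class hardware.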

> This is running on a 16 core UltrasparcIV server, solaris 10, zfs,
> tons of disks on SAN with multiple tablespaces on different ZFS's,
> with the block sizes tuned appropriately

My guess is that there's something wrong with your config such that
writes followed by fsync are taking longer than they should.  When I see
"sync=0.640 s" going to a SAN, where that sync operation should be near
instant, I'd be looking for issues in the ZFS intent log setup, how much
data you've set up ZFS to write cache, and the write cache policy on the
SAN hardware.  There's something weird going on here; that sync should
be near instant on your hardware with such a small write volume.
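A couple of places to start looking, sketched here with a hypothetical
pool name (tank) and dataset (tank/pgdata)--substitute your own layout:

```shell
# Does the pool have a dedicated log (ZIL) device, and is it healthy?
zpool status tank

# Dataset block size should match PostgreSQL's 8 kB blocks for the
# data tablespaces (it sounds like you've already tuned this).
zfs get recordsize tank/pgdata
```

If the intent log is sharing spindles with the data, or the SAN's write
cache is set to write-through, that alone could explain fsync times in
the hundreds of milliseconds.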

Unfortunately, you may have to take the server down to find out exactly
what's going on, which doesn't sound practical for your environment.
(And people wonder why I have near-religious zeal about testing disk
hardware before systems go into production.)

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com

