Re: Slow Query / Check Point Segments - Mailing list pgsql-general

From John R Pierce
Subject Re: Slow Query / Check Point Segments
Date
Msg-id 4B59FC84.5000101@hogranch.com
Whole thread Raw
In response to Re: Slow Query / Check Point Segments  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: Slow Query / Check Point Segments  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-general
Greg Smith wrote:
>> 2010-01-22 12:21:48 JSTLOG:  checkpoint complete: wrote 83874 buffers
>> (16.0%); 0 transaction log file(s) added, 0 removed, 32 recycled;
>> write=138.040 s, sync=0.000 s, total=138.063 s
>> 2010-01-22 12:23:32 JSTLOG:  checkpoint complete: wrote 82856 buffers
>> (15.8%); 0 transaction log file(s) added, 0 removed, 32 recycled;
>> write=18.740 s, sync=0.000 s, total=18.783 s
>> 2010-01-22 12:24:26 JSTLOG:  checkpoint complete: wrote 75145 buffers
>> (14.3%); 0 transaction log file(s) added, 0 removed, 32 recycled;
>> write=12.129 s, sync=0.000 s, total=12.132 s
>> 2010-01-22 12:25:30 JSTLOG:  checkpoint complete: wrote 82108 buffers
>> (15.7%); 0 transaction log file(s) added, 0 removed, 32 recycled;
>> write=10.619 s, sync=0.000 s, total=10.621 s
>> 2010-01-22 12:28:03 JSTLOG:  checkpoint complete: wrote 87349 buffers
>> (16.7%); 0 transaction log file(s) added, 0 removed, 32 recycled;
>> write=82.190 s, sync=0.000 s, total=82.192 s
>
> Here you're getting a checkpoint every minute or three, and each of
> them is writing out ~80000 buffers = 625MB.  That is crazy, and no
> wonder your queries are slow--the system is spending all of its time
> doing constant, extremely expensive checkpoints.
> .....

we're having a similar problem with a very update intensive database
that is part of a 24/7 manufacturing operation (no breathing time unless
there's an unusual line down situtation)

a snip of the log...

2010-01-23 01:02:19 MYTLOG:  checkpoint complete: wrote 764 buffers
(0.3%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=31.207 s, sync=0.147 s, total=31.394 s
2010-01-23 01:08:13 MYTLOG:  checkpoint complete: wrote 7535 buffers
(2.9%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=86.510 s, sync=0.260 s, total=86.791 s
2010-01-23 01:13:22 MYTLOG:  checkpoint complete: wrote 12786 buffers
(4.9%); 0 transaction log file(s) added, 0 removed, 17 recycled;
write=95.028 s, sync=0.135 s, total=95.182 s
2010-01-23 01:16:22 MYTLOG:  checkpoint complete: wrote 11720 buffers
(4.5%); 0 transaction log file(s) added, 0 removed, 19 recycled;
write=41.854 s, sync=0.640 s, total=42.518 s
2010-01-23 01:19:11 MYTLOG:  checkpoint complete: wrote 10436 buffers
(4.0%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=50.330 s, sync=0.510 s, total=50.861 s
2010-01-23 01:23:19 MYTLOG:  checkpoint complete: wrote 6446 buffers
(2.5%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=57.195 s, sync=0.583 s, total=57.790 s
2010-01-23 01:27:54 MYTLOG:  checkpoint complete: wrote 4454 buffers
(1.7%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=44.805 s, sync=0.118 s, total=44.943 s
2010-01-23 01:32:36 MYTLOG:  checkpoint complete: wrote 4645 buffers
(1.8%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=50.792 s, sync=0.689 s, total=51.502 s
2010-01-23 01:37:13 MYTLOG:  checkpoint complete: wrote 5722 buffers
(2.2%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=58.077 s, sync=0.606 s, total=58.707 s
2010-01-23 01:41:20 MYTLOG:  checkpoint complete: wrote 5006 buffers
(1.9%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=36.064 s, sync=0.898 s, total=36.986 s
2010-01-23 01:46:14 MYTLOG:  checkpoint complete: wrote 3644 buffers
(1.4%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=54.873 s, sync=0.659 s, total=55.554 s
2010-01-23 01:51:15 MYTLOG:  checkpoint complete: wrote 5534 buffers
(2.1%); 0 transaction log file(s) added, 0 removed, 20 recycled;
write=63.152 s, sync=0.374 s, total=63.546 s


The admin tells me the current parameters are...

checkpoint_segments = 20                # in logfile segments, min 1,
16MB eac
checkpoint_timeout = 5min              # range 30s-1h

and I'm suggesting 60 and 15min, does that seem reasonable?

This is running on a 16 core UltrasparcIV server, solaris 10, zfs, tons
of disks on SAN with multiple tablespaces on different ZFS's, with the
block sizes tuned appropriately





pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Old/New
Next
From: Erik Jones
Date:
Subject: MySQL -> Postgres migration tools?