Re: Huge iowait during checkpoint finish - Mailing list pgsql-general

From Greg Smith
Subject Re: Huge iowait during checkpoint finish
Msg-id 4B47AC9C.2050907@2ndquadrant.com
In response to Huge iowait during checkpoint finish  (Anton Belyaev <anton.belyaev@gmail.com>)
Responses Re: Huge iowait during checkpoint finish  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: Huge iowait during checkpoint finish  (Anton Belyaev <anton.belyaev@gmail.com>)
List pgsql-general
Anton Belyaev wrote:
> I think all the IOwait comes during sync time, which is 80 s,
> according to the log entry.
>

I believe you are diagnosing the issue correctly.  The "sync time" entry
in the log was added specifically to make it easier to confirm that the
problem you're having exists on a given system.
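To pull those numbers out of the log automatically, here is a minimal Python sketch that parses log_checkpoints "checkpoint complete" lines. It assumes the 8.3/8.4 message format, which ends in "write=... s, sync=... s, total=... s". Other versions may word it differently. The helper names and the 10-second threshold are hypothetical choices for illustration.

```python
import re

# Matches the timing fields at the end of a "checkpoint complete" log line,
# e.g. "write=1.290 s, sync=0.065 s, total=1.383 s" (8.3/8.4 format assumed).
_TIMING = re.compile(
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, total=(?P<total>[\d.]+) s"
)


def checkpoint_timings(line):
    """Return {'write': s, 'sync': s, 'total': s} as floats, or None if absent."""
    m = _TIMING.search(line)
    if m is None:
        return None
    return {name: float(value) for name, value in m.groupdict().items()}


def slow_syncs(lines, threshold_s=10.0):
    """Sync times (seconds) above threshold_s; the default is an arbitrary example."""
    result = []
    for line in lines:
        t = checkpoint_timings(line)
        if t is not None and t["sync"] > threshold_s:
            result.append(t["sync"])
    return result
```

Look at the absolute sync value rather than its share of the total. With checkpoint_completion_target = 0.9, the write phase is deliberately stretched out, so even an 80 s sync can be a small fraction of the whole checkpoint.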

> bgwriter_lru_maxpages = 0 # BG writer is off
> checkpoint_segments = 45
> checkpoint_timeout = 60min
> checkpoint_completion_target = 0.9
>
These are reasonable settings.  You can look at pg_stat_bgwriter to get
more statistics about your checkpoints: grab a snapshot of it now,
another one later, and then compute the difference between the two.
I've got an example of that at
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
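As a rough sketch of the snapshot-difference step, assume each snapshot is the single row of `SELECT * FROM pg_stat_bgwriter;` turned into a Python dict, for example by a driver's dict cursor. The column names are the 8.3/8.4 ones. The function names are hypothetical.

```python
# Cumulative counters in pg_stat_bgwriter (8.3/8.4 column names).
COUNTERS = (
    "checkpoints_timed",   # checkpoints started because checkpoint_timeout expired
    "checkpoints_req",     # requested checkpoints (checkpoint_segments filled, manual CHECKPOINT)
    "buffers_checkpoint",  # buffers written by checkpoints
    "buffers_clean",       # buffers written by the background writer's LRU scan
    "maxwritten_clean",    # times the LRU scan stopped at bgwriter_lru_maxpages
    "buffers_backend",     # buffers backends had to write out themselves
    "buffers_alloc",       # buffers allocated
)


def bgwriter_delta(before, after):
    """Per-column difference between two snapshots (dicts of column -> count)."""
    return {col: after[col] - before[col] for col in COUNTERS}


def write_shares(delta):
    """Fraction of buffer writes done by checkpoints, the bgwriter and backends."""
    sources = ("buffers_checkpoint", "buffers_clean", "buffers_backend")
    total = sum(delta[s] for s in sources)
    if total == 0:
        return {s: 0.0 for s in sources}
    return {s: delta[s] / total for s in sources}
```

With bgwriter_lru_maxpages = 0, as in the quoted config, buffers_clean should stay at zero. Any writes then come from checkpoints or from the backends themselves.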

You should be aiming to have a checkpoint no more often than every 5
minutes, and on a write-heavy system, shooting for closer to every 10 is
probably more appropriate.  Do you know how often they're happening on
yours?  Two pg_stat_bgwriter snapshots taken a couple of hours apart,
with a timestamp on each, can be used to figure that out.
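A minimal sketch of that calculation follows. It assumes each snapshot was taken with something like `SELECT now() AS taken_at, * FROM pg_stat_bgwriter;` and loaded into a dict. The example counter values are invented. Timed and requested checkpoints are counted together.

```python
from datetime import datetime


def checkpoint_interval_minutes(before, after):
    """Average minutes between checkpoints across two timestamped snapshots.

    Each snapshot is a dict with 'taken_at' (datetime), 'checkpoints_timed'
    and 'checkpoints_req'. Returns None if no checkpoint happened in between.
    """
    elapsed_min = (after["taken_at"] - before["taken_at"]).total_seconds() / 60.0
    timed = after["checkpoints_timed"] - before["checkpoints_timed"]
    requested = after["checkpoints_req"] - before["checkpoints_req"]
    if timed + requested <= 0:
        return None
    return elapsed_min / (timed + requested)


# Hypothetical snapshots two hours apart.
before = {"taken_at": datetime(2010, 1, 8, 10, 0), "checkpoints_timed": 120, "checkpoints_req": 40}
after = {"taken_at": datetime(2010, 1, 8, 12, 0), "checkpoints_timed": 122, "checkpoints_req": 42}
```

Here, 4 checkpoints in 120 minutes averages one every 30 minutes, comfortably above the 5 to 10 minute floor. If checkpoints_req grows faster than checkpoints_timed, checkpoint_segments rather than checkpoint_timeout is what triggers most checkpoints.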

> I had mostly the same config with my 8.3 deployment.
> But hardware is different:
> Disk is software RAID-5 with 3 hard drives.
> Operating system is Ubuntu 9.10 Server x64.
>

Does the new server have a lot more RAM than the 8.3 one?  Some of the
problems in this area get worse the more RAM you've got.
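One reason RAM matters: Linux's writeback thresholds vm.dirty_background_ratio and vm.dirty_ratio are percentages of memory. The same settings therefore let a bigger machine pile up more dirty data before it is forced to disk. The rough Python sketch below estimates those limits. It assumes MemTotal as the base. The kernel really uses a smaller "dirtyable memory" figure, so treat the results as upper bounds. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def dirty_thresholds_mb(mem_total_kb, background_ratio, dirty_ratio):
    """Approximate MB of dirty page cache at which background writeback starts,
    and at which processes doing writes get throttled (upper bounds)."""
    total_mb = mem_total_kb / 1024.0
    return total_mb * background_ratio / 100.0, total_mb * dirty_ratio / 100.0


def read_local_settings():
    """Read the live values on a Linux box (not called by the sketch itself)."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    with open("/proc/sys/vm/dirty_background_ratio") as f:
        background = int(f.read())
    with open("/proc/sys/vm/dirty_ratio") as f:
        hard = int(f.read())
    return info, background, hard
```

Comparing the "Dirty" line in /proc/meminfo against these estimates, just before a checkpoint's sync phase, gives a feel for how much the kernel is holding back.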

Does the new server use ext4 while the old one used ext3?

Basically, you have a couple of standard issues here:

1) You're using RAID-5, which is not known for good write performance.
Are you sure the disk array performs well on writes?  And if you didn't
benchmark it, you can't be sure.

2) Linux is buffering a lot of writes that only make it to disk at
checkpoint time.  This could simply be because of (1); maybe the disk is
always overloaded.  But it's also possible that Linux is just buffering
too much and being lazy about writing it out.  I wrote something about
that topic that you might find interesting at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
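For point 1, here is a crude write check in Python. It is only a sketch and no substitute for a dedicated tool such as bonnie++. It writes sequential 8 kB blocks to a file, calls fsync once, and reports MB/s including the fsync time. It measures only sequential throughput. Small random writes, where RAID-5's parity read-modify-write hurts most, are not covered. The function name, file name and sizes are assumptions.

```python
import os
import time


def fsync_write_mb_per_s(path, total_mb=256, block_kb=8):
    """Write total_mb of data to path in block_kb chunks, fsync once, and
    return the observed throughput in MB/s (fsync time included)."""
    block = os.urandom(block_kb * 1024)  # random bytes, reused for every block
    n_blocks = (total_mb * 1024) // block_kb
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(n_blocks):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
        os.fsync(fd)  # force the data out of the OS cache onto the disks
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - start, 1e-9)
    return total_mb / elapsed
```

Point path at a scratch file on the RAID array's filesystem, then delete the file afterward. Running it on the old 8.3 server for comparison would make the result more meaningful.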

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com

