Re: checkpoint spikes - Mailing list pgsql-general

From Greg Smith
Subject Re: checkpoint spikes
Date
Msg-id 4C114476.60209@2ndquadrant.com
Whole thread Raw
In response to Re: checkpoint spikes  (Janning <ml@planwerk6.de>)
Responses Re: checkpoint spikes
Re: checkpoint spikes
List pgsql-general
Janning wrote:
> 1. With raising checkpoint_timeout, is there any downgrade other than slower
> after-crash recovery?
>

Checkpoint spikes happen when too much I/O has been saved up for
checkpoint time than the server can handle.  While this is normally
handled by the checkpoint spreading logic, you may find that with your
limited disk configuration there's no other way to handle the problem
but to make checkpoints much more frequent, rather than slower.  At
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm I
write about how you had to tune PostgreSQL 8.0 to 8.2 in order to keep
checkpoint spikes from getting too big.  You may have to follow those
same techniques for your server.  Just don't try to increase the
background writer settings in your case--the 8.3 one is different enough
that you can't tune that the way was suggested for 8.2.

> 2. Is there a way to calculate the after-crash recovery time with a certain
> checkpoint_timeout? How long would be approx. for a checkpoint_timeout of 60
> minutes?
>

Simulate it.  No way to estimate.


> 3. Is it sane to set checkpoint_timeout to 120min or even to 600min?
>

Checkpoints happen when you reach either checkpoint_segments of WAL
written *or* reach checkpoint_timeout, whichever happens first.  You'd
have to set both to extremely large values to get checkpoints to happen
really infrequently.  Which I suspect is the exactly opposite of what
you want--you can't handle the spike from a long delayed checkpoint, and
probably want to tune for shorter and smaller ones instead.

Every now and then we run into someone who had to retune their system to
something like:

shared_buffers=512MB
checkpoint_segments=3

In order to avoid spikes from killing them.  That may be the direction
you have to head.  The longer the time between checkpoints, the bigger
the spike at the end is going to be to some extend; you can't completely
spread that out.


> 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we have 12
> GB RAM and rather slow disks 0,5% would result in a maximum of 61MB dirty
> pages.
>

Nope.  Linux has absolutely terrible controls for this critical
performance parameter.   The sort of multi-second spikes you're seeing
are extremely common and very difficult to get rid of.


> PS: Do I need to post this question on pgsql-perfomance? If so, please let me
> know.
>

That would have been the better list for it originally.  I also wrote
something about a technique that uses pg_stat_bgwriter snapshots to help
model what the server is doing in these cases better you might find
useful on the admin list, it's at
http://archives.postgresql.org/pgsql-admin/2010-06/msg00074.php

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


pgsql-general by date:

Previous
From: Marinos Yannikos
Date:
Subject: Re: server crash => libpq poll() hangs forever (Linux)
Next
From: Paul Baker
Date:
Subject: Postgresql packages in Solaris