Re: WAL partition filling up after high WAL activity - Mailing list pgsql-performance

From Greg Smith
Subject Re: WAL partition filling up after high WAL activity
Date
Msg-id 4EBAA516.2070307@2ndQuadrant.com
In response to WAL partition filling up after high WAL activity  (Richard Yen <richyen@iparadigms.com>)
Responses Re: WAL partition filling up after high WAL activity  (Rafael Martinez <r.m.guerrero@usit.uio.no>)
List pgsql-performance
On 11/07/2011 05:18 PM, Richard Yen wrote:
> My biggest question is: we know from the docs that there should be no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 files.  For us, that would mean no more than 48 files, which equates to 384MB--far lower than the 9.7GB partition size.  **Why would WAL use up so much disk space?**


That's only true if things are operating normally.  There are at least two ways this can fail to be a proper upper limit on space used:

1) You are archiving to a second system, and the archiving isn't keeping up.  Things that haven't been archived can't be re-used, so more disk space is used.

2) Disk I/O is slow, and the checkpoint writes take a significant period of time.  The internal scheduling assumes each individual write will happen without too much delay.  That assumption can easily be untrue on a busy system.  The worst I've seen so far are checkpoints that took 6 hours to sync, when that phase is supposed to take a few seconds.  Disk space used in that case was a giant multiple of checkpoint_segments.  (The source of that problem is much improved in PostgreSQL 9.1.)
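
A quick way to check for the first case is to compare how many segments are sitting in pg_xlog against how many are still flagged as waiting for the archiver (the .ready files under archive_status).  Here's a rough Python sketch of that check; the function name wal_backlog and the example path are made up for illustration, so point it at your own WAL directory:

```python
import os
import re

# WAL segment file names are 24 hexadecimal characters.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")

def wal_backlog(xlog_dir):
    """Return (segments on disk, segments still waiting to be archived)."""
    segments = [f for f in os.listdir(xlog_dir) if SEGMENT_RE.match(f)]

    # The archiver marks a segment as pending with <segment>.ready and
    # renames the marker to <segment>.done once it has been archived.
    status_dir = os.path.join(xlog_dir, "archive_status")
    pending = []
    if os.path.isdir(status_dir):
        for name in os.listdir(status_dir):
            if name.endswith(".ready") and SEGMENT_RE.match(name[:-len(".ready")]):
                pending.append(name)
    return len(segments), len(pending)

# Hypothetical usage; the path below is only an example:
# on_disk, waiting = wal_backlog("/var/lib/pgsql/data/pg_xlog")
```

If the second number keeps growing while the first one does too, archiving is what's holding on to the space.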

The info needed to figure out which category you're in will appear after turning log_checkpoints on in postgresql.conf; you only need to reload the server config after that, it doesn't require a restart.  I would guess you have really long sync times there.
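
Once log_checkpoints is on, each "checkpoint complete" line reports write, sync, and total times.  A minimal Python sketch for pulling those numbers out of a log is below.  The regular expression is based on my understanding of that message format, and the function name is invented here, so treat both as assumptions and adjust them to what your server actually prints:

```python
import re

# Matches the timing portion of a "checkpoint complete" log message,
# e.g. "... write=0.000 s, sync=0.000 s, total=0.001 s".
TIMING_RE = re.compile(
    r"checkpoint complete:.*?write=([\d.]+) s, sync=([\d.]+) s, total=([\d.]+) s"
)

def checkpoint_timings(log_lines):
    """Return a list of (write, sync, total) seconds, one per checkpoint."""
    timings = []
    for line in log_lines:
        match = TIMING_RE.search(line)
        if match:
            timings.append(tuple(float(value) for value in match.groups()))
    return timings

def slow_syncs(log_lines, threshold_seconds=10.0):
    """Return only the checkpoints whose sync phase exceeded the threshold."""
    return [t for t in checkpoint_timings(log_lines) if t[1] > threshold_seconds]
```

The 10 second threshold is arbitrary; anything that runs into minutes for the sync phase points at the second case.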

As for what to do about it, checkpoint_segments=16 is a low setting.  You might as well set it to a large number, say 128, and let checkpoints get driven by time instead.  The existing limit isn't working effectively anyway, and having more segments lets the checkpoint spreading code work more evenly.
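
To see what that change means for disk space when the limit does hold, here is a rough Python sketch of the formula from the docs.  It assumes the default 16MB segment size and checkpoint_completion_target=0.9, which is my guess since that's the value that makes 16 segments come out to 48 files; the function names are made up for illustration:

```python
import math

SEGMENT_MB = 16  # default WAL segment size; an assumption about your build

def max_wal_files(checkpoint_segments, completion_target):
    """Documented normal-case ceiling on the number of WAL files."""
    # Rounded up so that 16 segments at 0.9 matches the 48 files quoted above.
    return math.ceil((2 + completion_target) * checkpoint_segments + 1)

def max_wal_mb(checkpoint_segments, completion_target):
    """The same ceiling expressed in megabytes of disk space."""
    return max_wal_files(checkpoint_segments, completion_target) * SEGMENT_MB

for segments in (16, 128):
    files = max_wal_files(segments, 0.9)
    print(segments, files, max_wal_mb(segments, 0.9))
```

Under those assumptions 48 files works out to 768MB rather than 384MB, and 128 segments tops out at 373 files, a bit under 6GB, which still fits on a 9.7GB partition as long as neither of the two failure cases above is in play.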

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
