Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Redesigning checkpoint_segments
Date
Msg-id 54D13B2A.6040808@agliodbs.com
Whole thread Raw
In response to Redesigning checkpoint_segments  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Redesigning checkpoint_segments  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 02/03/2015 07:50 AM, Robert Haas wrote:
> On Tue, Feb 3, 2015 at 10:44 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> That's the whole point of this patch. "max_checkpoint_segments = 10", or
>> "max_checkpoint_segments = 160 MB", means that the system will begin a
>> checkpoint so that when the checkpoint completes, and it truncates away or
>> recycles old WAL, the total size of pg_xlog is 160 MB.
>>
>> That's different from our current checkpoint_segments setting. With
>> checkpoint_segments, the documented formula for calculating the disk usage
>> is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's
>> a lot less intuitive to set.
> 
> Hmm, that's different from what I was thinking.  We probably shouldn't
> call that max_checkpoint_segments, then.  I got confused and thought
> you were just trying to decouple the number of segments that it takes
> to trigger a checkpoint from the number we keep preallocated.

Wait, what?  Because the new setting is an actual soft maximum, we
*shouldn't* call it a maximum?  Or are you saying something else?

On 02/03/2015 04:25 AM, Heikki Linnakangas wrote:
> On 02/02/2015 04:21 PM, Andres Freund wrote:
>> I think we need to increase checkpoint_timeout too - that's actually
>> just as important for the default experience from my pov. 5 minutes
>> often just unnecessarily generates FPWs en masse.

I have yet to see any serious benchmarking on checkpoint_timeout.  It
does seem that for some workloads on some machines a longer timeout is
better, but I've also seen workloads where a longer timeout decreases
throughput or raises IO.  So absent some hard numbers, I'd be opposed to
changing the default.

>>
>>> I'll open the bidding at 1600MB (aka 100).
>>
>> Fine with me.
>
> I wouldn't object to raising it a little bit, but that's way too high.
> It's entirely possible to have a small database that generates a lot of
> WAL. A table that has only a few rows, but is updated very very
> frequently, for example. And checkpointing such a database is quick too,
> so frequent checkpoints are not a problem. You don't want to end up with
> 1.5 GB of WAL on a 100 MB database.

I suggest 192MB instead (12 segments).  That almost doubles our current
real default, without requiring huge disk space which might surprise
some users.

In practice, checkpoint_segments is impossible to automatically tune
correctly.  So let's be conservative.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Re: Overhauling our interrupt handling (was Escaping from blocked send() reprised.)
Next
From: Jim Nasby
Date:
Subject: Re: Proposal : REINDEX xxx VERBOSE