Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Redesigning checkpoint_segments
Date
Msg-id 51B08CDD.8030900@vmware.com
In response to Re: Redesigning checkpoint_segments  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
On 06.06.2013 15:31, Kevin Grittner wrote:
> Heikki Linnakangas<hlinnakangas@vmware.com>  wrote:
>> On 05.06.2013 22:18, Kevin Grittner wrote:
>>> Heikki Linnakangas<hlinnakangas@vmware.com>   wrote:
>>>
>>>> I was not thinking of making it a hard limit. It would be just
>>>> like checkpoint_segments from that point of view - if a
>>>> checkpoint takes a long time, max_wal_size might still be
>>>> exceeded.
>>>
>>> Then I suggest we not use exactly that name.  I feel quite sure we
>>> would get complaints from people if something labeled as "max" was
>>> exceeded -- especially if they set that to the actual size of a
>>> filesystem dedicated to WAL files.
>>
>> You're probably right. Any suggestions for a better name?
>> wal_size_soft_limit?
>
> After reading later posts on the thread, I would be inclined to
> support making it a hard limit and adapting the behavior to match.

Well, that's a lot more difficult to implement. And even if we have a 
hard limit, I think many people would still want to have a soft limit 
that would trigger a checkpoint, but would not stop WAL writes from 
happening. So what would we call that?

I'd love to see a hard limit too, but I see that as an orthogonal feature.

How about calling the (soft) limit "checkpoint_wal_size"? That goes well 
together with checkpoint_timeout, meaning that a checkpoint will be 
triggered if you're about to exceed the given size.

> I'm also concerned about the "spin up" from idle to high activity.
> Perhaps a "min" should also be present, to mitigate repeated short
> checkpoint cycles for "bursty" environments?

With my proposal, you wouldn't get repeated short checkpoint cycles with 
bursts. The checkpoint interval would be controlled by 
checkpoint_timeout and checkpoint_wal_size. If there is a lot of 
activity, then checkpoints will happen more frequently, as 
checkpoint_wal_size is reached sooner. But it would not depend on the 
activity in previous checkpoint cycles, only the current one, so it 
would not make a difference whether you have a continuously high load 
or a bursty one.
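
To illustrate, here is a rough sketch of that trigger rule in Python. It
is not taken from any patch: the names CheckpointSettings and
checkpoint_due are hypothetical, and the simple "reached the threshold"
comparison is an assumption about how "about to exceed" might be tested.
The point is only that the decision looks at the current cycle and
carries no state from earlier ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointSettings:
    # Hypothetical container for the two settings discussed above.
    checkpoint_timeout_s: float  # maximum time between checkpoints, in seconds
    checkpoint_wal_size: int     # proposed soft limit on WAL per cycle, in bytes


def checkpoint_due(settings: CheckpointSettings,
                   elapsed_s: float,
                   wal_bytes_this_cycle: int) -> Optional[str]:
    """Return why a checkpoint should start now, or None if it should not.

    Only values from the current cycle are consulted, so a burst after an
    idle period is treated the same as a continuously high load.
    """
    if wal_bytes_this_cycle >= settings.checkpoint_wal_size:
        return "wal_size"
    if elapsed_s >= settings.checkpoint_timeout_s:
        return "timeout"
    return None


settings = CheckpointSettings(checkpoint_timeout_s=300.0,
                              checkpoint_wal_size=1024 * 1024 * 1024)

# Heavy activity: the size threshold is reached long before the timeout.
print(checkpoint_due(settings, elapsed_s=40.0,
                     wal_bytes_this_cycle=2 * 1024 * 1024 * 1024))

# Light activity: the timeout is what triggers the checkpoint.
print(checkpoint_due(settings, elapsed_s=300.0, wal_bytes_this_cycle=4096))
```

Because this is a soft limit, nothing in the sketch stops WAL from being
written while the checkpoint is in progress.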

The history would matter for the calculation of how many segments to 
preallocate/recycle, however. Under the proposal, that would be 
calculated separately from checkpoint_wal_size, and for that we'd use 
some kind of a moving average of how many segments were used in previous 
cycles. A min setting might be useful for that. We could also try to 
make WAL file creation cheaper, i.e. by using posix_fallocate(), as was 
proposed in another thread, and doing it in bgwriter or walwriter. That 
would make it less important to get the estimate right, from a 
performance point of view, although you'd still want to get it right to 
avoid running out of disk space (having the segments preallocated 
ensures that they are available when needed).
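
As a rough sketch of the estimate described above (again in Python, and
not from any patch): the function names, the choice of an exponential
moving average, and the smoothing factor are all assumptions, since
"some kind of a moving average" is all that has been proposed. The
min_segments argument stands in for the possible "min" setting.

```python
import math


def update_segment_estimate(prev_estimate: float,
                            segments_used: int,
                            smoothing: float = 0.1) -> float:
    """Fold the segment count of the cycle that just ended into the estimate.

    Exponential moving average: a higher smoothing value reacts faster
    to changes in load, a lower one remembers older cycles for longer.
    """
    return prev_estimate + smoothing * (segments_used - prev_estimate)


def segments_to_keep(estimate: float, min_segments: int) -> int:
    """Number of segments to preallocate/recycle for the next cycle.

    The estimate is rounded up, and never allowed below the minimum.
    """
    return max(min_segments, math.ceil(estimate))


# A few quiet cycles followed by a burst, then quiet again.
estimate = 4.0
for used in [4, 3, 40, 5, 4]:
    estimate = update_segment_estimate(estimate, used, smoothing=0.5)
    print(used, segments_to_keep(estimate, min_segments=3))
```

Note that this history only feeds the preallocation count; it plays no
part in deciding when a checkpoint is triggered.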

- Heikki


