Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Redesigning checkpoint_segments |
Date | |
Msg-id | 51B08CDD.8030900@vmware.com |
In response to | Re: Redesigning checkpoint_segments (Kevin Grittner <kgrittn@ymail.com>) |
List | pgsql-hackers |
On 06.06.2013 15:31, Kevin Grittner wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 05.06.2013 22:18, Kevin Grittner wrote:
>>> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>>>
>>>> I was not thinking of making it a hard limit. It would be just
>>>> like checkpoint_segments from that point of view - if a
>>>> checkpoint takes a long time, max_wal_size might still be
>>>> exceeded.
>>>
>>> Then I suggest we not use exactly that name. I feel quite sure we
>>> would get complaints from people if something labeled as "max" was
>>> exceeded -- especially if they set that to the actual size of a
>>> filesystem dedicated to WAL files.
>>
>> You're probably right. Any suggestions for a better name?
>> wal_size_soft_limit?
>
> After reading later posts on the thread, I would be inclined to
> support making it a hard limit and adapting the behavior to match.

Well, that's a lot more difficult to implement. And even if we have a
hard limit, I think many people would still want to have a soft limit
that would trigger a checkpoint, but would not stop WAL writes from
happening. So what would we call that?

I'd love to see a hard limit too, but I see that as an orthogonal
feature.

How about calling the (soft) limit "checkpoint_wal_size"? That goes well
together with checkpoint_timeout, meaning that a checkpoint will be
triggered if you're about to exceed the given size.

> I'm also concerned about the "spin up" from idle to high activity.
> Perhaps a "min" should also be present, to mitigate repeated short
> checkpoint cycles for "bursty" environments?

With my proposal, you wouldn't get repeated short checkpoint cycles with
bursts. The checkpoint interval would be controlled by
checkpoint_timeout and checkpoint_wal_size. If there is a lot of
activity, then checkpoints will happen more frequently, as
checkpoint_wal_size is reached sooner. But it would not depend on the
activity in previous checkpoint cycles, only the current one, so it
would not make a difference if you have a continuously high load, or a
bursty one.

The history would matter for the calculation of how many segments to
preallocate/recycle, however. Under the proposal, that would be
calculated separately from checkpoint_wal_size, and for that we'd use
some kind of a moving average of how many segments were used in previous
cycles. A min setting might be useful for that.

We could also try to make WAL file creation cheaper, i.e. by using
posix_fallocate(), as was proposed in another thread, and doing it in
bgwriter or walwriter. That would make it less important to get the
estimate right, from a performance point of view, although you'd still
want to get it right to avoid running out of disk space (having the
segments preallocated ensures that they are available when needed).

- Heikki
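
To make the two separate calculations concrete, here is a rough sketch in Python (for brevity, not the server's C). Nothing here comes from an actual patch: `checkpoint_due`, `SegmentEstimator`, `min_segments` and the `smoothing` factor are all hypothetical names, and the exponential moving average is just one possible reading of "some kind of a moving average". The trigger is also simplified to fire when the size is reached, ignoring that the checkpoint itself takes time.

```python
import math


def checkpoint_due(elapsed_s, wal_bytes, checkpoint_timeout_s, checkpoint_wal_size):
    """Soft limit: request a checkpoint once either threshold is reached.

    Only the current cycle is looked at, and WAL writes are never blocked.
    """
    return elapsed_s >= checkpoint_timeout_s or wal_bytes >= checkpoint_wal_size


class SegmentEstimator:
    """Moving average of segments used per cycle (hypothetical helper)."""

    def __init__(self, min_segments=0, smoothing=0.5):
        self.min_segments = min_segments  # hypothetical "min" setting
        self.smoothing = smoothing        # weight given to the newest cycle
        self.average = None               # no history yet

    def record_cycle(self, segments_used):
        """Fold the segment count of a finished cycle into the average."""
        if self.average is None:
            self.average = float(segments_used)
        else:
            self.average += self.smoothing * (segments_used - self.average)

    def segments_to_keep(self):
        """Segments to preallocate/recycle, never below the configured floor."""
        estimate = 0 if self.average is None else math.ceil(self.average)
        return max(self.min_segments, estimate)
```

In this sketch a burst followed by idle cycles lets the estimate decay toward `min_segments` rather than to zero, while `checkpoint_due` stays independent of that history.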