Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Redesigning checkpoint_segments
Date
Msg-id CAHGQGwEqhF=F7qqa92fZJZn8R5j+HSKTmr+Uwbvoq20qcv+7gw@mail.gmail.com
In response to Re: Redesigning checkpoint_segments  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Thu, Jun 6, 2013 at 3:35 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 05.06.2013 21:16, Fujii Masao wrote:
>>
>> On Wed, Jun 5, 2013 at 9:16 PM, Heikki Linnakangas
>> <hlinnakangas@vmware.com>  wrote:
>>>
>>> I propose that we do something similar, but not exactly the same. Let's
>>> have a setting, max_wal_size, to control the max. disk space reserved for
>>> WAL. Once that's reached (or you get close enough, so that there are still
>>> some segments left to consume while the checkpoint runs), a checkpoint is
>>> triggered.
>>
>>
>> What if max_wal_size is reached while the checkpoint is running? We should
>> change the checkpoint from spread mode to fast mode?
>
>
> The checkpoint spreading code already tracks if the checkpoint is "on
> schedule", and it takes into account both checkpoint_timeout and
> checkpoint_segments. Ie. if you consume segments faster than expected, the
> checkpoint will speed up as well. Once checkpoint_segments is reached, the
> checkpoint will complete ASAP, with no delays to spread it out.

Yep, right. One problem is that this mechanism doesn't work on the standby.
So, are you planning to fix that so that max_wal_size works well on the
standby too, or to leave it as it is? Judging from the rest of your email,
you seem to have chosen the latter.
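
To make the "on schedule" idea above concrete, here is a rough Python model
of the check. It is only a sketch, not the actual checkpointer code: the
function name, the parameter names and the 0.5 default for the completion
target are assumptions chosen for illustration.

```python
def is_on_schedule(progress, segments_consumed, segment_budget,
                   seconds_elapsed, timeout_seconds, completion_target=0.5):
    """Return True if the checkpoint may pause, False if it must hurry.

    progress is the fraction (0.0 to 1.0) of the checkpoint's writes that
    are already done. All names here are hypothetical.
    """
    # Aim to finish within completion_target of the checkpoint interval.
    target = progress * completion_target

    # WAL is being consumed faster than the checkpoint is progressing.
    if segments_consumed / segment_budget > target:
        return False

    # Time is passing faster than the checkpoint is progressing.
    if seconds_elapsed / timeout_seconds > target:
        return False

    return True
```

With a budget of 10 segments and a 300 second timeout, a half-finished
checkpoint that has seen 1 segment and 30 seconds go by is on schedule; the
same checkpoint after 5 segments is behind and has to stop sleeping between
writes.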

>
> This would still work the same with max_wal_size. A new checkpoint would be
> started well before reaching max_wal_size, so that it has enough time to
> complete. If the checkpoint "falls behind", it will hurry up until it's back
> on schedule. If max_wal_size is reached anyway, it will complete ASAP.
>
>
>> Or, if max_wal_size
>> is hard limit, we should keep the allocation of new WAL file waiting until
>> the checkpoint has finished and removed some old WAL files?
>
>
> I was not thinking of making it a hard limit. It would be just like
> checkpoint_segments from that point of view - if a checkpoint takes a long
> time, max_wal_size might still be exceeded.

So, if the archive command keeps failing or is very slow (e.g., because it
uses a compression tool), max_wal_size can still be greatly exceeded. Right?

I'm wondering if it's worth exposing an option that specifies whether
max_wal_size is treated as a hard limit. If it's not a hard limit, the disk
can fill up with WAL files and a PANIC can happen. In that case, to restart
the database service, we need to enlarge the disk or relocate some WAL files
to another disk, and then start the server, which has to go through normal
crash recovery. This would lead to a lot of service downtime.

OTOH, if we use max_wal_size as a hard limit, we can avoid such a PANIC
error and the long downtime. Of course, in this case, once max_wal_size is
reached, we cannot complete any query that writes WAL until the checkpoint
has completed and removed old WAL files. During that time, the database
service looks down to clients, but that downtime is shorter than in the
PANIC case. So I'm thinking that some users might want a hard limit on the
pg_xlog size.
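
The difference between the two behaviours can be shown with a toy model in
Python. This is purely illustrative: no such option exists today, and the
names (Outcome, request_segment, hard_limit) are made up for the example.

```python
from enum import Enum


class Outcome(Enum):
    ALLOCATED = "allocated"   # a new WAL segment is created
    BLOCKED = "blocked"       # the writer waits for the checkpoint
    DISK_FULL = "disk_full"   # no space left; the server would PANIC


def request_segment(in_use, max_wal_segments, disk_segments, hard_limit):
    """Decide what happens when a backend needs one more WAL segment.

    in_use, max_wal_segments and disk_segments are all counted in segments.
    """
    # Hard limit: refuse to grow past max_wal_size and make the writer wait.
    if hard_limit and in_use >= max_wal_segments:
        return Outcome.BLOCKED

    # Soft limit: keep allocating until the disk itself is exhausted.
    if in_use >= disk_segments:
        return Outcome.DISK_FULL

    return Outcome.ALLOCATED
```

In this model, with the hard limit enabled and max_wal_size smaller than the
disk, the DISK_FULL outcome is never reached; writers are blocked instead
until old segments are removed.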

>>> In this proposal, the number of segments preallocated is controlled
>>> separately from max_wal_size, so that you can set max_wal_size high,
>>> without actually consuming that much space in normal operation. It's just
>>> a backstop, to avoid completely filling the disk, if there's a sudden
>>> burst of activity. The number of segments preallocated is auto-tuned,
>>> based on the number of segments used in previous checkpoint cycles.
>>
>>
>> How is wal_keep_segments handled in your approach?
>
>
> Hmm, haven't thought about that. I think a better unit to set
> wal_keep_segments in would also be MB, not segments.

+1
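
For illustration, a size given in MB maps back to a segment count as in the
Python sketch below. It assumes the default 16 MB WAL segment size and that
the value is rounded up to whole segments; the rounding rule and the function
name are assumptions, not something decided in this thread.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size; differs on custom builds


def mb_to_segments(size_mb, segment_mb=WAL_SEGMENT_MB):
    """Convert a size in MB to a whole number of WAL segments, rounding up."""
    return math.ceil(size_mb / segment_mb)
```

For example, 1024 MB corresponds to 64 segments, and 100 MB rounds up to 7.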

Regards,

-- 
Fujii Masao


