Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Redesigning checkpoint_segments |
Date | |
Msg-id | CAHGQGwEqhF=F7qqa92fZJZn8R5j+HSKTmr+Uwbvoq20qcv+7gw@mail.gmail.com |
In response to | Re: Redesigning checkpoint_segments (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: Redesigning checkpoint_segments, Re: Redesigning checkpoint_segments |
List | pgsql-hackers |
On Thu, Jun 6, 2013 at 3:35 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 05.06.2013 21:16, Fujii Masao wrote:
>>
>> On Wed, Jun 5, 2013 at 9:16 PM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>>
>>> I propose that we do something similar, but not exactly the same. Let's
>>> have a setting, max_wal_size, to control the max. disk space reserved for
>>> WAL. Once that's reached (or you get close enough, so that there are still
>>> some segments left to consume while the checkpoint runs), a checkpoint is
>>> triggered.
>>
>> What if max_wal_size is reached while the checkpoint is running? We should
>> change the checkpoint from spread mode to fast mode?
>
> The checkpoint spreading code already tracks if the checkpoint is "on
> schedule", and it takes into account both checkpoint_timeout and
> checkpoint_segments. Ie. if you consume segments faster than expected, the
> checkpoint will speed up as well. Once checkpoint_segments is reached, the
> checkpoint will complete ASAP, with no delays to spread it out.

Yep, right. One problem is that this mechanism doesn't work on the standby.
So are you planning to 'fix' that so that max_wal_size works well on the
standby too, or just leave it as it is? From the rest of your email, it
seems you've chosen the latter.

> This would still work the same with max_wal_size. A new checkpoint would be
> started well before reaching max_wal_size, so that it has enough time to
> complete. If the checkpoint "falls behind", it will hurry up until it's back
> on schedule. If max_wal_size is reached anyway, it will complete ASAP.
>
>> Or, if max_wal_size is hard limit, we should keep the allocation of new WAL
>> file waiting until the checkpoint has finished and removed some old WAL
>> files?
>
> I was not thinking of making it a hard limit.
> It would be just like checkpoint_segments from that point of view - if a
> checkpoint takes a long time, max_wal_size might still be exceeded.

So if the archive command keeps failing or is very slow (e.g., because it
uses a compression tool), max_wal_size can still be exceeded by a large
margin, right?

I'm wondering whether it's worth exposing an option that specifies whether
max_wal_size is a hard limit. If it's not a hard limit, the disk can fill up
with WAL files and a PANIC can occur. To restart the database service in that
case, we need to enlarge the disk or move some WAL files to another disk, and
then start up the server, which has to go through normal crash recovery. This
can mean a lot of service downtime.

OTOH, if we use max_wal_size as a hard limit, we can avoid such a PANIC and
the long downtime. Of course, in that case, once max_wal_size is reached, no
query that writes WAL can complete until the checkpoint has finished and
removed old WAL files. During that time the database service looks down from
the client's point of view, but the downtime is shorter than in the PANIC
case. So I think some users might want a hard limit on the size of pg_xlog.

>>> In this proposal, the number of segments preallocated is controlled
>>> separately from max_wal_size, so that you can set max_wal_size high,
>>> without actually consuming that much space in normal operation. It's just a
>>> backstop, to avoid completely filling the disk, if there's a sudden burst of
>>> activity. The number of segments preallocated is auto-tuned, based on the
>>> number of segments used in previous checkpoint cycles.
>>
>> How is wal_keep_segments handled in your approach?
>
> Hmm, haven't thought about that. I think a better unit to set
> wal_keep_segments in would also be MB, not segments.

+1

Regards,

--
Fujii Masao
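Heikki's description of the checkpoint spreading code, which checks whether a checkpoint is "on schedule" against both checkpoint_timeout and checkpoint_segments, can be sketched as a simple predicate. This is a simplified Python model, not PostgreSQL's actual code: the function name `is_on_schedule`, its parameters, and the scaling of write progress by a completion target (analogous to checkpoint_completion_target) are assumptions made for illustration. The checkpoint counts as on schedule only if its scaled write progress is ahead of both the fraction of the WAL budget and the fraction of the time budget consumed so far. Once `segments_used` reaches `checkpoint_segments`, the predicate is false for any completion target below 1.0, so the checkpoint stops sleeping and finishes ASAP.

```python
def is_on_schedule(progress, segments_used, checkpoint_segments,
                   elapsed_seconds, checkpoint_timeout,
                   completion_target=0.5):
    """Hypothetical model: True if the checkpoint may keep pausing between writes.

    progress is the fraction (0..1) of the checkpoint's buffer writes done so far.
    """
    # Aim to finish the writes after completion_target of the interval has passed.
    scaled_progress = progress * completion_target
    # Fraction of the WAL budget consumed since the checkpoint started.
    wal_fraction = segments_used / checkpoint_segments
    # Fraction of the time budget consumed since the checkpoint started.
    time_fraction = elapsed_seconds / checkpoint_timeout
    # Behind on either budget means the caller should stop pausing and hurry up.
    return scaled_progress >= wal_fraction and scaled_progress >= time_fraction
```

For example, a checkpoint that has written half of its buffers after 60 s of a 300 s timeout and 2 of 10 segments is on schedule; the same checkpoint after 5 of 10 segments has fallen behind and should speed up.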
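Fujii's question about treating max_wal_size as a soft or a hard limit can be illustrated with a toy model of pg_xlog. Everything here is hypothetical: the `WalDir` class, the 16 MB segment size, and the rule that a segment can be removed only once it is both older than the checkpoint's redo point and archived are simplifying assumptions, not PostgreSQL internals. With a soft limit, allocation always succeeds, so a failing archive command lets the directory grow past max_wal_size until the disk fills; with a hard limit, allocation fails (the WAL writer would have to wait) until a checkpoint removes old segments.

```python
from dataclasses import dataclass, field


@dataclass
class WalDir:
    """Toy model of pg_xlog (hypothetical, not PostgreSQL code)."""
    max_wal_mb: int
    hard_limit: bool
    segment_mb: int = 16  # assumed WAL segment size
    segments: dict = field(default_factory=dict)  # segno -> archived?
    next_segno: int = 0

    def size_mb(self) -> int:
        return len(self.segments) * self.segment_mb

    def try_allocate(self) -> bool:
        """Create the next segment; False means the WAL writer must wait."""
        if self.hard_limit and self.size_mb() + self.segment_mb > self.max_wal_mb:
            return False  # hard limit: block WAL writes until space is freed
        self.segments[self.next_segno] = False  # new segment, not yet archived
        self.next_segno += 1
        return True  # soft limit: always succeeds, even past max_wal_mb

    def archive(self, segno: int) -> None:
        """Mark a segment as successfully archived."""
        self.segments[segno] = True

    def checkpoint_done(self, redo_segno: int) -> None:
        """After a checkpoint, drop archived segments older than the redo point."""
        self.segments = {
            segno: archived
            for segno, archived in self.segments.items()
            if segno >= redo_segno or not archived
        }
```

With `max_wal_mb=64`, a soft-limit `WalDir` whose segments are never archived keeps all ten of ten allocated segments (160 MB) even after a checkpoint, while a hard-limit one refuses the fifth allocation until three segments are archived and a checkpoint removes them.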
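Heikki's point that preallocation is auto-tuned from previous checkpoint cycles, and the suggestion to express wal_keep_segments in MB, both come down to converting between megabytes and segments. The sketch below assumes 16 MB segments, and its tuning rule (jump up immediately after a busier cycle, otherwise decay slowly with a factor of 0.9) is one plausible heuristic chosen for illustration; the email does not specify a formula. The function names are hypothetical. Preallocation is capped at max_wal_size, since max_wal_size is meant as a backstop rather than the normal disk footprint.

```python
import math

SEGMENT_MB = 16  # assumed WAL segment size


def mb_to_segments(mb, segment_mb=SEGMENT_MB):
    """Round an MB-based setting up to whole WAL segments."""
    return math.ceil(mb / segment_mb)


def update_estimate(estimate, used_last_cycle, decay=0.9):
    """Hypothetical auto-tuning of segments used per checkpoint cycle.

    React to a burst immediately, but shrink the estimate only gradually.
    """
    if used_last_cycle > estimate:
        return float(used_last_cycle)
    return decay * estimate + (1 - decay) * used_last_cycle


def segments_to_preallocate(estimate, max_wal_mb, segment_mb=SEGMENT_MB):
    """Preallocate about one cycle's worth, never more than max_wal_size."""
    return min(math.ceil(estimate), mb_to_segments(max_wal_mb, segment_mb))
```

For example, with 16 MB segments a wal_keep setting of 1024 MB corresponds to 64 segments, and that is also the most this sketch would ever preallocate under a 1024 MB max_wal_size.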