Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers
From | Josh Berkus
---|---
Subject | Re: Redesigning checkpoint_segments
Date |
Msg-id | 51AF9C85.5080700@agliodbs.com
In response to | Redesigning checkpoint_segments (Heikki Linnakangas &lt;hlinnakangas@vmware.com&gt;)
Responses | Re: Redesigning checkpoint_segments
List | pgsql-hackers
Heikki,

> We shouldn't impose that calculation on the user. It should be possible
> to just specify "checkpoint_segments=512MB", and the system would
> initiate checkpoints so that the total size of WAL in pg_xlog stays
> below 512MB.

Agreed.

> For limiting the time required to recover after crash,
> checkpoint_segments is awkward because it's difficult to calculate how
> long recovery will take, given checkpoint_segments=X. A bulk load can
> use up segments really fast, and recovery will be fast, while segments
> full of random deletions can need a lot of random I/O to replay, and
> take a long time. IMO checkpoint_timeout is a much better way to
> control that, although it's not perfect either.

This is true, but I don't see that your proposal changes this at all
(for better or for worse).

> A third point is that even if you have 10 GB of disk space reserved for
> WAL, you don't want to actually consume all that 10 GB, if it's not
> required to run the database smoothly.

Agreed.

> I propose that we do something similar, but not exactly the same. Let's
> have a setting, max_wal_size, to control the max. disk space reserved
> for WAL. Once that's reached (or you get close enough, so that there
> are still some segments left to consume while the checkpoint runs), a
> checkpoint is triggered.

Refinement of the proposal:

1. max_wal_size is a hard limit.

2. Checkpointing targets 50% of (max_wal_size - wal_keep_segments), to
   avoid lockup if a checkpoint takes longer than expected.

3. wal_keep_segments is taken out of max_wal_size.
   a. It automatically defaults to 20% of max_wal_size if
      max_wal_senders > 0.
   b. For that reason, we don't allow it to be larger than 80% of
      max_wal_size.

4. Preallocated WAL isn't allowed to shrink smaller than
   wal_keep_segments + (max_wal_size * 0.1).

This would mean that I could set my server to:

max_wal_size = 2GB

and ...

* by default, 26 segments (416MB) would be kept for wal_keep_segments.
* the checkpoint target would be 77 segments (1.2GB).
* preallocated WAL would always be at least 39 segments (624MB),
  including wal_keep_segments.

Now, if I had a fairly low-transaction-rate database, but wanted to make
sure I could recover from an 8-hour break in replication, I might bump
wal_keep_segments up to 1GB. In that case:

* 64 segments (1GB) would be kept.
* checkpoints would target 96 segments (1.5GB).
* preallocated WAL would always be at least 77 segments (1.2GB).

> Hmm, haven't thought about that. I think a better unit to set
> wal_keep_segments in would also be MB, not segments.

Well, the ideal unit from the user's point of view is *time*, not space.
That is, the user wants the master to keep, say, "8 hours of transaction
logs", not any particular number of MB. I don't want to complicate this
proposal by trying to deliver that, though.

> In this proposal, the number of segments preallocated is controlled
> separately from max_wal_size, so that you can set max_wal_size high,
> without actually consuming that much space in normal operation. It's
> just a backstop, to avoid completely filling the disk, if there's a
> sudden burst of activity. The number of segments preallocated is
> auto-tuned, based on the number of segments used in previous checkpoint
> cycles.

"Based on" how? Can you give me your algorithmic thinking here? I'm
thinking we should have some calculation of last cycle size and peak
cycle size, so that bursty workloads aren't compromised.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
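
[Editorial note, not part of the original thread: a minimal sketch that reproduces the arithmetic behind the worked examples in the proposal above, assuming 16 MB WAL segments and reading rule 2 as "halfway between wal_keep_segments and max_wal_size", which is the only reading consistent with the 77- and 96-segment figures. The function and variable names are made up for illustration.]

```c
/* Illustrative only: checks the numbers quoted in the proposal above. */
#include <stdio.h>

#define SEG_MB 16                       /* one WAL segment = 16 MB */

static void
show(int max_wal_mb, int keep_mb)
{
    int max_segs  = max_wal_mb / SEG_MB;
    int keep_segs = keep_mb > 0
        ? keep_mb / SEG_MB
        : (max_segs * 20 + 99) / 100;   /* rule 3a: default to 20% of max, rounded up */

    /* rule 2 (as read above): checkpoint when halfway between keep and max */
    int target_segs = keep_segs + (max_segs - keep_segs) / 2;

    /* rule 4: preallocation never shrinks below keep + 10% of max */
    int prealloc_min = keep_segs + (max_segs + 9) / 10;

    printf("max=%d segs  keep=%d  checkpoint target=%d  prealloc floor=%d\n",
           max_segs, keep_segs, target_segs, prealloc_min);
}

int
main(void)
{
    show(2048, 0);      /* 2 GB, default keep      -> 26 / 77 / 39, as in the mail */
    show(2048, 1024);   /* 2 GB, keep = 1 GB       -> 64 / 96 / 77, as in the mail */
    return 0;
}
```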
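[Editorial note, not part of the original thread: the closing question asks how preallocation would be auto-tuned from previous checkpoint cycles. The following is one hedged reading of the "last cycle size plus peak cycle size" idea, not Heikki's actual algorithm; a decaying peak keeps headroom after a burst so bursty workloads aren't penalized.]

```c
/* Illustrative only: one way to combine last-cycle and peak-cycle WAL usage. */
#include <stdio.h>

static double peak_cycle_segs = 0.0;    /* remembered peak, decayed each cycle */

/* Called once per completed checkpoint cycle with the segments it consumed. */
static int
estimate_prealloc(int last_cycle_segs, int floor_segs, int max_segs)
{
    int estimate;

    peak_cycle_segs *= 0.9;                     /* let the old peak decay slowly */
    if (last_cycle_segs > peak_cycle_segs)
        peak_cycle_segs = last_cycle_segs;      /* ... but track new bursts immediately */

    estimate = (int) (peak_cycle_segs + 0.5);
    if (estimate < floor_segs)
        estimate = floor_segs;                  /* rule 4 floor from the proposal */
    if (estimate > max_segs)
        estimate = max_segs;                    /* rule 1 hard limit */
    return estimate;
}

int
main(void)
{
    int cycles[] = {20, 25, 90, 30, 28};        /* one bursty cycle among quiet ones */

    for (int i = 0; i < 5; i++)
        printf("cycle %d: used %d segs -> preallocate %d\n",
               i, cycles[i], estimate_prealloc(cycles[i], 39, 128));
    return 0;
}
```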