Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers
From: Josh Berkus
Subject: Re: Redesigning checkpoint_segments
Msg-id: 51B0C5BD.80200@agliodbs.com
In response to: Redesigning checkpoint_segments (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: Redesigning checkpoint_segments
List: pgsql-hackers
>> Then I suggest we not use exactly that name.  I feel quite sure we
>> would get complaints from people if something labeled as "max" was
>> exceeded -- especially if they set that to the actual size of a
>> filesystem dedicated to WAL files.
>
> You're probably right. Any suggestions for a better name?
> wal_size_soft_limit?

"checkpoint_size_limit", or something similar.  That is, what you're
defining is: "this is the size at which we trigger a checkpoint even if
checkpoint_timeout has not been exceeded" (first sketch at the end of
this mail).

However, I think it's worth considering: if we're doing this "sizing
checkpoints based on prior cycles" thing, do we really need a size
limit *at all* for most users?  I can see how a hard limit is useful,
but not how a soft limit is.  Most of our users, most of the time,
don't care how large the WAL is as long as it doesn't exceed disk
space.  And on most databases, hitting checkpoint_timeout is more
frequent than hitting checkpoint_segments -- at least in my substantial
performance-tuning experience.  So I think most users would prefer a
setting which essentially says "make the WAL as big as it has to be in
order to maximize throughput", and wouldn't worry about the disk space.

> Yeah, something like that :-). I was thinking of letting the estimate
> decrease like a moving average, but react to any increases immediately.
> Same thing we do in bgwriter to track buffer allocations:

Seems reasonable.  Given the behavior of xlog, though, I'd want to
adjust the algorithm so that peak usage on a 24-hour basis affects the
current preallocation (second sketch at the end of this mail).  That
is, if a site regularly has a peak from 2pm to 3pm during which it uses
180 segments per cycle, its preallocation at 2am should still be
somewhat higher than that of a database without such a peak.  I'm
pretty sure the bgwriter's moving average cycles on much shorter time
scales than that.

>> Well, the ideal unit from the user's point of view is *time*, not
>> space.  That is, the user wants the master to keep, say, "8 hours of
>> transaction logs", not any amount of MB.  I don't want to complicate
>> this proposal by trying to deliver that, though.
>
> OTOH, if you specify it in terms of time, then you don't have any limit
> on the amount of disk space required.

Well, the best setup from my perspective as a remote DBA for a lot of
clients would be two-factor:

wal_keep_time: ##hr
wal_keep_size_limit: ##GB

That is, we would try to keep ##hr of WAL around for the standbys,
unless that amount exceeded ##GB, at which point we'd write a warning
to the logs (third sketch at the end of this mail).  If max_wal_size
were a hard limit, we wouldn't need wal_keep_size_limit, of course.

However, to some degree Andres' work will render all this
wal_keep_segments stuff obsolete by letting the master track which
segment was last consumed by each replica, so I don't think it's worth
pursuing this line of thinking a lot further.  In any case, I'm just
pointing out that we need to think of wal_keep_segments as part of the
total WAL size, and not as something separate, because that's confusing
our users.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
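
P.S. Since a few of the ideas above are easier to see in code, here are
three hypothetical sketches in C.  None of this is actual PostgreSQL
source; every name in them (checkpoint_needed, wal_estimate,
wal_keep_time, and so on) is made up for illustration.

First, the "checkpoint_size_limit" semantics as I described them:
request a checkpoint when either checkpoint_timeout elapses or the WAL
written since the last checkpoint crosses the size limit, whichever
comes first:

    #include <stdbool.h>

    /*
     * Hypothetical trigger: checkpoint when checkpoint_timeout has
     * elapsed OR when WAL written since the last checkpoint exceeds
     * checkpoint_size_limit, whichever happens first.
     */
    static bool
    checkpoint_needed(double secs_since_ckpt,
                      double wal_mb_since_ckpt,
                      double checkpoint_timeout_secs,
                      double checkpoint_size_limit_mb)
    {
        return secs_since_ckpt >= checkpoint_timeout_secs ||
               wal_mb_since_ckpt >= checkpoint_size_limit_mb;
    }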
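
Second, the "decay like a moving average, but react to increases
immediately" estimator, with the 24-hour peak tracking bolted on.  The
fast-attack/slow-decline shape mirrors what the bgwriter does for
buffer allocations; the per-hour peak array, the 0.99 decay, and the
0.5 blend factor are numbers I've pulled out of thin air:

    #define DECAY_SAMPLES   16      /* arbitrary smoothing constant */
    #define HOURS_PER_DAY   24

    static double wal_estimate;                 /* smoothed segments per cycle */
    static double hourly_peak[HOURS_PER_DAY];   /* peak segments seen per hour */

    /*
     * Called at the end of each checkpoint cycle.  Fast attack: jump
     * straight up to any increase.  Slow decline: decay toward lower
     * usage over roughly DECAY_SAMPLES cycles.
     */
    static void
    update_wal_estimate(double segs_this_cycle, int hour_of_day)
    {
        if (segs_this_cycle >= wal_estimate)
            wal_estimate = segs_this_cycle;     /* react to increases at once */
        else
            wal_estimate += (segs_this_cycle - wal_estimate) / DECAY_SAMPLES;

        /* Track per-hour-of-day peaks, decaying them so stale peaks fade. */
        hourly_peak[hour_of_day] *= 0.99;
        if (segs_this_cycle > hourly_peak[hour_of_day])
            hourly_peak[hour_of_day] = segs_this_cycle;
    }

    /*
     * Preallocation target: the current estimate, but never less than
     * half of the highest hourly peak, so the 2-3pm spike still lifts
     * the 2am floor a bit.
     */
    static double
    segments_to_preallocate(void)
    {
        double  peak = 0.0;
        int     h;

        for (h = 0; h < HOURS_PER_DAY; h++)
            if (hourly_peak[h] > peak)
                peak = hourly_peak[h];

        return (wal_estimate > peak / 2.0) ? wal_estimate : peak / 2.0;
    }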
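
Third, the two-factor retention rule.  wal_keep_time and
wal_keep_size_limit are the GUCs proposed above, not existing settings;
the size limit wins over the time target, with a warning when it does:

    #include <stdbool.h>
    #include <stdio.h>

    static double wal_keep_time = 8.0;        /* hours, hypothetical GUC */
    static double wal_keep_size_limit = 16.0; /* GB, hypothetical GUC */

    /*
     * May this old WAL segment be recycled?  Keep wal_keep_time hours
     * of WAL for the standbys, unless total retained WAL exceeds
     * wal_keep_size_limit, in which case recycle anyway and warn.
     */
    static bool
    can_recycle_segment(double segment_age_hours, double retained_wal_gb)
    {
        if (retained_wal_gb > wal_keep_size_limit)
        {
            fprintf(stderr,
                    "WARNING: recycling WAL younger than wal_keep_time "
                    "(%.1f GB retained exceeds limit of %.1f GB)\n",
                    retained_wal_gb, wal_keep_size_limit);
            return true;
        }

        /* Otherwise keep anything younger than the time target. */
        return segment_age_hours >= wal_keep_time;
    }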