Re: Load Distributed Checkpoints, revised patch - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: Load Distributed Checkpoints, revised patch |
Date | |
Msg-id | 4674E807.3050805@enterprisedb.com |
In response to | Re: Load Distributed Checkpoints, revised patch ("Simon Riggs" <simon@2ndquadrant.com>) |
Responses | Re: Load Distributed Checkpoints, revised patch |
List | pgsql-patches |
Simon Riggs wrote:
> On Fri, 2007-06-15 at 11:34 +0100, Heikki Linnakangas wrote:
>
>> - What units should we use for the new GUC variables? From
>> implementation point of view, it would be simplest if
>> checkpoint_write_rate is given as pages/bgwriter_delay, similarly to
>> bgwriter_*_maxpages. I never liked those *_maxpages settings, though, a
>> more natural unit from users perspective would be KB/s.
>
> checkpoint_maxpages would seem like a better name; we've already had
> those _maxpages settings for 3 releases, so changing that is not really
> an option (at so late a stage).

As Tom pointed out, we don't promise compatibility of conf-files over
major releases. I wasn't actually thinking of changing any of the
existing parameters, just thinking about the best name and behavior for
the new ones.

> We don't really care about units because
> the way you use it is to nudge it up a little and see if that works
> etc..

Not necessarily. If it's given in KB/s, you might very well have an idea
of how much I/O your hardware is capable of, and set aside a fraction of
that for checkpoints.

> Can we avoid having another parameter? There must be some protection in
> there to check that a checkpoint lasts for no longer than
> checkpoint_timeout, so it makes most sense to vary the checkpoint in
> relation to that parameter.

Sure, that's what checkpoint_write_percent is for. checkpoint_rate can be
used to finish the checkpoint faster, if there's not much work to do.
For example, if there's only 10 pages to flush in a checkpoint,
checkpoint_timeout is 30 minutes and checkpoint_write_percent = 50%, you
don't want to spread out those 10 writes over 15 minutes, that would be
just silly. checkpoint_rate sets the *minimum* rate used to write. If
writing at that minimum rate isn't enough to finish the checkpoint in
time, as defined by checkpoint interval * checkpoint_write_percent, we
write more aggressively.

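As a rough illustration of the pacing rule described above (this is not code from the patch), the sketch below converts a KB/s setting into pages per bgwriter cycle and then picks the larger of that minimum and the rate needed to meet the deadline. The function names, the 8 KB page size, the 200 ms bgwriter_delay and the 400 KB/s rate are assumptions chosen for the example.

```python
import math

PAGE_KB = 8  # assumption: default PostgreSQL block size of 8 KB


def min_pages_per_cycle(rate_kb_per_s, bgwriter_delay_ms, page_kb=PAGE_KB):
    """Convert a rate in KB/s into pages per bgwriter cycle (hypothetical helper)."""
    pages = rate_kb_per_s * bgwriter_delay_ms / 1000 / page_kb
    return max(1, math.ceil(pages))


def pages_this_cycle(pages_left, elapsed_s, checkpoint_timeout_s,
                     write_percent, min_pages, bgwriter_delay_ms):
    """Pages to write in the next cycle (hypothetical helper).

    Writes at least min_pages, and more when the minimum rate would not
    finish within checkpoint_timeout * write_percent.
    """
    deadline_s = checkpoint_timeout_s * write_percent / 100
    time_left_s = deadline_s - elapsed_s
    if time_left_s <= 0:
        # Past the deadline: flush everything that remains.
        return pages_left
    cycles_left = max(1, int(time_left_s * 1000 // bgwriter_delay_ms))
    needed = math.ceil(pages_left / cycles_left)
    return min(pages_left, max(min_pages, needed))


# The example from the text: 10 dirty pages, 30 minute timeout, 50%.
floor_pages = min_pages_per_cycle(400, 200)
small = pages_this_cycle(10, 0, 30 * 60, 50, floor_pages, 200)
large = pages_this_cycle(100000, 0, 30 * 60, 50, floor_pages, 200)
print(floor_pages, small, large)
```

With these assumed numbers the 10 pages are all written in the first cycle instead of being spread over 15 minutes, while a checkpoint with 100000 dirty pages is written faster than the minimum rate so that it still finishes by the deadline.
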
I'm more interested in checkpoint_write_percent myself as well, but Greg
Smith said he wanted the checkpoint to use a constant I/O rate and let
the length of the checkpoint vary.

>> - The signaling between RequestCheckpoint and bgwriter is a bit tricky.
>> Bgwriter now needs to deal immediate checkpoint requests, like those
>> coming from explicit CHECKPOINT or CREATE DATABASE commands, differently
>> from those triggered by checkpoint_segments. I'm afraid there might be
>> race conditions when a CHECKPOINT is issued at the same instant as
>> checkpoint_segments triggers one. What might happen then is that the
>> checkpoint is performed lazily, spreading the writes, and the CHECKPOINT
>> command has to wait for that to finish which might take a long time. I
>> have not been able to convince myself neither that the race condition
>> exists or that it doesn't.
>
> Is there a mechanism for requesting immediate/non-immediate checkpoints?

No, CHECKPOINT requests an immediate one. Is there a use case for
CHECKPOINT LAZY?

> pg_start_backup() should be a normal checkpoint I think. No need for
> backup to be an intrusive process.

Good point. A spread out checkpoint can take a long time to finish,
though. Is there a risk of running into a timeout or something if it
takes, say, 10 minutes for a call to pg_start_backup to finish?

>> - to coordinate the writes with checkpoint_segments, we need to
>> read the WAL insertion location. To do that, we need to acquire the
>> WALInsertLock. That means that in the worst case, WALInsertLock is
>> acquired every bgwriter_delay when a checkpoint is in progress. I don't
>> think that's a problem, it's only held for a very short duration, but I
>> thought I'd mention it.
>
> I think that is a problem.

Why?

> Do we need to know it so exactly that we look
> at WALInsertLock? Maybe use info_lck to request the latest page, since
> that is less heavily contended and we need never wait across I/O.

Is there such a value available, that's protected by just info_lck? I
can't see one.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com