Thread: Re: [PATCHES] Load distributed checkpoint patch
Bruce Momjian <bruce@momjian.us> wrote:

> OK, if I understand correctly, instead of doing a buffer scan, write(),
> and fsync(), and recycle the WAL files at checkpoint time, you delay the
> scan/write part with some delay.

Exactly. Actual behavior of checkpoint is not changed by the patch. Compared
with existing checkpoints, it just takes longer time in scan/write part.

> Do you use the same delay autovacuum uses?

What do you mean 'the same delay'? Autovacuum does VACUUM, not CHECKPOINT.
If you think cost-based-delay, I think we cannot use it here. It's hard to
estimate how much checkpoints delay by cost-based sleeping, but we should
finish asynchronous checkpoints by the start of next checkpoint. So I gave
priority to punctuality over load smoothing.

> As I remember, often the checkpoint is caused because
> we are using the last WAL file. Doesn't this delay the creation of new
> WAL files by renaming the old ones to higher numbers (we can't rename
> them until the checkpoint is complete)?

Checkpoints should be done by the next one, so we need WAL files for two
checkpoints. It is the same as now.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
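
The trade-off described in the message above (finish the spread-out
checkpoint before the next one starts, rather than sleeping by a
cost-based amount) can be sketched as deadline-based pacing. This is only
an illustration and not the code in the patch: the function names, the
fixed per-page write time, and the simulated clock are all assumptions
made for the example.

```python
def delay_per_page(pages_left: int, seconds_left: float, write_seconds: float) -> float:
    """Sleep to insert after the next write so the remaining pages fit the remaining time.

    Hypothetical helper: all three parameters are assumptions for this sketch.
    """
    if pages_left <= 0:
        return 0.0
    budget = seconds_left / pages_left       # time we can afford per remaining page
    return max(0.0, budget - write_seconds)  # never negative: if behind, write flat out


def simulate_checkpoint(dirty_pages: int, interval: float, write_seconds: float) -> float:
    """Return the simulated duration of a spread-out checkpoint.

    No real I/O and no real sleeping: the clock is just a running total.
    """
    elapsed = 0.0
    for written in range(dirty_pages):
        pages_left = dirty_pages - written
        pause = delay_per_page(pages_left, interval - elapsed, write_seconds)
        elapsed += write_seconds + pause
    return elapsed


if __name__ == "__main__":
    # Plenty of time: the writes are stretched over the whole interval.
    print(simulate_checkpoint(dirty_pages=1000, interval=300.0, write_seconds=0.01))
    # Too little time: the pauses drop to zero and the checkpoint overruns.
    print(simulate_checkpoint(dirty_pages=1000, interval=5.0, write_seconds=0.01))
```

With 1000 dirty pages, a 300-second interval and 10 ms per write, the
simulated checkpoint ends at about 300 seconds. With only 5 seconds
available it takes about 10 seconds: the pacing degenerates to writing
flat out, which is the "punctuality over load smoothing" choice.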

From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
> Bruce Momjian <bruce@momjian.us> wrote:
>> Do you use the same delay autovacuum uses?
>
> What do you mean 'the same delay'? Autovacuum does VACUUM, not CHECKPOINT.
> If you think cost-based-delay, I think we cannot use it here. It's hard to
> estimate how much checkpoints delay by cost-based sleeping, but we should
> finish asynchronous checkpoints by the start of next checkpoint. So I gave
> priority to punctuality over load smoothing.

I consider that smoothing the load (more meaningfully, response time)
has higher priority over checkpoint punctuality in a practical sense,
because the users of a system benefit from good steady response and
give good reputation to the system. If the checkpoint processing is
not punctual, crash recovery would take longer time. But which would
you give higher priority, the unlikely event (=crash of the system) or
likely event (=peak hours of the system)? I believe the latter should
be regarded. The system can write dirty buffers after the peak hours
pass. User experience should be taken good care of.

ITAGAKI Takahiro wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>
> > OK, if I understand correctly, instead of doing a buffer scan, write(),
> > and fsync(), and recycle the WAL files at checkpoint time, you delay the
> > scan/write part with some delay.
>
> Exactly. Actual behavior of checkpoint is not changed by the patch. Compared
> with existing checkpoints, it just takes longer time in scan/write part.
>
> > Do you use the same delay autovacuum uses?

Sorry, I meant bgwriter delay, not autovacuum.

> What do you mean 'the same delay'? Autovacuum does VACUUM, not CHECKPOINT.
> If you think cost-based-delay, I think we cannot use it here. It's hard to
> estimate how much checkpoints delay by cost-based sleeping, but we should
> finish asynchronous checkpoints by the start of next checkpoint. So I gave
> priority to punctuality over load smoothing.

OK.

> > As I remember, often the checkpoint is caused because
> > we are using the last WAL file. Doesn't this delay the creation of new
> > WAL files by renaming the old ones to higher numbers (we can't rename
> > them until the checkpoint is complete)?
>
> Checkpoints should be done by the next one, so we need WAL files for two
> checkpoints. It is the same as now.

Ah, OK, so we already reserve a full set of WAL files while we are
waiting for the checkpoint to complete.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
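
The "WAL files for two checkpoints" point can be put in rough numbers.
The sketch below assumes the rule of thumb given in the PostgreSQL
documentation of this period (normally no more than
2 * checkpoint_segments + 1 segment files) and the default 16 MB segment
size. The function names are made up for the example, and the result is
an estimate, not a hard limit.

```python
SEGMENT_MB = 16  # assumption: default WAL segment size, not a custom build


def max_wal_files(checkpoint_segments: int) -> int:
    """Usual upper bound on WAL segment files: two checkpoints' worth plus one."""
    return 2 * checkpoint_segments + 1


def max_wal_mb(checkpoint_segments: int) -> int:
    """Approximate disk space, in MB, for that many segment files."""
    return max_wal_files(checkpoint_segments) * SEGMENT_MB


if __name__ == "__main__":
    for segments in (3, 16, 64):
        print(segments, max_wal_files(segments), max_wal_mb(segments))
```

Under these assumptions, stretching the scan/write part does not raise the
bound by itself, provided each checkpoint still completes before the next
one is due.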

>>> On Wed, Dec 20, 2006 at 6:05 AM, in message <03be01c7242f$2b4ce130$19527c0a@OPERAO>, "Takayuki Tsunakawa" <tsunakawa.takay@jp.fujitsu.com> wrote:
>
> I consider that smoothing the load (more meaningfully, response time)
> has higher priority over checkpoint punctuality in a practical sense,
> because the users of a system benefit from good steady response and
> give good reputation to the system.

I agree with that.

> If the checkpoint processing is
> not punctual, crash recovery would take longer time. But which would
> you give higher priority, the unlikely event (=crash of the system) or
> likely event (=peak hours of the system)? I believe the latter should
> be regarded.

I'm still with you here.

> The system can write dirty buffers after the peak hours
> pass.

I don't see that in our busiest environment. We have 3,000 "directly
connected" users, various business partner interfaces, and public web
entry doing OLTP in 72 databases distributed around the state, with
real-time replication to central databases which are considered derived
copies. If all the pages modified on the central databases were held in
buffers or cache until after peak hours, query performance would suffer
-- assuming it would all even fit in cache. We must have a way for
dirty pages to be written under load while responding to hundreds of
thousands of queries per hour without disturbing "freezes" during
checkpoints. On top of that, we monitor database requests on the source
machines, and during "idle time" we synchronize the data with all of
the targets to identify, log, and correct "drift". So even if we could
shift all our disk writes to the end of the day, that would have its
own down side, in extending our synchronization cycle.

I raise this only to be sure that such environments are considered with
these changes, not to discourage improvements in the checkpoint
techniques.
We have effectively eliminated checkpoint problems in our environment
with a combination of battery backed controller cache and aggressive
background writer configuration. When you have a patch which seems to
help those who still have problems, I'll try to get time approved to
run a transaction replication stream onto one of our servers (in "catch
up mode") while we do a web "stress test" by playing back requests from
our production log. That should indicate how the patch will affect us.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > > I consider that smoothing the load (more meaningfully, response time) > > has higher priority over checkpoint punctuality in a practical sense, > > I agree with that. I agree with checkpoint_time is not so important, but we should respect checkpoint_segements, or else new WAL files would be created unboundedly, as Bruce pointed. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center

Hello, Mr. Grittner,

From: "Kevin Grittner" <Kevin.Grittner@wicourts.gov>
> We have 3,000 "directly connected" users, various business partner
> interfaces, and public web entry doing OLTP in 72 databases distributed
> around the state, with real-time replication to central databases which
> are considered derived copies.

What a big system you have.

> If all the pages modified on the central
> databases were held in buffers or cache until after peak hours, query
> performance would suffer -- assuming it would all even fit in cache. We
> must have a way for dirty pages to be written under load while
> responding to hundreds of thousands of queries per hour without
> disturbing "freezes" during checkpoints.

I agree with you. My words were not good. I consider it necessary to
always advance checkpoints even under heavy load, while taking care of
OLTP transactions.

> I raise this only to be sure that such environments are considered with
> these changes, not to discourage improvements in the checkpoint
> techniques. We have effectively eliminated checkpoint problems in our
> environment with a combination of battery backed controller cache and
> aggressive background writer configuration. When you have a patch which
> seems to help those who still have problems, I'll try to get time
> approved to run a transaction replication stream onto one of our servers
> (in "catch up mode") while we do a web "stress test" by playing back
> requests from our production log. That should indicate how the patch
> will affect us.

Thank you very much for your kind offer.