Thread: Checkpoints occur too frequently
Recent OSDL test reports for 8.0RC1 show that checkpoints occur too frequently at higher settings of checkpoint_segments and checkpoint_timeout. This contradicts both the manual and the intention of the current implementation, as far as I can tell from examining the code.

checkpoint_timeout was set to 600 and 1800 in two separate tests, yet checkpoints occurred at intervals of around 6m 15s. checkpoint_segments was set to 8192 for the last test, yet checkpoints did not occur any less frequently. The presence of manual CHECKPOINTs has been ruled out by the test author, Mark Wong.

DEBUG1 messages showed that there is an apparent limit of 255 xlog files per checkpoint. This cannot be just a reporting bug, since the checkpoint timings are simply not increasing as requested by the parameter settings. Many checkpoints occur with a spread of values from 0..255, though in one run there are 18 consecutive checkpoints with exactly 255 files. Across the whole test there were 48 checkpoints, of which 25 recycled exactly 255 files and all others recycled fewer. This situation could occur by chance, but only with an extremely low probability, which I estimate to be no better than about 1 in 1 million. Even a single test result higher than 255 would disprove this, but none are available (anyone?).

Results referred to above are shown here: http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/db/log

checkpoint_segments is limited to the range 0..INT_MAX in the code, so should not be limited to only 255 files, which dare I say is suspiciously 1 byte's worth of files.

Not only is this a bug, but it limits high-end performance for those who wish to set those parameters higher. A brief examination of the code reveals no explanation for this observation, so I'm raising it for general investigation.

Happy for someone to explain how to make checkpoints occur less frequently... (apart from don't write to the database).
-- Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> DEBUG1 messages showed that there is an apparent limit of 255 xlog files
> per checkpoint -

The volume-based checkpoint trigger code is

    if (IsUnderPostmaster &&
        (openLogId != RedoRecPtr.xlogid ||
         openLogSeg >= (RedoRecPtr.xrecoff / XLogSegSize) +
         (uint32) CheckPointSegments))
    {
    #ifdef WAL_DEBUG
        if (XLOG_DEBUG)
            elog(LOG, "time for a checkpoint, signaling bgwriter");
    #endif
        RequestCheckpoint(false);
    }

which now that I look at it obviously forces a checkpoint whenever xlogid (the upper half of XLogRecPtr) changes, ie every 4GB of WAL output. I suppose on a high-performance platform it's possible that one would want checkpoints further apart than that, though the idea of plowing through multiple gigabytes of WAL in order to recover from a crash is a bit daunting.

It's not immediately obvious how to recast the comparison without either creating overflow bugs or depending on 64-bit-int arithmetic being available. Thoughts?

regards, tom lane
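To make the effect of that condition concrete, here is a minimal standalone sketch that models it outside the server. It is an illustration only, not PostgreSQL code: the 16 MB segment size, the figure of 255 usable segments per log id, and the names WalPos, old_trigger and segments_until_checkpoint are all assumptions introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16 MB WAL segments (the thread does not state the size). */
#define SEG_SIZE (16u * 1024u * 1024u)
/* Assumption: usable segments per 4 GB log id; integer division gives 255. */
#define SEGS_PER_LOGID (0xFFFFFFFFu / SEG_SIZE)

/* Hypothetical stand-in for the two 32-bit halves of XLogRecPtr. */
typedef struct {
    uint32_t xlogid;   /* upper half: log id */
    uint32_t xrecoff;  /* lower half: byte offset within the log id */
} WalPos;

/* Same shape as the quoted condition, without the IsUnderPostmaster test. */
static bool old_trigger(uint32_t open_log_id, uint32_t open_log_seg,
                        WalPos redo, uint32_t checkpoint_segments)
{
    return open_log_id != redo.xlogid ||
           open_log_seg >= (redo.xrecoff / SEG_SIZE) + checkpoint_segments;
}

/*
 * Count how many segment switches happen after a checkpoint whose redo
 * pointer is 'redo' before old_trigger() requests the next checkpoint.
 */
static uint32_t segments_until_checkpoint(WalPos redo,
                                          uint32_t checkpoint_segments)
{
    uint32_t id = redo.xlogid;
    uint32_t seg = redo.xrecoff / SEG_SIZE;
    uint32_t switched = 0;

    assert(seg < SEGS_PER_LOGID);
    while (!old_trigger(id, seg, redo, checkpoint_segments)) {
        seg++;
        if (seg >= SEGS_PER_LOGID) {  /* roll over into the next log id */
            seg = 0;
            id++;
        }
        switched++;
    }
    return switched;
}
```

Under these assumptions, calling segments_until_checkpoint with a redo pointer at the start of a log id and checkpoint_segments set to 8192 returns 255, because the log id changes long before the segment count is reached. A redo pointer partway through a log id gives a smaller number, which would be consistent with the spread of values below 255 in the test logs.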
On Tue, 2004-12-14 at 23:31, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > DEBUG1 messages showed that there is an apparent limit of 255 xlog files
> > per checkpoint -
>
> The volume-based checkpoint trigger code is
>
>     if (IsUnderPostmaster &&
>         (openLogId != RedoRecPtr.xlogid ||
>          openLogSeg >= (RedoRecPtr.xrecoff / XLogSegSize) +
>          (uint32) CheckPointSegments))
>     {
>     #ifdef WAL_DEBUG
>         if (XLOG_DEBUG)
>             elog(LOG, "time for a checkpoint, signaling bgwriter");
>     #endif
>         RequestCheckpoint(false);
>     }
>
> which now that I look at it obviously forces a checkpoint whenever
> xlogid (the upper half of XLogRecPtr) changes, ie every 4GB of WAL
> output. I suppose on a high-performance platform it's possible that
> one would want checkpoints further apart than that, though the idea
> of plowing through multiple gigabytes of WAL in order to recover from
> a crash is a bit daunting.
>
> It's not immediately obvious how to recast the comparison without
> either creating overflow bugs or depending on 64-bit-int arithmetic
> being available. Thoughts?

Thanks for finding it. It was staring me in the face.

I'd say no code changes for 8.0, now we know what's causing it. A doc patch to show the limit is probably just going to annoy the translators at this stage also.

Reasons:
- you can recompile using larger XLogSegSize, if you care to
- the real answer is to reduce the xlog volume

-- Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> I'd say no code changes for 8.0, now we know what's causing it. A doc
> patch to show the limit is probably just going to annoy the translators
> at this stage also.

We could adjust guc.c to limit checkpoint_segments to the range 1..255 without having to touch any translatable strings. This isn't a necessary change but it seems harmless ... any objections?

regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> I'd say no code changes for 8.0, now we know what's causing it. A doc
>> patch to show the limit is probably just going to annoy the translators
>> at this stage also.

> We could adjust guc.c to limit checkpoint_segments to the range 1..255
> without having to touch any translatable strings. This isn't a
> necessary change but it seems harmless ... any objections?

Or we could just fix it. After thinking a bit more, I realized that it's not hard to push the forced-checkpoint boundary out to 2^32 segments instead of 255. That should be enough to still any complaints.

regards, tom lane
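The message above does not show how the boundary would be pushed out, so the following is only a sketch of one way the comparison might be recast using 32-bit arithmetic alone. It is not claimed to be the change that was actually made. The idea is to fold the low-order bits of the log id into a 32-bit segment number and compare the remaining high-order bits separately. The 16 MB segment size, the 255 segments per log id, and the names WalPos, seg_low, seg_high and recast_trigger are assumptions introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16 MB WAL segments, 255 usable segments per log id. */
#define SEG_SIZE (16u * 1024u * 1024u)
#define SEGS_PER_LOGID (0xFFFFFFFFu / SEG_SIZE)

/* Hypothetical stand-in for the two 32-bit halves of XLogRecPtr. */
typedef struct {
    uint32_t xlogid;
    uint32_t xrecoff;
} WalPos;

/*
 * Low part of a global segment number. (xlogid % SEG_SIZE) is below
 * SEG_SIZE and SEGS_PER_LOGID * SEG_SIZE is below 2^32, so the sum
 * fits in 32 bits without overflow.
 */
static uint32_t seg_low(uint32_t xlogid, uint32_t seg)
{
    return (xlogid % SEG_SIZE) * SEGS_PER_LOGID + seg;
}

/* High part: the log id bits not covered by seg_low(). */
static uint32_t seg_high(uint32_t xlogid)
{
    return xlogid / SEG_SIZE;
}

/* Recast trigger: force a checkpoint only when the high bits change. */
static bool recast_trigger(uint32_t open_log_id, uint32_t open_log_seg,
                           WalPos redo, uint32_t checkpoint_segments)
{
    uint32_t old_low = seg_low(redo.xlogid, redo.xrecoff / SEG_SIZE);
    uint32_t new_low = seg_low(open_log_id, open_log_seg);

    assert(open_log_seg < SEGS_PER_LOGID);
    if (seg_high(open_log_id) != seg_high(redo.xlogid))
        return true;
    /*
     * With equal high bits and WAL moving forward, new_low >= old_low.
     * If that ever failed, the unsigned subtraction would wrap to a large
     * value and request an early (harmless) checkpoint.
     */
    return new_low - old_low >= checkpoint_segments;
}
```

In this sketch a change of log id no longer forces a checkpoint by itself. With checkpoint_segments at 8192 and the redo pointer at the start of log id 0, the trigger first fires at log id 32, segment 32, which is exactly 8192 segments later. A forced checkpoint still happens when the high bits change, which under these assumptions is once every SEG_SIZE log ids, or a little under 2^32 segments.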
On Fri, 2004-12-17 at 00:12, Tom Lane wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> >> I'd say no code changes for 8.0, now we know what's causing it. A doc
> >> patch to show the limit is probably just going to annoy the translators
> >> at this stage also.
>
> > We could adjust guc.c to limit checkpoint_segments to the range 1..255
> > without having to touch any translatable strings. This isn't a
> > necessary change but it seems harmless ... any objections?
>
> Or we could just fix it. After thinking a bit more, I realized that
> it's not hard to push the forced-checkpoint boundary out to 2^32
> segments instead of 255. That should be enough to still any complaints.

Sorry for the delay in replying.

Thanks for considering this further. If it can be fixed in 8.0, that would be good. If this means any risk or non-portability, then I would defer.

-- Best Regards, Simon Riggs