On 7/26/13 9:14 AM, didier wrote:
> During recovery you have to load the log in cache first before applying WAL.
Checkpoints exist to bound recovery time after a crash. That is their
only purpose. What you're suggesting moves a lot of work into the
recovery path, which will increase how long recovery takes.
More work at recovery time means someone who uses the default of
checkpoint_timeout='5 minutes', expecting that crash recovery won't take
very long, will discover it now takes longer. They'll be forced to
shrink the value to get the same recovery time as they do currently.
They might need to make checkpoint_timeout 3 minutes instead, if crash
recovery now has all this extra work to deal with. And when the time
between checkpoints drops, the fundamental efficiency of checkpoint
processing drops with it. You will end up writing out more data in the
end.
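
To illustrate why shorter intervals mean more data written, here is a toy
sketch in Python. It is not PostgreSQL code. It assumes a checkpoint writes
each page dirtied since the previous checkpoint exactly once, and it ignores
the background writer, backend writes, and WAL volume. The function name,
the workload, and the page counts are all hypothetical.

```python
import random


def checkpoint_page_writes(updates, interval):
    """Count page writes done by checkpoints in a toy model.

    updates:  list of (time_seconds, page_id) tuples, sorted by time.
    interval: seconds between checkpoints.
    Assumption: each checkpoint writes every page dirtied since the
    previous checkpoint exactly once.
    """
    writes = 0
    dirty = set()
    next_checkpoint = interval
    for t, page in updates:
        # Run any checkpoints that came due before this update.
        while t >= next_checkpoint:
            writes += len(dirty)
            dirty.clear()
            next_checkpoint += interval
        dirty.add(page)
    # A final checkpoint flushes whatever is still dirty.
    writes += len(dirty)
    return writes


# Hypothetical workload: 30 minutes, one update per second, 100 hot pages.
rng = random.Random(0)
updates = [(t, rng.randrange(100)) for t in range(1800)]

for minutes in (3, 5, 15):
    print(minutes, "min:", checkpoint_page_writes(updates, minutes * 60))
```

In this model a page that is updated repeatedly gets written once per
checkpoint interval it was touched in, so the same workload costs more
page writes as the interval shrinks.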
The interval between checkpoints, the I/O they generate, and recovery
time are all related. If
you let any one side of the current requirements slip, it makes the rest
easier to deal with. Those are all trade-offs though, not improvements.
And this particular one is already an option.
If you want less checkpoint I/O per capita and don't care about recovery
time, you don't need a code change to get it. Just make
checkpoint_timeout huge. A lot of checkpoint I/O issues go away if you
only do one checkpoint per hour, because instead of random writes you're
getting sequential ones to the WAL. But when you crash, expect to be
down for a significant chunk of an hour, as you go back to sort out all
of the work that was postponed.
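
As rough arithmetic for that downtime, a minimal Python sketch follows. It
assumes the crash lands just before the next checkpoint completes, so about
one full interval of WAL must be replayed, and that replay runs some fixed
factor faster than the WAL was originally generated. The function name and
the speedup factor are hypothetical, not measured PostgreSQL behavior.

```python
def worst_case_replay_seconds(checkpoint_interval_s, replay_speedup):
    """Rough upper bound on WAL replay time after a crash.

    checkpoint_interval_s: seconds between checkpoints.
    replay_speedup: assumed ratio of replay speed to the speed at which
                    the WAL was originally generated (hypothetical).
    """
    if replay_speedup <= 0:
        raise ValueError("replay_speedup must be positive")
    return checkpoint_interval_s / replay_speedup


# Compare a 5 minute interval against a 1 hour one, assuming replay
# runs twice as fast as the original workload.
for interval_s in (300, 3600):
    print(interval_s, "->", worst_case_replay_seconds(interval_s, 2.0))
```

Whatever the real speedup is on a given system, the estimate scales
linearly with the checkpoint interval in this model.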
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com