Re: Checkpoint question - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Checkpoint question |
Date | |
Msg-id | 1137056890.3180.7.camel@localhost.localdomain |
In response to | Re: Checkpoint question (Qingqing Zhou <zhouqq@cs.toronto.edu>) |
Responses |
Re: Checkpoint question
Re: Checkpoint question |
List | pgsql-hackers |
On Wed, 2006-01-11 at 22:33 -0500, Qingqing Zhou wrote:
> On Wed, 11 Jan 2006, Tom Lane wrote:
> > It'd be possible to do something like this: after establishing
> > RedoRecPtr, make one quick pass through the buffers and make a list of
> > what needs to be dumped at that instant. Then go back and do the actual
> > I/O for only those buffers. I'm dubious that this will really improve
> > matters though, as the net effect is just to postpone I/O that will
> > happen anyway soon after the checkpoint (as soon as the bgwriter goes
> > back to normal activity).
> >
> Looks like a good idea. I don't worry too much about the problem you
> mentioned. AFAIK, checkpoint has two targets: (1) cleanup buffer pool; (2)
> reduce recovery time;

I think it's a good idea, but agree it does not save much in practice.

The only buffers this will miss are ones that were clean throughout the
whole of the last checkpoint cycle, yet have been dirtied between the
start of the checkpoint pass and when the pass reaches them. Given the
relative durations of those two intervals, I would guess that this would
yield very few buffers.

Further, if you miss a buffer on one checkpoint it will not be able to
avoid being written at the next. If we write to the buffer again in the
next checkpoint cycle, then we combine the two I/Os and save effort. If
the buffer is not written to in the next cycle, and this seems likely
since it wasn't written to in the last, we do not avoid I/O, we just
defer it to the next checkpoint.

So the only buffer I/O we would save is for buffers that
- are not written to in checkpoint cycle n (by definition)
- are written to *during* the checkpoint
- are written to again during the next checkpoint cycle, n+1

You could do the math, or measure that, though my guess is that this
wouldn't save more than a few percentage points on the checkpoint
process.

To compile the list, you'd need to stop all buffer write activity while
you compile it, which sounds like a high price for the benefit.
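The three conditions above can be checked with a small model. This is
only a sketch under simplifying assumptions, not PostgreSQL code: the
function names are hypothetical, the bgwriter is ignored, and each
buffer costs at most one write per checkpoint. It counts writes for one
buffer across checkpoints n and n+1 under the current full scan and
under the proposed list-first scheme, then lists the cases where the
list-first scheme writes less.

```python
from itertools import product


def full_scan_writes(dirty_before, dirtied_during, dirtied_next):
    """Writes for one buffer when the checkpoint dumps whatever is dirty
    at the moment the pass reaches the buffer (assumed current behaviour)."""
    # Checkpoint n: written if dirty from cycle n or dirtied mid-checkpoint.
    first = 1 if (dirty_before or dirtied_during) else 0
    # Checkpoint n+1: written only if dirtied again in cycle n+1.
    second = 1 if dirtied_next else 0
    return first + second


def snapshot_writes(dirty_before, dirtied_during, dirtied_next):
    """Writes for one buffer when the checkpoint first lists the buffers
    dirty at its start and then writes only the listed ones."""
    # Checkpoint n: only buffers already dirty at the start are listed.
    first = 1 if dirty_before else 0
    # A buffer missed by the list stays dirty and carries over.
    carried_over = dirtied_during and not dirty_before
    # Checkpoint n+1: one write covers both the carry-over and new changes.
    second = 1 if (carried_over or dirtied_next) else 0
    return first + second


def saved_cases():
    """Return every (dirty_before, dirtied_during, dirtied_next) combination
    for which the list-first scheme issues fewer writes."""
    return [
        case
        for case in product([False, True], repeat=3)
        if full_scan_writes(*case) > snapshot_writes(*case)
    ]


if __name__ == "__main__":
    for case in saved_cases():
        print(case)
```

In this model the only combination that saves a write is a buffer that
was clean in cycle n, dirtied during the checkpoint, and dirtied again
in cycle n+1. A buffer dirtied during the checkpoint but not afterwards
costs one write either way; the write simply moves to checkpoint n+1.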
> For (2), it is clear that the above idea will work since the recovery will
> always read the data page to check its LSN -- this is the source of the
> cost. For (1), we have bgwriter, and part of the reason it is designed is to
> cleanup buffer pool.

Deferring I/O gains us nothing in the long run, though it would speed up
recovery time by a fraction. But then crash recovery time is not much of
an issue, is it? If it is, there are other optimizations.

Best Regards, Simon Riggs