Re: Checkpoint question - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Checkpoint question
Date
Msg-id 1137056890.3180.7.camel@localhost.localdomain
In response to Re: Checkpoint question  (Qingqing Zhou <zhouqq@cs.toronto.edu>)
List pgsql-hackers
On Wed, 2006-01-11 at 22:33 -0500, Qingqing Zhou wrote: 
> On Wed, 11 Jan 2006, Tom Lane wrote:

> > It'd be possible to do something like this: after establishing
> > RedoRecPtr, make one quick pass through the buffers and make a list of
> > what needs to be dumped at that instant.  Then go back and do the actual
> > I/O for only those buffers.  I'm dubious that this will really improve
> > matters though, as the net effect is just to postpone I/O that will
> > happen anyway soon after the checkpoint (as soon as the bgwriter goes
> > back to normal activity).
> >
> Looks like a good idea. I don't worry too much about the problem you
> mentioned. AFAIK, a checkpoint has two targets: (1) clean up the buffer
> pool; (2) reduce recovery time.

I think it's a good idea, but I agree it does not save much in practice.

The only buffers this will miss are ones that were clean throughout the
whole of the last checkpoint cycle, yet have been dirtied between the
start of the checkpoint pass and the moment the pass reaches them. Given
the relative durations of those two intervals, I would guess that this
would yield very few buffers.

Further, a buffer that is missed by one checkpoint cannot avoid being
written at the next. If the buffer is written to again in the next
checkpoint cycle, then we combine the two I/Os and save effort. If the
buffer is not written to in the next cycle, and this seems likely since
it wasn't written to in the last, we do not avoid I/O, we just defer it
to the next checkpoint.

So the only buffer I/O we would save is for buffers that
- are not written to in checkpoint cycle n (by definition)
- are written to *during* the checkpoint
- are written to again during the next checkpoint cycle, n+1

You could do the math, or measure it, though my guess is that this
wouldn't save more than a few percentage points on the checkpoint
process.
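
As a rough illustration of the three conditions above, here is a toy
Python enumeration. It is only a sketch under stated assumptions: each
buffer is reduced to three hypothetical flags (dirtied during cycle n,
dirtied during the checkpoint before the pass reaches it, dirtied during
cycle n+1), the bgwriter is ignored, and the function names are made up
for this example rather than taken from the PostgreSQL source.

```python
from itertools import product


def writes_current(dirty_n, dirty_during, dirty_n1):
    """Writes over two checkpoints when the pass flushes whatever is
    dirty at the moment it reaches the buffer (toy model)."""
    first = dirty_n or dirty_during
    second = dirty_n1
    return int(first) + int(second)


def writes_with_list(dirty_n, dirty_during, dirty_n1):
    """Writes over two checkpoints when only buffers that were dirty
    at checkpoint start are flushed (toy model)."""
    first = dirty_n
    # A buffer that was clean at checkpoint start but dirtied during
    # the checkpoint is skipped now and stays dirty for the next one.
    carried_over = dirty_during and not dirty_n
    second = carried_over or dirty_n1
    return int(first) + int(second)


def saved_cases():
    """Flag combinations where the list-based scheme writes less."""
    return [
        flags
        for flags in product([False, True], repeat=3)
        if writes_current(*flags) > writes_with_list(*flags)
    ]


print(saved_cases())  # [(False, True, True)]
```

Under this model the only combination where the list-based checkpoint
issues fewer writes is (False, True, True), which matches the three
bullet points. Every other combination either costs the same or merely
moves the write to the next checkpoint.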

To compile the list, you'd need to stop all buffer write activity while
doing so, which sounds like a high price for the benefit.
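
For reference, the two-pass scheme under discussion can be sketched as a
toy Python model. This is not PostgreSQL code: ToyBufferPool, mark_dirty,
checkpoint_with_list and the between_passes hook are hypothetical names,
and the single coarse lock is an assumption standing in for "stop all
buffer write activity"; real buffer locking would differ.

```python
import threading


class ToyBufferPool:
    """Toy buffer pool: one dirty flag per buffer, one coarse lock."""

    def __init__(self, nbuffers):
        self.dirty = [False] * nbuffers
        self.lock = threading.Lock()
        self.writes = []  # buffer ids flushed, in order

    def mark_dirty(self, buf_id):
        with self.lock:
            self.dirty[buf_id] = True

    def checkpoint_with_list(self, between_passes=None):
        # Pass 1: with writers blocked, list what is dirty right now.
        with self.lock:
            to_write = [i for i, d in enumerate(self.dirty) if d]

        # Test hook (assumption): simulate activity between the passes.
        if between_passes is not None:
            between_passes()

        # Pass 2: flush only the listed buffers; buffers dirtied after
        # pass 1 are left for normal activity or the next checkpoint.
        for buf_id in to_write:
            with self.lock:
                if self.dirty[buf_id]:
                    self.dirty[buf_id] = False
                    self.writes.append(buf_id)
        return to_write
```

For example, with buffers 0 and 2 dirty at the start and buffer 1
dirtied between the passes, only 0 and 2 are flushed and buffer 1 stays
dirty, which is exactly the I/O that gets postponed.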

> For (2), it is clear that the above idea will work since the recovery will
> always read the data page to check its LSN -- this is the source of the
> cost. For (1), we have bgwriter, and part of the reason it is designed is
> to clean up the buffer pool.

Deferring I/O gains us nothing in the long run, though it would speed up
recovery time by a fraction. But then crash recovery time is not much of
an issue, is it? If it is, there are other optimizations.

Best Regards, Simon Riggs


