Re: Checkpoint question - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Checkpoint question
Date	January 12, 2006 05:08:32
Msg-id	1137056890.3180.7.camel@localhost.localdomain
In response to	Re: Checkpoint question (Qingqing Zhou <zhouqq@cs.toronto.edu>)
Responses	Re: Checkpoint question Re: Checkpoint question
List	pgsql-hackers

On Wed, 2006-01-11 at 22:33 -0500, Qingqing Zhou wrote: 
> On Wed, 11 Jan 2006, Tom Lane wrote:

> > It'd be possible to do something like this: after establishing
> > RedoRecPtr, make one quick pass through the buffers and make a list of
> > what needs to be dumped at that instant.  Then go back and do the actual
> > I/O for only those buffers.  I'm dubious that this will really improve
> > matters though, as the net effect is just to postpone I/O that will
> > happen anyway soon after the checkpoint (as soon as the bgwriter goes
> > back to normal activity).
> >
> Looks like a good idea. I don't worry too much about the problem you
> mentioned. AFAIK, a checkpoint has two targets: (1) clean up the buffer
> pool; (2) reduce recovery time.

I think it's a good idea, but I agree it does not save much in practice.

The only buffers this will miss are ones that were clean throughout the
whole of the last checkpoint cycle, yet were dirtied between the start
of the checkpoint pass and the moment the pass reaches them. Given the
relative durations of those two intervals, I would guess that this would
yield very few buffers.
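
As a rough illustration of that comparison, here is a toy model in Python.
It is not PostgreSQL code: the function names, the integer buffer ids and
the (scan position, buffer id) event format are all assumptions made for
the sketch. It contrasts a single scan that writes whatever is dirty when
the scan arrives with a pass that writes only the buffers listed at the
start.

```python
def scan_checkpoint(dirty_at_start, dirty_events, n_buffers):
    """Single pass: write every buffer that is dirty when the scan reaches it.

    dirty_events is a list of (scan_position, buffer_id) pairs, meaning
    buffer_id is dirtied while the scan is at scan_position (hypothetical
    event format, for illustration only).
    """
    dirty = set(dirty_at_start)
    events_by_pos = {}
    for pos, buf in dirty_events:
        events_by_pos.setdefault(pos, []).append(buf)

    written = []
    for pos in range(n_buffers):
        # Backends dirty buffers while the scan sits at this position.
        dirty.update(events_by_pos.get(pos, []))
        if pos in dirty:
            written.append(pos)
            dirty.discard(pos)
    return written


def snapshot_checkpoint(dirty_at_start):
    """List-based pass: write only what was dirty when the list was made."""
    return sorted(dirty_at_start)


def skipped_by_snapshot(dirty_at_start, dirty_events, n_buffers):
    """Buffers the plain scan would write but the list-based pass skips."""
    scanned = set(scan_checkpoint(dirty_at_start, dirty_events, n_buffers))
    return sorted(scanned - set(snapshot_checkpoint(dirty_at_start)))


if __name__ == "__main__":
    # Buffers 1 and 4 start dirty; 3, 0 and 6 are dirtied during the pass.
    print(skipped_by_snapshot({1, 4}, [(0, 3), (2, 0), (5, 6)], 8))
```

In this model the skipped buffers are exactly those that were clean at the
start and were dirtied ahead of the scan position; a buffer dirtied behind
the scan is not written by either approach.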

Further, a buffer that is missed by one checkpoint cannot avoid being
written at the next. If the buffer is written to again in the next
checkpoint cycle, then the two I/Os are combined and we save effort. If
the buffer is not written to in the next cycle, which seems likely since
it wasn't written to in the last, we do not avoid the I/O; we just defer
it to the next checkpoint.

So the only buffer I/O we would save is for buffers that
- are not written to in checkpoint cycle n (by definition)
- are written to *during* the checkpoint
- are written to again during the next checkpoint cycle, n+1
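
The three conditions above can be written as a small set calculation. This
is only a sketch under assumed inputs: the three sets of buffer ids and the
function name are hypothetical, and real measurements would have to come
from instrumenting the buffer manager.

```python
def classify_skipped_buffers(dirtied_in_cycle_n,
                             dirtied_during_checkpoint,
                             dirtied_in_cycle_n1):
    """Split the buffers a list-based checkpoint skips into saved vs deferred.

    A buffer is skipped if it was clean for all of cycle n but was dirtied
    while the checkpoint was running. Its write is saved only if the buffer
    is dirtied again in cycle n+1; otherwise the write is merely deferred
    to the next checkpoint.
    """
    skipped = set(dirtied_during_checkpoint) - set(dirtied_in_cycle_n)
    saved = skipped & set(dirtied_in_cycle_n1)
    deferred = skipped - saved
    return saved, deferred


if __name__ == "__main__":
    saved, deferred = classify_skipped_buffers(
        dirtied_in_cycle_n={1, 2},
        dirtied_during_checkpoint={2, 3, 4, 5},
        dirtied_in_cycle_n1={3, 5, 9},
    )
    print("I/O saved for buffers:", sorted(saved))
    print("I/O only deferred for buffers:", sorted(deferred))
```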

You could do the math, or measure it, though my guess is that this
wouldn't save more than a few percentage points on the checkpoint
process.

To compile the list, you'd need to stop all buffer write activity while
you compile it, which sounds like a high price for the benefit.

> For (2), it is clear that the above idea will work since the recovery will
> always read the data page to check its LSN -- this is the source of the
> cost. For (1), we have bgwriter, and part of the reason it is designed is
> to clean up the buffer pool.

Deferring I/O gains us nothing in the long run, though it would speed up
recovery by a fraction. But then crash recovery time is not much of an
issue, is it? If it is, there are other optimizations.

Best Regards, Simon Riggs
