Re: RC2 and open issues - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: RC2 and open issues
Msg-id 200412272256.iBRMuXf14798@candle.pha.pa.us
In response to Re: RC2 and open issues  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark wrote:
> 
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Suppose that you run a checkpoint every 5 minutes, and with the knob
> > you slow down the checkpoint to extend over say 3 minutes on average,
> > rather than the normal blast-it-out-as-fast-as-possible.  Then you'll
> > be keeping an average of 8 minutes worth of WAL files instead of 5.
> > Not exactly a killer objection.
> 
> Right. I was thinking that the goal would be to spread the checkpoint out over
> exactly the checkpoint interval, minus some safety factor. So if it has some
> estimate of the total number of dirty buffers that need flushing it could just
> divide the checkpoint interval by that and calculate the delay needed to
> finish in some fraction of the checkpoint interval, 60% seems like a
> reasonable guess.
> 
> > One issue is that while we can regulate the rate at which we issue
> > write()s, we still have to issue fsync()s at the end, and we can't
> > control what happens in response to those.  It's quite possible that
> > all the I/O would happen in response to the fsync()s anyway, in which
> > case the whole exercise would be a waste of time.
> 
> Well you could fsync earlier as well, say just before whenever you sleep.
> Obviously the delay on the checkpoint process doesn't matter to performance if
> it's about to sleep. It could end up scheduling i/o earlier than necessary and
> cause redundant seeks but then I guess that's an inherent tension between
> trying to spread out the i/o evenly and trying to get the ideal ordering of
> i/o.

It certainly is an interesting idea to have the checkpoint span a longer
time period.  We couldn't do that with sync, but now that we fsync each
file it is possible.
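
To make the arithmetic in the quoted text concrete, here is a rough
sketch of the pacing calculation Greg describes and of Tom's WAL
estimate.  The function names and parameters are hypothetical and for
illustration only; this is not existing backend code, and it ignores
the fsync cost entirely.

```c
#include <assert.h>

/*
 * Hypothetical helper (not PostgreSQL code): delay to sleep after each
 * buffer write so that all writes finish within `fraction` of the
 * checkpoint interval.  Returns milliseconds per buffer.
 */
double
write_delay_ms(double interval_sec, double fraction, long dirty_buffers)
{
    assert(fraction > 0.0 && fraction <= 1.0);

    /* Nothing to flush, so no pacing is needed. */
    if (dirty_buffers <= 0)
        return 0.0;

    return interval_sec * fraction * 1000.0 / (double) dirty_buffers;
}

/*
 * Tom's estimate: WAL must be kept for the checkpoint interval plus
 * the time over which the checkpoint itself is spread.
 */
double
wal_retained_sec(double interval_sec, double spread_sec)
{
    return interval_sec + spread_sec;
}
```

With a 300-second interval, a 60% target, and an assumed 1800 dirty
buffers, write_delay_ms(300, 0.6, 1800) works out to about 100 ms per
buffer, and wal_retained_sec(300, 180) gives 480 seconds, i.e. the 8
minutes of WAL Tom mentions.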

It would be easy to do this if we didn't also need the fsync.  The original
idea was that we would write() the dirty buffers long before the
checkpoint, and the kernel would write many of these dirty buffers
before we got to checkpoint time.

We could go with the checkpoint clock sweep idea, but then we aren't
just writing the buffers; we would be doing write/fsync a lot more
often.  I can't think of a way this would be a win.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

