Re: Background writer process - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Background writer process
Date
Msg-id 3FB41ABE.8010304@Yahoo.com
In response to Re: Background writer process  (Kurt Roeckx <Q@ping.be>)
Responses Re: Background writer process  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Kurt Roeckx wrote:
> On Thu, Nov 13, 2003 at 05:39:32PM -0500, Bruce Momjian wrote:
>> Jan Wieck wrote:
>> > Bruce Momjian wrote:
>> > > He found that write() itself didn't encourage the kernel to write the
>> > > buffers to disk fast enough.  I think the final solution will be to use
>> > > fsync or O_SYNC.
>> > 
>> > write() alone doesn't encourage the kernel to do any physical IO at all. 
>> > As long as you have enough OS buffers, it does happy write caching until 
>> > you checkpoint and sync(), and then the system freezes.
>> 
>> That's not completely true.  Some kernels do trickle sync, meaning
>> they sync a little bit regularly rather than all at once, so write()
>> does help get those shared buffers into the kernel for possible writing. 
>> Also, it is possible the kernel will issue a sync() on its own.
> 
> So basically on some kernels you want them to flush their dirty
> buffers faster.
> 
> I have a feeling that how we ask them not to keep it in memory too
> long should depend more on the system, and that maybe sync(),
> fsync() or O_SYNC could be a fallback in case it's needed and
> there are no better ways of doing it.
> 
> Maybe something like posix_fadvise() might be useful too on systems
> that have it?

That is all fine and, as I said, how often, how much and how forcefully 
we do the IO can all be configurable and as flexible as people see fit. 
But whether you use sync(), fsync(), fdatasync(), O_SYNC, O_DSYNC or 
posix_fadvise(), somewhere you have to do the write(). And that write 
has to be coordinated with the buffer cache replacement strategy, so that 
you write the buffers that are likely to be replaced soon and don't 
write those the strategy expects to keep for longer anyway. The exception 
is a checkpoint, where you have to write whatever is dirty.
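
To illustrate the coordination described above, here is a minimal sketch 
in C. It is not the code from the patch. The Buffer struct, the function 
select_buffers_to_write() and the scan_depth and max_writes knobs are all 
hypothetical names made up for this example, and it assumes the strategy 
hands over its buffers ordered from "replaced soonest" to "kept longest".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer descriptor; not PostgreSQL's real BufferDesc. */
typedef struct
{
    int  page_id;
    bool dirty;
} Buffer;

/*
 * buffers[] is assumed to be ordered by the replacement strategy:
 * index 0 is the next eviction candidate, the last index is the page
 * the strategy wants to keep longest.
 *
 * Stores the ids of the pages that should be written now in out_ids
 * (which must have room for nbuffers entries) and returns how many
 * were selected.  The caller would then issue the write() calls.
 */
size_t
select_buffers_to_write(const Buffer *buffers, size_t nbuffers,
                        size_t scan_depth, size_t max_writes,
                        bool checkpoint, int *out_ids)
{
    size_t limit = nbuffers;
    size_t selected = 0;

    /* Normal pass: only look at buffers close to being replaced. */
    if (!checkpoint && scan_depth < nbuffers)
        limit = scan_depth;

    for (size_t i = 0; i < limit; i++)
    {
        if (!buffers[i].dirty)
            continue;
        /* Normal pass: throttle the IO.  A checkpoint writes everything. */
        if (!checkpoint && selected >= max_writes)
            break;
        out_ids[selected++] = buffers[i].page_id;
    }
    return selected;
}
```

A normal pass would call this with checkpoint = false and small values 
for scan_depth and max_writes. A checkpoint passes checkpoint = true, 
which ignores both limits and selects every dirty buffer.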

The patch I posted does this write() in coordination with the strategy 
in a separate background process, so that the regular backends don't 
have to write under normal circumstances (a few places in DDL 
statements call BufferSync(), but those are exceptions IMHO). Can we agree 
on this general outline? Or does anyone have a better proposal?
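
As a rough illustration of "somewhere you have to do the write()" with 
the forcing method left configurable, here is a small sketch. It is not 
taken from the patch. flush_page() and the FlushMethod enum are 
hypothetical names, and the sketch assumes a POSIX system that provides 
pwrite(), fsync() and fdatasync().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical setting: how hard to push the kernel after the write. */
typedef enum
{
    FLUSH_NONE,       /* write() only, leave the rest to the kernel */
    FLUSH_FDATASYNC,  /* force the file data to disk */
    FLUSH_FSYNC       /* force file data and metadata to disk */
} FlushMethod;

/*
 * Write one page at the given offset, then apply the configured flush
 * method.  Returns 0 on success and -1 on error (errno is set).
 */
int
flush_page(int fd, const void *page, size_t len, off_t offset,
           FlushMethod method)
{
    const char *p = page;

    /* The write() has to happen whichever flush method is chosen. */
    while (len > 0)
    {
        ssize_t n = pwrite(fd, p, len, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* no progress, give up */
        p += n;
        len -= (size_t) n;
        offset += n;
    }

    switch (method)
    {
        case FLUSH_FDATASYNC:
            return fdatasync(fd);
        case FLUSH_FSYNC:
            return fsync(fd);
        case FLUSH_NONE:
            break;
    }
    return 0;
}
```

With FLUSH_NONE the page only reaches the OS cache, which is the 
"write() alone" case discussed earlier in the thread. The other two 
settings force the physical IO right away.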


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


