Re: Background writer process - Mailing list pgsql-hackers

From Shridhar Daithankar
Subject Re: Background writer process
Date
Msg-id 200311171403.38713.shridhar_daithankar@myrealbox.com
In response to Re: Background writer process  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: Background writer process  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
On Friday 14 November 2003 22:10, Jan Wieck wrote:
> Shridhar Daithankar wrote:
> > On Friday 14 November 2003 03:05, Jan Wieck wrote:
> >> For sure the sync() needs to be replaced by the discussed fsync() of
> >> recently written files. And I think the algorithm how much and how often
> >> to flush can be significantly improved. But after all, this does not
> >> change the real checkpointing at all, and the general framework having a
> >> separate process is what we probably want.
> >
> > Is having fsync for regular data files and sync for WAL segments a
> > comfortable compromise?  Or is this going to use fsync for all of them?
> >
> > IMO, with fsync, we tell kernel that you can write this buffer. It may or
> > may not write it immediately, unless it is hard sync.
>
> I think it's more the other way around. On some systems sync() might
> return before all buffers are flushed to disk, while fsync() does not.

Oops.. that's bad.

> > Since postgresql can afford lazy writes for data files, I think this
> > could work.
>
> The whole point of a checkpoint is to know for certain that a specific
> change is in the datafile, so that it is safe to throw away older WAL
> segments.

I just made another posting on patches, in a thread that crosses over with
win32-devel.

Essentially I said:

1. Open WAL files with O_SYNC|O_DIRECT or O_SYNC. (Not sure if the current
code does it. The hackery in xlog.c is not exactly trivial.)
2. Open data files normally and fsync them only in the background writer
process.
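
To make the two points above concrete, here is a minimal sketch in Python
rather than C, just to show the system calls involved. It is not how xlog.c
does it; the function names and file names are made up for illustration.
O_DIRECT is left out because it needs aligned buffers, and O_SYNC falls back
to 0 on platforms that do not define it.

```python
import os
import tempfile

# O_SYNC asks the kernel to complete each write() on stable storage before
# returning. It is not defined on every platform, so fall back to 0 there.
WAL_FLAGS = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
# Data files are opened normally; their writes may sit in the OS cache.
DATA_FLAGS = os.O_WRONLY | os.O_CREAT


def open_wal(path):
    """Open a WAL-like file with synchronous writes (hypothetical helper)."""
    return os.open(path, WAL_FLAGS, 0o600)


def open_data(path):
    """Open a data file with ordinary, lazily flushed writes."""
    return os.open(path, DATA_FLAGS, 0o600)


def demo(directory):
    """Write one record to each kind of file and return both file sizes."""
    wal_path = os.path.join(directory, "wal_segment")
    data_path = os.path.join(directory, "data_file")
    wal_fd = open_wal(wal_path)
    data_fd = open_data(data_path)
    try:
        os.write(wal_fd, b"log record\n")      # synchronous because of O_SYNC
        os.write(data_fd, b"page contents\n")  # may still be only in cache
        os.fsync(data_fd)                      # what the background writer would do
    finally:
        os.close(wal_fd)
        os.close(data_fd)
    return os.path.getsize(wal_path), os.path.getsize(data_path)
```

In this sketch the only explicit flush is the fsync() on the data file; the
WAL file relies on its open flags.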

Now the BGWriter process will flush everything at checkpoint time. It does
not need to flush WAL because of O_SYNC (ideally, but an additional fsync
won't hurt). So it just flushes all the file descriptors touched since the
last checkpoint, which should not be much of a load because it is flushing
those files intermittently anyway.
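
A rough sketch of that bookkeeping, again in Python and under my own
assumptions: DirtyFileTracker is a hypothetical helper, not anything in the
tree. It also assumes that fsync() on a freshly opened descriptor flushes
data written through an earlier descriptor for the same file, which holds on
the POSIX systems I know of but should be checked per platform.

```python
import os
import tempfile


class DirtyFileTracker:
    """Hypothetical helper: remembers files written since the last checkpoint."""

    def __init__(self):
        self._dirty = set()

    def write(self, path, data):
        """Append data to a file without syncing, and mark the file dirty."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            os.write(fd, data)
        finally:
            os.close(fd)
        self._dirty.add(path)

    def checkpoint(self):
        """fsync every dirty file once and return the sorted list of them."""
        flushed = sorted(self._dirty)
        for path in flushed:
            fd = os.open(path, os.O_WRONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        self._dirty.clear()
        return flushed
```

A file written several times between checkpoints is flushed only once, and a
checkpoint right after another one has nothing left to do.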

It could also work nicely if only the background writer fsyncs the data
files. Backends can either wait or proceed to other business while the disk
is being flushed. Backends do need to wait a certain while when committing,
but the extra delay of syncing to disk in the background process, as opposed
to in the current process, should be rather small.
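
The wait-or-proceed idea can be sketched like this. It uses a thread and a
queue as a stand-in for a separate process and shared memory, so it only
shows the hand-off, not the real IPC; BackgroundSyncer and its methods are
hypothetical names.

```python
import os
import queue
import tempfile
import threading


class BackgroundSyncer:
    """Hypothetical stand-in for the background writer: the only fsync caller."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:          # shutdown marker
                break
            fd, done = item
            try:
                os.fsync(fd)
            finally:
                done.set()            # wake the waiting backend even on error

    def request_sync(self, fd):
        """Queue an fsync; the caller may wait on the returned event or not."""
        done = threading.Event()
        self._requests.put((fd, done))
        return done

    def stop(self):
        self._requests.put(None)
        self._thread.join()
```

A committing backend would call request_sync() and then wait() on the event;
a backend that does not need durability yet can simply carry on.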

In case of a commit, BGWriter could get away with flushing only the files
touched in the transaction + WAL, as opposed to all files touched since the
last checkpoint + WAL in case of a checkpoint. I don't know how difficult
that would be.
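
To show the difference between the two flush sets, here is a small sketch of
the bookkeeping only, with no I/O. FlushPlanner, the transaction ids and the
file names are all invented for the example, and "WAL" just stands for
whatever log files would need flushing.

```python
from collections import defaultdict

WAL = "WAL"  # stand-in name for the log file(s)


class FlushPlanner:
    """Hypothetical bookkeeping for which files a commit or checkpoint flushes."""

    def __init__(self):
        self._by_txn = defaultdict(set)   # files touched per open transaction
        self._since_checkpoint = set()    # files touched since last checkpoint

    def touch(self, txn_id, path):
        """Record that a transaction wrote to a file."""
        self._by_txn[txn_id].add(path)
        self._since_checkpoint.add(path)

    def commit_set(self, txn_id):
        """Files to flush at commit: this transaction's files plus WAL."""
        return self._by_txn.pop(txn_id, set()) | {WAL}

    def checkpoint_set(self):
        """Files to flush at checkpoint: everything since the last one plus WAL."""
        files = self._since_checkpoint | {WAL}
        self._since_checkpoint = set()
        return files
```

For simplicity the sketch does not remove files from the checkpoint set when
a commit flushes them, since another transaction may have dirtied them again.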

What is different in the current BGWriter implementation? Use of sync()?

 Shridhar


