Re: Background writer process - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Background writer process
Date
Msg-id 200311172353.hAHNro604386@candle.pha.pa.us
In response to Re: Background writer process  (Shridhar Daithankar <shridhar_daithankar@myrealbox.com>)
Responses Re: Background writer process  (Shridhar Daithankar <shridhar_daithankar@myrealbox.com>)
List pgsql-hackers
Shridhar Daithankar wrote:
> On Friday 14 November 2003 22:10, Jan Wieck wrote:
> > Shridhar Daithankar wrote:
> > > On Friday 14 November 2003 03:05, Jan Wieck wrote:
> > >> For sure the sync() needs to be replaced by the discussed fsync() of
> > >> recently written files. And I think the algorithm how much and how often
> > >> to flush can be significantly improved. But after all, this does not
> > >> change the real checkpointing at all, and the general framework having a
> > >> separate process is what we probably want.
> > >
> > > Having fsync for regular data files and sync for WAL segment a
> > > comfortable compromise?  Or this is going to use fsync for all of them.
> > >
> > > IMO, with fsync, we tell kernel that you can write this buffer. It may or
> > > may not write it immediately, unless it is hard sync.
> >
> > I think it's more the other way around. On some systems sync() might
> > return before all buffers are flushed to disk, while fsync() does not.
> 
> Oops.. that's bad.

Yes, one idea I had was to do an fsync on a new file _after_ issuing
sync, hoping that this will complete after all the sync buffers are
done.

> > > Since postgresql can afford lazy writes for data files, I think this
> > > could work.
> >
> > The whole point of a checkpoint is to know for certain that a specific
> > change is in the datafile, so that it is safe to throw away older WAL
> > segments.
> 
> I just made another posting on patches for a thread crossing win32-devel.
> 
> Essentially I said
> 
> 1. Open WAL files with O_SYNC|O_DIRECT or O_SYNC(Not sure if current code does 
> it. The hackery in xlog.c is not exactly trivial.)

We write WAL, then fsync, so if we write multiple blocks, we can write
them and fsync once, rather than O_SYNC every write.
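
To make the difference concrete, here is a minimal sketch of the two
write patterns. It is Python on a Unix-like system, not the xlog.c code;
the function names, the 8192-byte block size, and the returned counts are
assumptions made for illustration only.

```python
import os

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def _write_all(fd, data):
    """write() may accept fewer bytes than asked, so loop until done."""
    while data:
        n = os.write(fd, data)
        data = data[n:]


def write_then_fsync(path, blocks):
    """Write every block with plain write(), then force them out once."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in blocks:
            _write_all(fd, block)   # lands in the kernel cache only
        os.fsync(fd)                # one wait covers all blocks
        return 1                    # number of synchronous waits
    finally:
        os.close(fd)


def write_with_o_sync(path, blocks):
    """Open with O_SYNC, so each write() waits for the disk by itself."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        for block in blocks:
            _write_all(fd, block)   # each call blocks until durable
        return len(blocks)          # at least one wait per block
    finally:
        os.close(fd)
```

Both functions leave the same bytes in the file. The only difference is
how many times the caller has to wait for the disk.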

> 2. Open data files normally and fsync them only in background writer process.
> 
> Now BGWriter process will flush everything at the time of checkpointing. It 
> does not need to flush WAL because of O_SYNC(ideally but an additional fsync 
> won't hurt). So it just flushes all the file descriptors touched since last 
> checkpoint, which should not be much of a load because it is flushing those 
> files intermittently anyways.
> 
> It could also work nicely if only background writer fsync the data files. 
> Backends can either wait or proceed to other business by the time disk is 
> flushed. Backends needs to wait for certain while committing and it should be 
> rather small delay of syncing to disk in current process as opposed to in  
> background process. 
> 
> In case of commit, BGWriter could get away with files touched in transaction
> +WAL as opposed to all files touched since last checkpoint+WAL in case of 
> checkpoint. I don't know how difficult that would be.
> 
> What is different in current BGwriter implementation? Use of sync()?

Well, basically we are still discussing how to do this.  Right now the
background writer patch uses sync(), but the final version will use fsync
or O_SYNC, or maybe nothing.

The open item is whether a background process can keep the dirty
buffers cleaned fast enough to keep up with the maximum number of
backends.  We might need to use multiple processes or threads to do
this.   We certainly will have a background writer in 7.5 --- the big
question is whether _all_ writes will go through it.   It certainly would
be nice if it could, and Tom thinks it can, so we are still exploring
this.

If the background writer uses fsync, it can write and allow the buffer
to be reused and fsync later, while if we use O_SYNC, we have to wait
for the O_SYNC write to happen before reusing the buffer;  that will be
slower.
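
A rough sketch of the fsync variant, assuming a Unix-like system. The
Buffer class, the function names, and the 8192-byte block size are
hypothetical and are not taken from the patch. The point is that the
buffer is marked reusable as soon as write() returns, and the fsync
happens as a separate, later step.

```python
import os

BLOCK_SIZE = 8192  # assumed page size, for illustration only


class Buffer:
    """Hypothetical stand-in for one shared buffer slot."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.dirty = True


def write_dirty_buffers(fd, buffers):
    """write() each dirty buffer at its block offset and free it at once."""
    written = 0
    for blockno, buf in enumerate(buffers):
        if not buf.dirty:
            continue
        payload = bytes(buf.data)
        offset = blockno * BLOCK_SIZE
        while payload:  # pwrite() may write fewer bytes than asked
            n = os.pwrite(fd, payload, offset)
            payload = payload[n:]
            offset += n
        # The kernel now holds its own copy, so the slot can be reused
        # even though nothing is guaranteed to be on disk yet.
        buf.dirty = False
        written += 1
    return written


def make_durable(fd):
    """The deferred step: wait once for everything written so far."""
    os.fsync(fd)
```

With O_SYNC there would be no separate make_durable() step, but each
pwrite() would block until the disk had the data, and the slot could not
be handed out again until then.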

Another open issue is, _if_ the background writer can't keep up with the
normal backends, do we allow normal backends to write dirty buffers, and
do they use fsync(), or can we record the file in a shared area and have
the background writer do the fsync?  This is the issue of whether one
process can fsync all dirty buffers for the file or just the buffers it
wrote.
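
The "record the file in a shared area" idea could look roughly like the
sketch below. Everything in it is hypothetical: the PendingFsyncs name,
the use of a lock-protected set in place of real shared memory, and above
all the assumption that an fsync() issued by one process also flushes
buffers that another process wrote to the same file, which is exactly the
open question.

```python
import os
import threading


class PendingFsyncs:
    """Hypothetical shared list of files that still need an fsync."""

    def __init__(self):
        self._lock = threading.Lock()
        self._paths = set()

    def register(self, path):
        """Called by a backend after it wrote a dirty buffer itself."""
        with self._lock:
            self._paths.add(path)  # a set, so repeats cost nothing

    def fsync_all(self):
        """Called by the background writer; returns the files it synced."""
        with self._lock:
            paths, self._paths = self._paths, set()
        for path in sorted(paths):
            fd = os.open(path, os.O_RDWR)
            try:
                # ASSUMPTION: this flushes the file's dirty kernel
                # buffers no matter which process wrote them.
                os.fsync(fd)
            finally:
                os.close(fd)
        return sorted(paths)
```

A backend would call register() after its own write() and carry on, and
the background writer would call fsync_all() at checkpoint time.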

I think these are the basics of the current discussion.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

