Re: Background writer process - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Background writer process |
Date | |
Msg-id | 200311172353.hAHNro604386@candle.pha.pa.us |
In response to | Re: Background writer process (Shridhar Daithankar <shridhar_daithankar@myrealbox.com>) |
Responses | Re: Background writer process |
List | pgsql-hackers |
Shridhar Daithankar wrote:
> On Friday 14 November 2003 22:10, Jan Wieck wrote:
> > Shridhar Daithankar wrote:
> > > On Friday 14 November 2003 03:05, Jan Wieck wrote:
> > >> For sure the sync() needs to be replaced by the discussed fsync() of
> > >> recently written files. And I think the algorithm how much and how often
> > >> to flush can be significantly improved. But after all, this does not
> > >> change the real checkpointing at all, and the general framework having a
> > >> separate process is what we probably want.
> > >
> > > Having fsync for regular data files and sync for WAL segment a
> > > comfortable compromise? Or this is going to use fsync for all of them.
> > >
> > > IMO, with fsync, we tell kernel that you can write this buffer. It may or
> > > may not write it immediately, unless it is hard sync.
> >
> > I think it's more the other way around. On some systems sync() might
> > return before all buffers are flushed to disk, while fsync() does not.
>
> Oops.. that's bad.

Yes, one idea I had was to do an fsync on a new file _after_ issuing
sync, hoping that the fsync will complete only after all the sync
buffers are done.

> > > Since postgresql can afford lazy writes for data files, I think this
> > > could work.
> >
> > The whole point of a checkpoint is to know for certain that a specific
> > change is in the datafile, so that it is safe to throw away older WAL
> > segments.
>
> I just made another posing on patches for a thread crossing win32-devel.
>
> Essentially I said
>
> 1. Open WAL files with O_SYNC|O_DIRECT or O_SYNC(Not sure if current code does
> it. The hackery in xlog.c is not exactly trivial.)

We write WAL, then fsync, so if we write multiple blocks, we can write
them all and fsync once, rather than paying for O_SYNC on every write.

> 2. Open data files normally and fsync them only in background writer process.
>
> Now BGWriter process will flush everything at the time of checkpointing.
> It does not need to flush WAL because of O_SYNC(ideally but an additional fsync
> won't hurt). So it just flushes all the file descriptors touched since last
> checkpoint, which should not be much of a load because it is flushing those
> files intermittently anyways.
>
> It could also work nicely if only background writer fsync the data files.
> Backends can either wait or proceed to other business by the time disk is
> flushed. Backends needs to wait for certain while committing and it should be
> rather small delay of syncing to disk in current process as opposed to in
> background process.
>
> In case of commit, BGWriter could get away with files touched in transaction
> +WAL as opposed to all files touched since last checkpoint+WAL in case of
> checkpoint. I don't know how difficult that would be.
>
> What is different in current BGwriter implementation? Use of sync()?

Well, basically we are still discussing how to do this. Right now the
background writer patch uses sync(), but the final version will use
fsync or O_SYNC, or maybe nothing.

One open item is whether a background process can clean dirty buffers
fast enough to keep up with the maximum number of backends. We might
need to use multiple processes or threads to do this. We certainly will
have a background writer in 7.5 --- the big question is whether _all_
writes will go through it. It certainly would be nice if they could, and
Tom thinks they can, so we are still exploring this.

If the background writer uses fsync, it can write, allow the buffer to
be reused, and fsync later, while if we use O_SYNC, we have to wait for
the O_SYNC write to complete before reusing the buffer; that will be
slower.

Another open issue is, _if_ the background writer can't keep up with the
normal backends, do we allow normal backends to write dirty buffers, and
do they use fsync(), or can we record the file in a shared area and have
the background writer do the fsync?
This is the issue of whether one process can fsync all dirty buffers for
the file or just the buffers it wrote.

I think these are the basics of the current discussion.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
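
The write-now, fsync-later approach discussed in the message can be
sketched roughly as follows. This is only an illustration in Python, not
code from the patch or from the backend: the class name
DeferredSyncWriter, the 8192-byte block size, and the file names are all
assumptions. Each write returns without waiting for the disk, the touched
files are recorded, and a later flush issues one fsync per file no matter
how many blocks were written to it. The sketch fsyncs through the same
descriptor that did the writing, so it sidesteps the open question of
whether a different process could do the fsync instead.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, for illustration only


class DeferredSyncWriter:
    """Hypothetical helper: write blocks now, fsync each touched file later."""

    def __init__(self):
        self.fds = {}          # path -> open file descriptor
        self.dirty = set()     # paths written since the last flush
        self.fsync_calls = 0   # counter kept only to make the batching visible

    def write_block(self, path, block_no, data):
        fd = self.fds.get(path)
        if fd is None:
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            self.fds[path] = fd
        os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
        os.write(fd, data)     # opened without O_SYNC: does not wait for the disk
        self.dirty.add(path)   # remember that this file needs an fsync

    def flush(self):
        """fsync every file written since the last flush, once per file."""
        for path in sorted(self.dirty):
            os.fsync(self.fds[path])
            self.fsync_calls += 1
        self.dirty.clear()

    def close(self):
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()


def demo():
    writer = DeferredSyncWriter()
    with tempfile.TemporaryDirectory() as tmp:
        wal = os.path.join(tmp, "wal_segment")   # hypothetical file names
        data = os.path.join(tmp, "data_file")
        for block_no in range(4):                # four blocks, one file
            writer.write_block(wal, block_no, b"w" * BLOCK_SIZE)
        writer.write_block(data, 0, b"d" * BLOCK_SIZE)
        writer.flush()                           # one fsync per touched file
        size = os.path.getsize(wal)
        writer.close()
    return writer.fsync_calls, size
```

Calling demo() writes five blocks across two files and then flushes, so
the counter ends up at two fsync calls rather than five. Opening the
files with O_SYNC instead would make every os.write wait for the disk,
which is the slower path the message describes.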