pgman wrote:
> Curtis Faith wrote:
> > Back-end servers would not issue fsync calls. They would simply block
> > waiting until the LogWriter had written their record to the disk, i.e.
> > until the sync'd block # was greater than the block that contained the
> > XLOG_XACT_COMMIT record. The LogWriter could wake up committed back-
> > ends after its log write returns.
> >
> > The log file would be opened O_DSYNC, O_APPEND every time. The LogWriter
> > would issue writes of the optimal size when enough data was present or
> > of smaller chunks if enough time had elapsed since the last write.
>
> So every backend is going to wait around until its fsync gets done by
> the backend process? How is that a win? This is just another version
> of our GUC parameters:
>
> #commit_delay = 0 # range 0-100000, in microseconds
> #commit_siblings = 5 # range 1-1000
>
> which attempt to delay fsync if other backends are nearing commit.
> Pushing things out to another process isn't a win; figuring out if
> someone else is coming for commit is. Remember, write() is fast, fsync
> is slow.

Let me add to what I just said:

While the above idea doesn't win for normal operation, because each
backend waits for the fsync and we have no good way of determining whether
other backends are nearing commit, a background WAL fsync process would
be nice if we wanted an option between fsync on (wait for fsync before
reporting commit) and fsync off (no crash recovery).

We could have a mode where we did an fsync every X milliseconds: we
report COMMIT to the client right away, but wait a few milliseconds
before fsync'ing. Many other databases have such a mode, but we don't,
and I always felt it would be valuable. It may allow us to remove the
fsync option in favor of one that has _some_ crash recovery.
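
To make the delayed-fsync mode a bit more concrete, here is a rough sketch
in Python rather than backend C. It is only an illustration under my own
assumptions: DelayedSyncLog, commit(), interval_s and the counters are all
made-up names, not anything in the PostgreSQL source. commit() returns as
soon as write() has handed the record to the kernel, and a background
thread calls fsync at most once per interval, so a crash could lose
commits that were already reported but not yet synced.

```python
import os
import tempfile
import threading


class DelayedSyncLog:
    """Hypothetical append-only log with a background fsync thread."""

    def __init__(self, path, interval_s=0.01):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.interval_s = interval_s      # the "X milliseconds" knob
        self.lock = threading.Lock()
        self.written = 0                  # bytes handed to the kernel
        self.synced = 0                   # bytes covered by a finished fsync
        self.fsync_calls = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._sync_loop, daemon=True)
        self._thread.start()

    def commit(self, record):
        """Write the record and return without waiting for fsync."""
        with self.lock:
            # Assumes the record is written in full, which is the usual
            # case for small appends to a regular file.
            self.written += os.write(self.fd, record)
            return self.written

    def _sync_once(self):
        with self.lock:
            target = self.written
        if target > self.synced:          # skip fsync when nothing is new
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.synced = target

    def _sync_loop(self):
        # Event.wait() returns False on timeout, True once close() is called.
        while not self._stop.wait(self.interval_s):
            self._sync_once()

    def close(self):
        self._stop.set()
        self._thread.join()
        self._sync_once()                 # final flush before closing
        os.close(self.fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "wal_demo.log")
    demo = DelayedSyncLog(demo_path, interval_s=0.005)
    for _ in range(1000):
        demo.commit(b"commit-record\n")
    demo.close()
    print("commits: 1000, fsync calls:", demo.fsync_calls)
```

The point of the sketch is that the number of fsync calls is bounded by
elapsed time divided by the interval, not by the number of commits, and
that the gap between written and synced is what a crash would lose.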

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073