I spent a little time reviewing the xlog.c logic, which I hadn't looked
at in a while. I see I made a mistake earlier: I claimed that only when
a backend wanted to commit or ran out of space in the WAL buffers would
it issue any write(). This is not true: there is code in XLogInsert()
that will try to issue write() if the WAL buffers are more than half
full:

    /*
     * If cache is half filled then try to acquire write lock and do
     * XLogWrite. Ignore any fractional blocks in performing this check.
     */
    LogwrtRqst.Write.xrecoff -= LogwrtRqst.Write.xrecoff % BLCKSZ;
    if (LogwrtRqst.Write.xlogid != LogwrtResult.Write.xlogid ||
        (LogwrtRqst.Write.xrecoff >= LogwrtResult.Write.xrecoff +
         XLogCtl->XLogCacheByte / 2))
    {
        if (LWLockConditionalAcquire(WALWriteLock, LW_EXCLUSIVE))
        {
            LogwrtResult = XLogCtl->Write.LogwrtResult;
            if (XLByteLT(LogwrtResult.Write, LogwrtRqst.Write))
                XLogWrite(LogwrtRqst);
            LWLockRelease(WALWriteLock);
        }
    }

Because of the "conditional acquire" call, this will not block if
someone else is currently doing a WAL write or fsync, but will just
fall through in that case. However, if the code does acquire the
lock then the backend will issue some writes --- synchronously, if
O_SYNC or O_DSYNC mode is being used. It would be better to remove
this code and allow a background process to issue writes for filled
WAL pages.

Note this is done before acquiring WALInsertLock, so it does not block
other would-be inserters of WAL records.
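
For anyone who wants to play with the "write if more than half full, but
never wait for the lock" pattern outside the backend, here is a rough
standalone sketch. It is not the xlog.c code: DemoLog, maybe_write(),
BLOCK_SIZE and CACHE_BYTES are made-up names, a plain pthread mutex with
pthread_mutex_trylock() stands in for LWLockConditionalAcquire(), the
xlogid comparison is left out, and the "write" is just a counter update.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE  8192                /* hypothetical stand-in for BLCKSZ */
#define CACHE_BYTES (8 * BLOCK_SIZE)    /* hypothetical WAL buffer size */

typedef struct
{
    pthread_mutex_t write_lock;         /* plays the role of WALWriteLock */
    size_t          written;            /* shared: bytes written out so far */
} DemoLog;

/*
 * request        - caller's local copy of how far it wants written
 * cached_written - caller's possibly stale local copy of log->written
 *
 * Returns true if this call did the write, false if it fell through.
 */
static bool
maybe_write(DemoLog *log, size_t request, size_t cached_written)
{
    bool        did_write = false;

    assert(log != NULL);

    /* Ignore any fractional block in performing the check. */
    request -= request % BLOCK_SIZE;

    /* Less than half the buffer is pending: nothing to do. */
    if (request < cached_written + CACHE_BYTES / 2)
        return false;

    /* Someone else is writing: do not wait, just fall through. */
    if (pthread_mutex_trylock(&log->write_lock) != 0)
        return false;

    /* Recheck against the shared value now that we hold the lock. */
    if (log->written < request)
    {
        log->written = request;         /* stand-in for XLogWrite() */
        did_write = true;
    }
    pthread_mutex_unlock(&log->write_lock);
    return did_write;
}
```

The point to notice is the same as in the real code: when the trylock
succeeds, the caller itself pays for the write before it can go on.
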
regards, tom lane