Thread: XLogFlush
Maybe this is one of those things that is obvious when someone points it out to you, but right now I am not seeing it.

If you look at the last eight lines of this snippet from XLogFlush, you see that if we obtain WriteRqstPtr under the WALInsertLock, then we both write and flush up to the highest write request. But if we obtain it under the info_lck, then we write up to the highest write request but flush only up to our own record's flush request. Why the disparate treatment?

The effect of this seems to be that when WALInsertLock is busy, group commits are suppressed.

    if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
    {
        XLogCtlInsert *Insert = &XLogCtl->Insert;
        uint32      freespace = INSERT_FREESPACE(Insert);

        if (freespace < SizeOfXLogRecord)   /* buffer is full */
            WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
        else
        {
            WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
            WriteRqstPtr.xrecoff -= freespace;
        }
        LWLockRelease(WALInsertLock);
        WriteRqst.Write = WriteRqstPtr;
        WriteRqst.Flush = WriteRqstPtr;
    }
    else
    {
        WriteRqst.Write = WriteRqstPtr;
        WriteRqst.Flush = record;
    }

Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> Maybe this is one of those things that is obvious when someone points
> it out to you, but right now I am not seeing it. If you look at the
> last eight lines of this snippet from XLogFlush, you see that if we
> obtain WriteRqstPtr under the WALInsertLock, then we both write and
> flush up to the highest write request. But if we obtain it under the
> info_lck, then we write up to the highest write request but flush only
> up to our own record's flush request. Why the disparate treatment?

I think the point of the check within the info_lck section is that the global Write pointer must not be allowed to go backward. It's likely unnecessary though, since there is probably a defense against that in XLogWrite (or if not, there should be).

The other bit of the reasoning that doesn't seem well commented is that if we can't find out what the global status is (because of failure to acquire the insert lock), we should just do the work we know we need, not guess at some greater requirement.

regards, tom lane
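As a reading aid, the check made under info_lck can be modeled as taking the larger of our own record position and the shared write request, so that the request we hand on never lags behind what others have already asked for. This is a minimal sketch and not the PostgreSQL source: `WalPos` and `raise_to_shared_request` are hypothetical names, and a single 64-bit integer stands in for XLogRecPtr.

```c
#include <stdint.h>

/* Assumption: one 64-bit offset stands in for XLogRecPtr. */
typedef uint64_t WalPos;

/*
 * Hypothetical model of the check done while holding info_lck:
 * start from our own record position and raise it to the shared
 * write request if that one is further along. The result is never
 * behind either input, so a request built from it cannot move the
 * write position backward.
 */
static WalPos
raise_to_shared_request(WalPos record, WalPos shared_write_rqst)
{
    WalPos write_rqst = record;

    if (write_rqst < shared_write_rqst)
        write_rqst = shared_write_rqst;
    return write_rqst;
}
```

With `record = 100` and a shared request of `250`, the sketch returns `250`; with `record = 300`, it returns `300`.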
On 21 August 2009 at 10:18 Jeff Janes <jeff.janes@gmail.com> wrote:
> The effect of this seems to be that when WALInsertLock is busy, group
> commits are suppressed.
Agreed, but it's not a place to look at just yet, since this is changing as part of the sync rep patch.
We do need to change this to make group commit work better.
Glad to see someone willing to get involved.
Best Regards, Simon Riggs
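On Fri, Aug 21, 2009 at 1:18 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Maybe this is one of those things that is obvious when someone points
> it out to you, but right now I am not seeing it. If you look at the
> last eight lines of this snippet from XLogFlush, you see that if we
> obtain WriteRqstPtr under the WALInsertLock, then we both write and
> flush up to the highest write request. But if we obtain it under the
> info_lck, then we write up to the highest write request but flush only
> up to our own record's flush request. Why the disparate treatment?
> The effect of this seems to be that when WALInsertLock is busy, group
> commits are suppressed.

I realized I was misinterpreting this. XLogWrite doesn't just flush up to WriteRqst.Flush, because fsync doesn't work that way. If it flushes at all (which I think it always will when invoked from XLogFlush, as otherwise XLogFlush would not call it), it will flush up to WriteRqst.Write anyway, even if WriteRqst.Flush is behind. So as long as record <= WriteRqst.Flush <= WriteRqst.Write, it doesn't matter exactly what WriteRqst.Flush is.

The problem with group commit on a busy WALInsertLock is that if xlogctl->LogwrtRqst.Write does get advanced by someone else, it is almost surely going to be while we are waiting on the WALWriteLock, and so too late for us to have discovered it when we previously checked under the protection of info_lck. We should probably have an else branch on the LWLockConditionalAcquire so that if it fails, we take the info_lck and check again for advancement of xlogctl->LogwrtRqst.Write.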
But since Simon is doing big changes as part of sync rep, I'll hold off on doing much experimentation on this until then.
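A minimal sketch of the two points above, written as a standalone model rather than as a patch to xlog.c. All names here (`WalPos`, `SharedWalState`, `choose_write_request`, `apply_request`) are hypothetical, a plain integer stands in for XLogRecPtr, and the locks are only indicated in comments. `apply_request` models the claim that a flush covers everything written so far, and the else branch of `choose_write_request` models the proposed recheck of the shared write request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WalPos;        /* assumption: stand-in for XLogRecPtr */

typedef struct
{
    WalPos Write;               /* write WAL out up to here */
    WalPos Flush;               /* caller needs durability up to here */
} WriteRequest;

typedef struct
{
    WalPos shared_write_rqst;   /* models xlogctl->LogwrtRqst.Write */
} SharedWalState;

typedef struct
{
    WalPos written;
    WalPos flushed;
} WalResult;

/*
 * Build the request once the write lock is held.
 *   earlier_write_rqst: value read under info_lck before waiting
 *   got_insert_lock:    models LWLockConditionalAcquire() succeeding
 *   insert_end:         end of inserted WAL, used only if got_insert_lock
 */
static WriteRequest
choose_write_request(const SharedWalState *shared, WalPos record,
                     WalPos earlier_write_rqst, bool got_insert_lock,
                     WalPos insert_end)
{
    WriteRequest rqst;

    /* The earlier check already raised the request to at least record. */
    assert(earlier_write_rqst >= record);

    if (got_insert_lock)
    {
        rqst.Write = insert_end;
        rqst.Flush = insert_end;
    }
    else
    {
        /* Proposed recheck; the real code would hold info_lck here. */
        WalPos latest = shared->shared_write_rqst;

        rqst.Write = (latest > earlier_write_rqst) ? latest : earlier_write_rqst;
        rqst.Flush = record;
    }
    return rqst;
}

/* Model of the writer: a flush makes everything written so far durable. */
static void
apply_request(WalResult *res, WriteRequest rqst)
{
    if (rqst.Write > res->written)
        res->written = rqst.Write;
    if (rqst.Flush > res->flushed)
        res->flushed = res->written;
}
```

In this model, a backend with `record = 100` that read a request of `200` earlier, fails to get the insert lock, and finds the shared request advanced to `500` ends up writing and flushing through `500`, even though its own `Flush` field is only `100`.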
Cheers,
Jeff