Thread: XLogFlush

XLogFlush

From
Jeff Janes
Date:
Maybe this is one of those things that is obvious when someone points
it out to you, but right now I am not seeing it.  If you look at the
last eight lines of this snippet from XLogFlush, you see that if we
obtain WriteRqstPtr under the WALInsertLock, then we both write and
flush up to the highest write request.  But if we obtain it under the
info_lck, then we write up to the highest write request but flush only
up to our own record's flush request.  Why the disparate treatment?
The effect of this seems to be that when WALInsertLock is busy, group
commits are suppressed.

    if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
    {
        XLogCtlInsert *Insert = &XLogCtl->Insert;
        uint32      freespace = INSERT_FREESPACE(Insert);

        if (freespace < SizeOfXLogRecord)    /* buffer is full */
            WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
        else
        {
            WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
            WriteRqstPtr.xrecoff -= freespace;
        }
        LWLockRelease(WALInsertLock);
        WriteRqst.Write = WriteRqstPtr;
        WriteRqst.Flush = WriteRqstPtr;
    }
    else
    {
        WriteRqst.Write = WriteRqstPtr;
        WriteRqst.Flush = record;
    }
 

Cheers,

Jeff


Re: XLogFlush

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> Maybe this is one of those things that is obvious when someone points
> it out to you, but right now I am not seeing it.  If you look at the
> last eight lines of this snippet from XLogFlush, you see that if we
> obtain WriteRqstPtr under the WALInsertLock, then we both write and
> flush up to the highest write request.  But if we obtain it under the
> info_lck, then we write up to the highest write request but flush only
> up to our own record's flush request.  Why the disparate treatment?

I think the point of the check within the info_lck section is that the
global Write pointer must not be allowed to go backward.  It's likely
unnecessary though, since there is probably a defense against that
in XLogWrite (or if not there should be).
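
One way to picture the "must not go backward" check is the small sketch below. It is only an illustration under stated assumptions: WalPos, walpos_lt and clamp_write_request are hypothetical stand-ins, not the real XLogRecPtr type or any function in xlog.c, and the locking that info_lck would provide is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not the real XLogRecPtr. */
typedef struct WalPos
{
    uint32_t    xlogid;     /* log file number */
    uint32_t    xrecoff;    /* byte offset within that file */
} WalPos;

/* True if a is strictly earlier in the log than b. */
static int
walpos_lt(WalPos a, WalPos b)
{
    return a.xlogid < b.xlogid ||
        (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/*
 * Pick the write target: our own position, raised to the shared write
 * request if that is further along.  The result is never behind either
 * input, so a later update of the shared pointer from it cannot move the
 * pointer backward.  In real code this comparison would run while holding
 * the spinlock that protects the shared value.
 */
static WalPos
clamp_write_request(WalPos mine, WalPos shared_write_rqst)
{
    WalPos      result = walpos_lt(mine, shared_write_rqst) ? shared_write_rqst : mine;

    assert(!walpos_lt(result, shared_write_rqst));
    assert(!walpos_lt(result, mine));
    return result;
}
```

Calling clamp_write_request with our own record position and the shared request yields whichever of the two is later, which is the property the check is there to preserve.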

The other bit of the reasoning that doesn't seem well commented is
that if we can't find out what the global status is (because of failure
to acquire the insert lock), we should just do the work we know we need,
not guess at some greater requirement.
        regards, tom lane


Re: XLogFlush

From
"simon@2ndquadrant.com"
Date:

On 21 August 2009 at 10:18 Jeff Janes <jeff.janes@gmail.com> wrote:

> The effect of this seems to be that when WALInsertLock is busy, group
> commits are suppressed.

Agreed, but it's not a place to look at just yet, since this is changing as part of the sync rep patch.

We do need to change this to make group commit work better.

Glad to see someone willing to get involved.

Best Regards, Simon Riggs

Re: XLogFlush

From
Jeff Janes
Date:
On Fri, Aug 21, 2009 at 1:18 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Maybe this is one of those things that is obvious when someone points
> it out to you, but right now I am not seeing it.  If you look at the
> last eight lines of this snippet from XLogFlush, you see that if we
> obtain WriteRqstPtr under the WALInsertLock, then we both write and
> flush up to the highest write request.  But if we obtain it under the
> info_lck, then we write up to the highest write request but flush only
> up to our own record's flush request.  Why the disparate treatment?
> The effect of this seems to be that when WALInsertLock is busy, group
> commits are suppressed.

I realized I was misinterpreting this.  XLogWrite doesn't just flush up to WriteRqst.Flush, because fsync doesn't work that way.  If it flushes at all (which I think it always will when invoked from XLogFlush, as otherwise XLogFlush would not call it), it will flush up to WriteRqst.Write anyway, even if WriteRqst.Flush is behind.  So as long as record <= WriteRqst.Flush <= WriteRqst.Write, then it doesn't matter exactly what WriteRqst.Flush is.

The problem with group commit on a busy WALInsertLock is that if the xlogctl->LogwrtRqst.Write does get advanced by someone else, it is almost surely going to be while we are waiting on the WALWriteLock, and so too late for us to have discovered it when we previously checked under the protection of info_lck.  We should probably have an else branch on the LWLockConditionalAcquire so that if it fails, we get the info_lck and check again for advancement of xlogctl->LogwrtRqst.Write.
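
The point that an fsync covers everything already written can be shown with a toy model.  This is only a sketch under stated assumptions: ToyWal, ToyRqst and toy_xlog_write are made-up names, positions are plain byte counts rather than XLogRecPtr values, and the real XLogWrite does far more than this.

```c
#include <assert.h>
#include <stdint.h>

/* Toy WAL state: how far we have written and how far we have flushed. */
typedef struct ToyWal
{
    uint64_t    written;    /* bytes handed to the OS */
    uint64_t    flushed;    /* bytes known to be durable */
} ToyWal;

/* Toy counterpart of a write/flush request pair. */
typedef struct ToyRqst
{
    uint64_t    write;
    uint64_t    flush;
} ToyRqst;

/*
 * Write up to rqst.write, then flush if rqst.flush is not yet durable.
 * The flush is modelled the way fsync behaves: it acts on the whole file,
 * so everything written so far becomes durable, not just rqst.flush bytes.
 */
static void
toy_xlog_write(ToyWal *wal, ToyRqst rqst)
{
    assert(rqst.flush <= rqst.write);

    if (wal->written < rqst.write)
        wal->written = rqst.write;

    if (wal->flushed < rqst.flush)
        wal->flushed = wal->written;
}
```

In this model, two requests with the same write target of 500 but flush targets of 500 and 120 both leave flushed at 500, matching the observation that the exact value of WriteRqst.Flush does not matter as long as record <= WriteRqst.Flush <= WriteRqst.Write.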

But since Simon is doing big changes as part of sync rep, I'll hold off on doing much experimentation on this until then.

 
               LWLockRelease(WALInsertLock);
               WriteRqst.Write = WriteRqstPtr;
               WriteRqst.Flush = WriteRqstPtr;
       }
       else
       {
               WriteRqst.Write = WriteRqstPtr;
               WriteRqst.Flush = record;
       }

Cheers,

Jeff