Re: [PATCH] introduce XLogLockBlockRangeForCleanup() - Mailing list pgsql-hackers

From Abhijit Menon-Sen
Subject Re: [PATCH] introduce XLogLockBlockRangeForCleanup()
Date
Msg-id 20150623130909.GA13652@toroid.org
Whole thread Raw
In response to Re: [PATCH] introduce XLogLockBlockRangeForCleanup()  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
At 2014-08-20 11:07:44 +0300, hlinnakangas@vmware.com wrote:
>
> I don't think the new GetBufferWithoutRelcache function is in line
> with the existing ReadBuffer API. I think it would be better to add a
> new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if
> it's already in cache, and InvalidBuffer otherwise.

Hi Heikki.

I liked the idea of having a new ReadBufferMode of RBM_CACHE_ONLY, but I
wasn't happy with the result when I tried to modify the code to suit. It
didn't make the code easier for me to follow.

(I'll say straight up that it can be done the way you said, and if the
consensus is that it would be an improvement, I'll do it that way. I'm
just not convinced about it, hence this mail.)

For example:

> static Buffer
> ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>                   BlockNumber blockNum, ReadBufferMode mode,
>                   BufferAccessStrategy strategy, bool *hit)
> {
>     volatile BufferDesc *bufHdr;
>     Block       bufBlock;
>     bool        found;
>     bool        isExtend;
>     bool        isLocalBuf = SmgrIsTemp(smgr);
>
>     *hit = false;
>
>     /* Make sure we will have room to remember the buffer pin */
>     ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
>
>     isExtend = (blockNum == P_NEW);

isLocalBuf and isExtend will both be false for the RBM_CACHE_ONLY case,
which is good for us. But that probably needs to be mentioned in a
comment here.

>     TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
>                                        smgr->smgr_rnode.node.spcNode,
>                                        smgr->smgr_rnode.node.dbNode,
>                                        smgr->smgr_rnode.node.relNode,
>                                        smgr->smgr_rnode.backend,
>                                        isExtend);

We know we're not going to be doing IO in the RBM_CACHE_ONLY case, but
that's probably OK? I don't understand this TRACE_* stuff. But we need
to either skip this, or also do the corresponding TRACE_*_DONE later.

>     if (isLocalBuf)
>     {
>         bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
>         if (found)
>             pgBufferUsage.local_blks_hit++;
>         else
>             pgBufferUsage.local_blks_read++;
>     }
>     else
>     {
>         /*
>          * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
>          * not currently in memory.
>          */
>         bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
>                              strategy, &found);
>         if (found)
>             pgBufferUsage.shared_blks_hit++;
>         else
>             pgBufferUsage.shared_blks_read++;
>     }

The nicest option I could think of here was to copy the shared_buffers
lookup from BufferAlloc into a new BufferAlloc_shared function, and add
a new branch for RBM_CACHE_ONLY. That would look like:

    if (isLocalBuf)
        …
    else if (mode == RBM_CACHE_ONLY)
    {
        /*
         * We check to see if the buffer is already cached, and if it's
         * not, we return InvalidBuffer because we know it's not pinned.
         */
        bufHdr = BufferAlloc_shared(…, &found);
        if (!found)
            return InvalidBuffer;
        LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
        return BufferDescriptorGetBuffer(bufHdr);
    }
    else
    {
        /*
         * lookup the buffer …
    }

…or if we went with the rest of the code and didn't do the lock/return
immediately, we would fall through to this code:

>     /* At this point we do NOT hold any locks. */
>
>     /* if it was already in the buffer pool, we're done */
>     if (found)
>     {
>         if (!isExtend)
>         {
>             /* Just need to update stats before we exit */
>             *hit = true;
>             VacuumPageHit++;
>
>             if (VacuumCostActive)
>                 VacuumCostBalance += VacuumCostPageHit;
>
>             TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
>                                               smgr->smgr_rnode.node.spcNode,
>                                               smgr->smgr_rnode.node.dbNode,
>                                               smgr->smgr_rnode.node.relNode,
>                                               smgr->smgr_rnode.backend,
>                                               isExtend,
>                                               found);
>
>             /*
>              * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
>              * locked on return.
>              */
>             if (!isLocalBuf)
>             {
>                 if (mode == RBM_ZERO_AND_LOCK)
>                     LWLockAcquire(bufHdr->content_lock, LW_EXCLUSIVE);
>                 else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
>                     LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
>             }
>
>             return BufferDescriptorGetBuffer(bufHdr);
>         }

Some of this isn't applicable to RBM_CACHE_ONLY, so we could either skip
it until we reach the RBM_ZERO_AND_CLEANUP_LOCK part, or add a new
branch before the «if (!isExtend)» one.

In summary, we're either adding one new branch that handles everything
related to RBM_CACHE_ONLY and side-stepping some things on the way that
happen to not apply (isLocalBuf, isExtend, TRACE_*), or we're adding a
bunch of tests to ReadBuffer_common.

But splitting BufferAlloc into BufferAlloc_shared and the rest of
BufferAlloc doesn't look especially nice either. If you look at the
shared_buffers lookup from BufferAlloc:

>     /* see if the block is in the buffer pool already */
>     LWLockAcquire(newPartitionLock, LW_SHARED);
>     buf_id = BufTableLookup(&newTag, newHash);
>     if (buf_id >= 0)
>     {
>         /*
>          * Found it.  Now, pin the buffer so no one can steal it from the
>          * buffer pool, and check to see if the correct data has been loaded
>          * into the buffer.
>          */
>         buf = GetBufferDescriptor(buf_id);
>
>         valid = PinBuffer(buf, strategy);
>
>         /* Can release the mapping lock as soon as we've pinned it */
>         LWLockRelease(newPartitionLock);
>
>         *foundPtr = TRUE;
>
>         if (!valid)
>         {
>             /*
>              * We can only get here if (a) someone else is still reading in
>              * the page, or (b) a previous read attempt failed.  We have to
>              * wait for any active read attempt to finish, and then set up our
>              * own read attempt if the page is still not BM_VALID.
>              * StartBufferIO does it all.
>              */
>             if (StartBufferIO(buf, true))
>             {
>                 /*
>                  * If we get here, previous attempts to read the buffer must
>                  * have failed ... but we shall bravely try again.
>                  */
>                 *foundPtr = FALSE;
>             }
>         }
>
>         return buf;
>     }
>
>     /*
>      * Didn't find it in the buffer pool.  We'll have to initialize a new
>      * buffer.  Remember to unlock the mapping lock while doing the work.
>      */
>     LWLockRelease(newPartitionLock);
>
>     /* Loop here in case we have to try another victim buffer */
>     for (;;)
>     {

There are two options here. I split out BufferAlloc_shared and either
make BufferAlloc call it, or not.

If I make BufferAlloc call it, the LWLockAcquire/Release could move to
the _shared function without confusion, but I'll either have to return
the PinBuffer BM_VALID test result via a validPtr, or repeat the test
in the caller before calling StartBufferIO. Both the INIT_BUFFERTAG and
BufTableHashCode() would also have to be in both functions, since they
are used for BufTableInsert() later in BufferAlloc.

On the other hand, if I don't make BufferAlloc call the _shared
function, then we end up with a new few-line special-case function plus
either a self-contained new branch in ReadBuffer_common, or a bunch of
RBM_CACHE_ONLY tests. That doesn't really seem an improvement over what
the original patch did, i.e. introduce a single special-case function to
handle a single special-case access.

I'm open to suggestions.

-- Abhijit

P.S. For reference, I've attached the patch that I posted earlier,
rebased to apply to master.

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: A couple of newlines missing in pg_rewind log entries
Next
From: Robert Haas
Date:
Subject: Re: upper planner path-ification