Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id 20200916.115629.89842527379346862.horikyota.ntt@gmail.com
In response to RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses Re: [Patch] Optimize dropping of relation buffers using dlist  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Thanks for the new version, Jamison.

At Tue, 15 Sep 2020 11:11:26 +0000, "k.jamison@fujitsu.com" <k.jamison@fujitsu.com> wrote in 
> Hi, 
> 
> > BTW, I think I see one problem in the code:
> > >
> > > if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> > > + bufHdr->tag.forkNum == forkNum[j] && tag.blockNum >=
> > > + bufHdr->firstDelBlock[j])
> > >
> > > Here, I think you need to use 'i' not 'j' for forkNum and
> > > firstDelBlock as those are arrays w.r.t forks. That might fix the
> > > problem but I am not sure as I haven't tried to reproduce it.
> > 
> > Thanks for advice. Right, that seems to be the cause of error, and fixing that
> > (using fork) solved the case.
> > I also followed the advice of Tsunakawa-san of using more meaningful
> > iterator Instead of using "i" & "j" for readability.

(FWIW, I prefer short conventional names for short-term iterator variables.)


master> * XXX currently it sequentially searches the buffer pool, should be
master> * changed to more clever ways of searching.  However, this routine
master> * is used only in code paths that aren't very performance-critical,
master> * and we shouldn't slow down the hot paths to make it faster ...

This comment needs a rewrite.


+        for (fork_num = 0; fork_num < nforks; fork_num++)
         {
             if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-                bufHdr->tag.forkNum == forkNum[j] &&
-                bufHdr->tag.blockNum >= firstDelBlock[j])
+                bufHdr->tag.forkNum == forkNum[fork_num] &&
+                bufHdr->tag.blockNum >= firstDelBlock[fork_num])

fork_num is not actually a fork number, but the index of forkNum[].
It should be fork_idx (or just i, which I prefer..).

-            for (j = 0; j < nforks; j++)
-                DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
-                                            firstDelBlock[j]);
+            for (fork_num = 0; fork_num < nforks; fork_num++)
+                DropRelFileNodeLocalBuffers(rnode.node, forkNum[fork_num],
+                                            firstDelBlock[fork_num]);

I think we don't need to include the irrelevant refactoring in this
patch. (And I think j is better there.)

+     * We only speedup this path during recovery, because that's the only
+     * timing when we can get a valid cached value of blocks for relation.
+     * See comment in smgrnblocks() in smgr.c. Otherwise, proceed to usual
+     * buffer invalidation process (scanning of whole shared buffers).

We need an explanation of why we do this optimization only for the
recovery case.
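
For example, the comment could be expanded along the following lines
(just a sketch of the reasoning as I understand it; it should be
checked against the comment in smgrnblocks()):

     * During recovery, only the startup process extends relations, so the
     * block count cached by smgrnblocks() can be trusted.  During normal
     * operation other backends may extend the relation concurrently, the
     * cached value may be stale, and we must fall back to scanning the
     * whole shared buffers.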

+            /* Get the number of blocks for the supplied relation's fork */
+            nblocks = smgrnblocks(smgr_reln, forkNum[fork_num]);
+            Assert(BlockNumberIsValid(nblocks));
+
+            if (nblocks < BUF_DROP_FULLSCAN_THRESHOLD)

As mentioned upthread, the criterion for choosing between the full
scan and the lookup-drop is how large a portion of NBuffers this
relation-drop is going to invalidate.  So the nblocks above should be
the sum of the numbers of blocks to be truncated (not the total
number of blocks) over all designated forks.  Then, once we have
decided to use the lookup-drop method, we do it for all forks.
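
In other words, something like this (just a sketch; nBlocksToInvalidate
is a made-up name):

    BlockNumber    nBlocksToInvalidate = 0;

    /* sum up the number of blocks to be invalidated over all forks */
    for (i = 0; i < nforks; i++)
    {
        nblocks = smgrnblocks(smgr_reln, forkNum[i]);
        Assert(BlockNumberIsValid(nblocks));
        nBlocksToInvalidate += nblocks - firstDelBlock[i];
    }

    if (nBlocksToInvalidate < BUF_DROP_FULLSCAN_THRESHOLD)
    {
        /* look up and drop the target blocks of every fork */
    }
    else
    {
        /* fall back to the full scan of shared buffers */
    }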

+                for (block_num = 0; block_num <= nblocks; block_num++)
+                {

block_num is quite confusing with nblocks, at least for me (:p).  As
with fork_num, I would prefer just j or iblk, or anything else that
cannot be confused with nblocks.  By the way, the loop runs nblocks +
1 times, which seems wrong.  We could start the loop from
firstDelBlock[fork_num] instead of 0, which would also make the later
check against firstDelBlock[] unnecessary.

+                    /* create a tag with respect to the block so we can lookup the buffer */
+                    INIT_BUFFERTAG(newTag, rnode.node, forkNum[fork_num],
+                                   firstDelBlock[block_num]);

Mmm.  It is wrong that the tag is initialized using
firstDelBlock[block_num].  Why isn't it just block_num?
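
That is, I think the inner loop should look more like this (a rough
sketch, keeping the current variable names):

    for (block_num = firstDelBlock[fork_num]; block_num < nblocks; block_num++)
    {
        /* create a tag for the current block so we can look up the buffer */
        INIT_BUFFERTAG(newTag, rnode.node, forkNum[fork_num], block_num);
        ...
    }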


+                    if (buf_id < 0)
+                    {
+                        LWLockRelease(newPartitionLock);
+                        continue;
+                    }
+                    LWLockRelease(newPartitionLock);

We don't need two separate LWLockRelease() calls there.
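
One release right after the lookup is enough, something like this
(sketch; I'm guessing the surrounding variable names):

    buf_id = BufTableLookup(&newTag, newHash);
    LWLockRelease(newPartitionLock);

    if (buf_id < 0)
        continue;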

+   /*
+    * We can make this a tad faster by prechecking the buffer tag before
+    * we attempt to lock the buffer; this saves a lot of lock
...
+    */
+   if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+       continue;

In the original code, this check is performed in order to avoid
taking a lock on bufHdr for irrelevant buffers.  Here we have already
identified the buffer by looking it up using the buffer tag, so I
think we don't need this check.  Note that we do the same check again
after acquiring the lock.
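
So the per-buffer part could shrink to something like this (sketch):

    bufHdr = GetBufferDescriptor(buf_id);

    buf_state = LockBufHdr(bufHdr);

    if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
        bufHdr->tag.forkNum == forkNum[fork_num] &&
        bufHdr->tag.blockNum >= firstDelBlock[fork_num])
        InvalidateBuffer(bufHdr);    /* releases spinlock */
    else
        UnlockBufHdr(bufHdr, buf_state);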

+        else
+            UnlockBufHdr(bufHdr, buf_state);
+    }
+    /*
+     * We've invalidated the nblocks already. Scan the shared buffers
+     * for each fork.
+     */
+    if (block_num > nblocks)
+    {
+        DropRelFileNodeBuffersOfFork(rnode.node, forkNum[fork_num],
+                                     firstDelBlock[fork_num]);
+    }

Mmm?  block_num is always larger than nblocks at that point, so the
condition is always true.  And the function call runs a whole-NBuffers
scan for the just-processed fork.  What is the point of this code?


> > I also added a new function when relation fork is bigger than the threshold
> >     If (nblocks > BUF_DROP_FULLSCAN_THRESHOLD)
> > (DropRelFileNodeBuffersOfFork) Perhaps there's a better name for that
> > function.
> > However, as expected in the previous discussions, this is a bit slower than the
> > standard buffer invalidation process, because the whole shared buffers are
> > scanned nfork times.
> > Currently, I set the threshold to (NBuffers / 500)
> 
> I made a mistake in the v12. I replaced the firstDelBlock[fork_num] with firstDelBlock[block_num],
> In the for-loop code block of block_num, because we want to process the current block of per-block loop
> OTOH, I used the firstDelBlock[fork_num] when relation fork is bigger than the threshold,
> or if the cached blocks of small relations were already invalidated.

Really? I believe firstDelBlock is an array that has only nforks elements.

> The logic could be either correct or wrong, so I'd appreciate feedback and comments/advice.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


