Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers
From | Kyotaro Horiguchi
---|---
Subject | Re: [Patch] Optimize dropping of relation buffers using dlist
Date |
Msg-id | 20200916.115629.89842527379346862.horikyota.ntt@gmail.com
In response to | RE: [Patch] Optimize dropping of relation buffers using dlist ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses | Re: [Patch] Optimize dropping of relation buffers using dlist
List | pgsql-hackers

Thanks for the new version, Jamison.

At Tue, 15 Sep 2020 11:11:26 +0000, "k.jamison@fujitsu.com" <k.jamison@fujitsu.com> wrote in
> Hi,
>
> > > BTW, I think I see one problem in the code:
> > >
> > > if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> > > + bufHdr->tag.forkNum == forkNum[j] && tag.blockNum >=
> > > + bufHdr->firstDelBlock[j])
> > >
> > > Here, I think you need to use 'i' not 'j' for forkNum and
> > > firstDelBlock as those are arrays w.r.t forks. That might fix the
> > > problem but I am not sure as I haven't tried to reproduce it.
>
> Thanks for the advice. Right, that seems to be the cause of the error, and
> fixing that (using the fork index) solved the case.
> I also followed the advice of Tsunakawa-san of using more meaningful
> iterator names instead of "i" & "j" for readability.

(FWIW, I prefer short conventional names for short-term iterator variables.)

master> * XXX currently it sequentially searches the buffer pool, should be
master> * changed to more clever ways of searching.  However, this routine
master> * is used only in code paths that aren't very performance-critical,
master> * and we shouldn't slow down the hot paths to make it faster ...

This comment needs a rewrite.

+	for (fork_num = 0; fork_num < nforks; fork_num++)
 	{
 		if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
-			bufHdr->tag.forkNum == forkNum[j] &&
-			bufHdr->tag.blockNum >= firstDelBlock[j])
+			bufHdr->tag.forkNum == forkNum[fork_num] &&
+			bufHdr->tag.blockNum >= firstDelBlock[fork_num])

fork_num is not actually a fork number, but an index into forkNum[]. It
should be fork_idx (or just i, which I prefer..).

-	for (j = 0; j < nforks; j++)
-		DropRelFileNodeLocalBuffers(rnode.node, forkNum[j],
-									firstDelBlock[j]);
+	for (fork_num = 0; fork_num < nforks; fork_num++)
+		DropRelFileNodeLocalBuffers(rnode.node, forkNum[fork_num],
+									firstDelBlock[fork_num]);

I don't think we need to include this irrelevant refactoring in the patch.
(And I think j is better there.)

+	 * We only speedup this path during recovery, because that's the only
+	 * timing when we can get a valid cached value of blocks for relation.
+	 * See comment in smgrnblocks() in smgr.c. Otherwise, proceed to usual
+	 * buffer invalidation process (scanning of whole shared buffers).

We need an explanation of why we do this optimization only for the
recovery case.

+	/* Get the number of blocks for the supplied relation's fork */
+	nblocks = smgrnblocks(smgr_reln, forkNum[fork_num]);
+	Assert(BlockNumberIsValid(nblocks));
+
+	if (nblocks < BUF_DROP_FULLSCAN_THRESHOLD)

As mentioned upthread, the criterion for choosing between the full scan
and the lookup-drop method is how large a portion of NBuffers this
relation drop is going to invalidate. So the nblocks above should be the
sum of the numbers of blocks to be truncated (not the total numbers of
blocks) of all designated forks. Then, once we decide to use the
lookup-drop method, we use it for all the forks.
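For instance, the decision could be shaped like the following rough,
untested sketch (nBlocksToInvalidate and nforkblocks are just names I made
up for illustration; the rest is from your patch or core):

	BlockNumber nBlocksToInvalidate = 0;
	BlockNumber nforkblocks[MAX_FORKNUM + 1];
	int			i;

	/*
	 * Sum up how many buffers this truncation can invalidate over all the
	 * designated forks; that sum, not the size of one fork, is what should
	 * be compared against the threshold.
	 */
	for (i = 0; i < nforks; i++)
	{
		/* the cached value is valid here because we are in recovery */
		nforkblocks[i] = smgrnblocks(smgr_reln, forkNum[i]);
		Assert(BlockNumberIsValid(nforkblocks[i]));

		nBlocksToInvalidate += nforkblocks[i] - firstDelBlock[i];
	}

	if (nBlocksToInvalidate < BUF_DROP_FULLSCAN_THRESHOLD)
	{
		/* small enough: lookup-drop the to-be-truncated blocks of all forks */
	}
	else
	{
		/* otherwise fall back to the usual full scan of shared buffers */
	}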
+	for (block_num = 0; block_num <= nblocks; block_num++)
+	{

block_num is quite confusing with nblocks, at least for me (:p). As with
fork_num, I'd prefer just j or iblk or anything else that isn't easily
confused with nblocks.

By the way, the loop runs nblocks + 1 times, which seems wrong. Also, we
can start the loop from firstDelBlock[fork_num] instead of 0, which makes
the later check against firstDelBlock[] unnecessary.

+		/* create a tag with respect to the block so we can lookup the buffer */
+		INIT_BUFFERTAG(newTag, rnode.node, forkNum[fork_num],
+					   firstDelBlock[block_num]);

Mmm. It is wrong that the tag is initialized using
firstDelBlock[block_num]. Why isn't it just block_num?

+		if (buf_id < 0)
+		{
+			LWLockRelease(newPartitionLock);
+			continue;
+		}
+		LWLockRelease(newPartitionLock);

We don't need two separate LWLockRelease()'s there.

+		/*
+		 * We can make this a tad faster by prechecking the buffer tag before
+		 * we attempt to lock the buffer; this saves a lot of lock ...
+		 */
+		if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+			continue;

In the original code, this is performed in order to avoid taking a lock on
the bufHdr of irrelevant buffers. Here we have already identified the
buffer by looking it up with its tag, so I don't think we need this check.
Note that we do the same check after lock acquisition anyway.

+		else
+			UnlockBufHdr(bufHdr, buf_state);
+	}
+	/*
+	 * We've invalidated the nblocks already. Scan the shared buffers
+	 * for each fork.
+	 */
+	if (block_num > nblocks)
+	{
+		DropRelFileNodeBuffersOfFork(rnode.node, forkNum[fork_num],
+									 firstDelBlock[fork_num]);
+	}

Mmm? block_num is always larger than nblocks there, and the function call
runs a whole NBuffers scan for the just-processed fork. What is the point
of this code? (For the overall shape of the lookup-drop loop I have in
mind, see the sketch at the end of this mail.)

> > I also added a new function when relation fork is bigger than the threshold
> >    if (nblocks > BUF_DROP_FULLSCAN_THRESHOLD)
> > (DropRelFileNodeBuffersOfFork). Perhaps there's a better name for that
> > function.
> > However, as expected in the previous discussions, this is a bit slower than
> > the standard buffer invalidation process, because the whole shared buffers
> > are scanned nfork times.
> > Currently, I set the threshold to (NBuffers / 500).
>
> I made a mistake in the v12. I replaced the firstDelBlock[fork_num] with
> firstDelBlock[block_num] in the for-loop code block of block_num, because
> we want to process the current block of the per-block loop.
> OTOH, I used the firstDelBlock[fork_num] when the relation fork is bigger
> than the threshold, or if the cached blocks of small relations were
> already invalidated.

Really? I believe that firstDelBlock is an array that has only nforks
elements.

> The logic could be either correct or wrong, so I'd appreciate feedback and
> comments/advice.
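To be a bit more concrete, here is a rough, untested sketch of the shape I
have in mind for the lookup-drop loop (curBlock is a name I made up;
nforkblocks[] is the per-fork block count fetched as in the earlier
sketch): it starts from firstDelBlock[i], builds the tag from the block
number itself, releases the partition lock only once, and checks the tag
only after taking the buffer header lock.

	for (i = 0; i < nforks; i++)
	{
		BlockNumber curBlock;

		for (curBlock = firstDelBlock[i]; curBlock < nforkblocks[i]; curBlock++)
		{
			BufferTag	tag;			/* identity of the candidate block */
			uint32		hash;			/* hash of the tag */
			LWLock	   *partitionLock;	/* mapping partition protecting it */
			int			buf_id;
			BufferDesc *bufHdr;
			uint32		buf_state;

			INIT_BUFFERTAG(tag, rnode.node, forkNum[i], curBlock);
			hash = BufTableHashCode(&tag);
			partitionLock = BufMappingPartitionLock(hash);

			/* see if the block is in the buffer pool at all */
			LWLockAcquire(partitionLock, LW_SHARED);
			buf_id = BufTableLookup(&tag, hash);
			LWLockRelease(partitionLock);	/* one release is enough */

			if (buf_id < 0)
				continue;

			bufHdr = GetBufferDescriptor(buf_id);

			/* recheck under the buffer header lock; no precheck is needed */
			buf_state = LockBufHdr(bufHdr);
			if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
				bufHdr->tag.forkNum == forkNum[i] &&
				bufHdr->tag.blockNum >= firstDelBlock[i])
				InvalidateBuffer(bufHdr);	/* releases the spinlock */
			else
				UnlockBufHdr(bufHdr, buf_state);
		}
	}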
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center