Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Msg-id: CAA4eK1LP6TWHTioD-tiMKcJ-PNejjzQXc8se174szAX=O3YEhQ@mail.gmail.com
In response to: RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses: Re: [Patch] Optimize dropping of relation buffers using dlist  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
On Wed, Nov 4, 2020 at 8:28 AM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> Hi,
>
> I've updated the patch 0004 (Truncate optimization) with the previous comments of
> Tsunakawa-san already addressed in the patch. (Thank you very much for the review.)
> The change here compared to the previous version is that in DropRelFileNodesAllBuffers()
> we don't check for the accurate flag anymore when deciding whether to optimize or not.
> For relations with blocks that do not exceed the threshold for full scan, we call
> DropRelFileNodeBuffers where the flag will be checked anyway. Otherwise, we proceed
> to the traditional buffer scan. Thoughts?
>

Can we deal with the Truncate optimization once we have decided on your
other patch? I see a few problems with it. If we can get the first patch
(vacuum optimization) committed, it might be a bit easier for us to get
the truncate optimization in. If possible, let's focus on the
(auto)vacuum optimization first.

Few comments on patches:
======================
v29-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks
-----------------------------------------------------------------------------------
1.
-smgrnblocks(SMgrRelation reln, ForkNumber forknum)
+smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *accurate)
 {
  BlockNumber result;

  /*
  * For now, we only use cached values in recovery due to lack of a shared
- * invalidation mechanism for changes in file size.
+ * invalidation mechanism for changes in file size.  The cached values
+ * could be smaller than the actual number of existing buffers of the file.
+ * This is caused by lseek of buggy Linux kernels that might not have
+ * accounted for the recent write.
  */
  if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
+ {
+ if (accurate != NULL)
+ *accurate = true;
+

I don't understand this comment. A few emails back, I think we
discussed that the cached value can't be less than the number of
buffers during recovery. If that happens to be true, then we have some
problem. If you want to explain the 'accurate' variable, then you can
do so at the top of the function. Would it be better to name this
variable 'cached'?
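
To be concrete, what I have in mind is something like the following
(just a sketch of the rename, with the out parameter explained at the
top of the function rather than inline):

```c
/*
 * smgrnblocks() -- Calculate the number of blocks in the supplied fork.
 *
 * If 'cached' is not NULL, it is set to true when the returned value
 * comes from the smgr_cached_nblocks cache (only used during recovery),
 * and false when it was obtained by checking the physical file size.
 */
BlockNumber
smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *cached)
{
	...
}
```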

v29-0003-Optimize-DropRelFileNodeBuffers-during-recovery
----------------------------------------------------------------------------------
2.
+ /* Check that it is in the buffer pool. If not, do nothing. */
+ LWLockAcquire(bufPartitionLock, LW_SHARED);
+ buf_id = BufTableLookup(&bufTag, bufHash);
+ LWLockRelease(bufPartitionLock);
+
+ if (buf_id < 0)
+ continue;
+
+ bufHdr = GetBufferDescriptor(buf_id);
+
+ buf_state = LockBufHdr(bufHdr);
+
+ if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&

I think a pre-check for RelFileNode before LockBufHdr might be better,
for the reasons mentioned a few lines down in this function.
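
Something along these lines (untested sketch, mirroring the unlocked
tag check the full-scan path already does; a false positive is
re-checked below under the header spinlock):

```c
bufHdr = GetBufferDescriptor(buf_id);

/*
 * Check the relfilenode without the header lock first, so that we
 * don't take the spinlock for buffers that clearly don't belong to
 * this relation.  The check is repeated under LockBufHdr below.
 */
if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
	continue;

buf_state = LockBufHdr(bufHdr);

if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
	bufHdr->tag.forkNum == forkNum[j] &&
	bufHdr->tag.blockNum >= firstDelBlock[j])
	InvalidateBuffer(bufHdr);	/* releases spinlock */
else
	UnlockBufHdr(bufHdr, buf_state);
```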

3.
-DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
+DropRelFileNodeBuffers(SMgrRelation smgr_reln, ForkNumber *forkNum,
     int nforks, BlockNumber *firstDelBlock)
 {
  int i;
  int j;
+ RelFileNodeBackend rnode;
+ bool accurate;

It is better to initialize 'accurate' with false. Again, would it be
better to rename this variable to 'cached'?

4.
+ /*
+ * Look up the buffer in the hashtable if the block size is known to
+ * be accurate and the total blocks to be invalidated is below the
+ * full scan threshold.  Otherwise, give up the optimization.
+ */
+ if (accurate && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
+ {
+ for (j = 0; j < nforks; j++)
+ {
+ BlockNumber curBlock;
+
+ for (curBlock = firstDelBlock[j]; curBlock < nForkBlocks[j]; curBlock++)
+ {
+ uint32 bufHash; /* hash value for tag */
+ BufferTag bufTag; /* identity of requested block */
+ LWLock     *bufPartitionLock; /* buffer partition lock for it */
+ int buf_id;
+
+ /* create a tag so we can lookup the buffer */
+ INIT_BUFFERTAG(bufTag, rnode.node, forkNum[j], curBlock);
+
+ /* determine its hash code and partition lock ID */
+ bufHash = BufTableHashCode(&bufTag);
+ bufPartitionLock = BufMappingPartitionLock(bufHash);
+
+ /* Check that it is in the buffer pool. If not, do nothing. */
+ LWLockAcquire(bufPartitionLock, LW_SHARED);
+ buf_id = BufTableLookup(&bufTag, bufHash);
+ LWLockRelease(bufPartitionLock);
+
+ if (buf_id < 0)
+ continue;
+
+ bufHdr = GetBufferDescriptor(buf_id);
+
+ buf_state = LockBufHdr(bufHdr);
+
+ if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+ bufHdr->tag.forkNum == forkNum[j] &&
+ bufHdr->tag.blockNum >= firstDelBlock[j])
+ InvalidateBuffer(bufHdr); /* releases spinlock */
+ else
+ UnlockBufHdr(bufHdr, buf_state);
+ }
+ }
+ return;
+ }

Can we move the code under this 'if' condition to a separate function,
say FindAndDropRelFileNodeBuffers or something like that?
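
That is, factor the lookup-and-invalidate loop into a helper with a
signature roughly like this (name and exact parameter list are of
course up for discussion), and have DropRelFileNodeBuffers call it once
per fork:

```c
/*
 * FindAndDropRelFileNodeBuffers
 *		Look up and invalidate the buffers of the given relation fork,
 *		starting at firstDelBlock, via the buffer mapping hashtable
 *		instead of scanning the whole buffer pool.
 */
static void
FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum,
							  BlockNumber nForkBlocks,
							  BlockNumber firstDelBlock);
```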

v29-0004-TRUNCATE-optimization
------------------------------------------------
5.
+ for (i = 0; i < n; i++)
+ {
+ nforks = 0;
+ nBlocksToInvalidate = 0;
+
+ for (j = 0; j <= MAX_FORKNUM; j++)
+ {
+ if (!smgrexists(rels[i], j))
+ continue;
+
+ /* Get the number of blocks for a relation's fork */
+ nblocks = smgrnblocks(rels[i], j, NULL);
+
+ nBlocksToInvalidate += nblocks;
+
+ forks[nforks++] = j;
+ }
+ if (nBlocksToInvalidate >= BUF_DROP_FULL_SCAN_THRESHOLD)
+ goto buffer_full_scan;
+
+ DropRelFileNodeBuffers(rels[i], forks, nforks, firstDelBlocks);
+ }
+ pfree(nodes);
+ pfree(rels);
+ pfree(rnodes);
+ return;

I think this can be slower than the current Truncate. Say there are
three relations, and for one of them the size is greater than
BUF_DROP_FULL_SCAN_THRESHOLD; then you would anyway have to scan the
entire shared buffers, so the work done in the optimized path for the
other two relations adds some overhead.

Also, as written, I think you need to remove the nodes for which you
have already invalidated the buffers via the optimized path, no?
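
One way to restructure it (rough, untested sketch) is to total the
block counts across all relations first and make the decision once.
That avoids doing per-relation hashtable work when a full scan is
unavoidable, and it also sidesteps the node-filtering problem, because
the choice becomes all-or-nothing:

```c
/*
 * First pass: sum the block counts of all relations.  If the total
 * crosses the threshold, fall through to the traditional full scan
 * of shared buffers; otherwise drop each relation's buffers via the
 * optimized lookup path and return.
 */
nBlocksToInvalidate = 0;
for (i = 0; i < n; i++)
	for (j = 0; j <= MAX_FORKNUM; j++)
		if (smgrexists(rels[i], j))
			nBlocksToInvalidate += smgrnblocks(rels[i], j, NULL);

if (nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
{
	for (i = 0; i < n; i++)
		DropRelFileNodeBuffers(rels[i], forks, nforks, firstDelBlocks);
	pfree(nodes);
	pfree(rels);
	pfree(rnodes);
	return;
}

/* fall through to the full scan of shared buffers */
```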

-- 
With Regards,
Amit Kapila.


