RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id TYAPR01MB29907DEEB7D8F4D146295634FEE00@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [Patch] Optimize dropping of relation buffers using dlist  (Andres Freund <andres@anarazel.de>)
Responses RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Re: [Patch] Optimize dropping of relation buffers using dlist  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
From: Andres Freund <andres@anarazel.de>
> DropRelFileNodeBuffers() in recovery? The most common path is
> DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers(),
> which 3/4 doesn't address and 4/4 doesn't mention.
>
> 4/4 seems to address DropRelationFiles(), but only talks about TRUNCATE?

Yes.  DropRelationFiles() is used in the following two paths:

[Replay of TRUNCATE during recovery]
xact_redo_commit/abort() -> DropRelationFiles()
 -> smgrdounlinkall() -> DropRelFileNodesAllBuffers()

[COMMIT/ROLLBACK PREPARED]
FinishPreparedTransaction() -> DropRelationFiles()
 -> smgrdounlinkall() -> DropRelFileNodesAllBuffers()



> I also don't get why 4/4 would be a good idea on its own. It uses
> BUF_DROP_FULL_SCAN_THRESHOLD to guard
> FindAndDropRelFileNodeBuffers() on a per relation basis. But since
> DropRelFileNodesAllBuffers() can be used for many relations at once, this
> could end up doing BUF_DROP_FULL_SCAN_THRESHOLD - 1 lookups a lot of
> times, once for each of nnodes relations?

So, the threshold value should be compared with the total number of blocks of all target relations, not each relation.
Youseem to be right, got it. 


> Also, how is 4/4 safe - this is outside of recovery too?

It seems that DropRelFileNodesAllBuffers() should trigger the new optimization path only when InRecovery == true,
becauseit intentionally doesn't check the "accurate" value returned from smgrnblocks(). 


> Smaller comment:
>
> +static void
> +FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber *forkNum,
> int nforks,
> +                              BlockNumber
> *nForkBlocks, BlockNumber *firstDelBlock)
> ...
> +            /* Check that it is in the buffer pool. If not, do nothing.
> */
> +            LWLockAcquire(bufPartitionLock, LW_SHARED);
> +            buf_id = BufTableLookup(&bufTag, bufHash);
> ...
> +            bufHdr = GetBufferDescriptor(buf_id);
> +
> +            buf_state = LockBufHdr(bufHdr);
> +
> +            if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
> +                bufHdr->tag.forkNum == forkNum[i] &&
> +                bufHdr->tag.blockNum >= firstDelBlock[i])
> +                InvalidateBuffer(bufHdr);    /* releases
> spinlock */
> +            else
> +                UnlockBufHdr(bufHdr, buf_state);
>
> I'm a bit confused about the check here. We hold a buffer partition lock, and
> have done a lookup in the mapping table. Why are we then rechecking the
> relfilenode/fork/blocknum? And why are we doing so holding the buffer header
> lock, which is essentially a spinlock, so should only ever be held for very short
> portions?
>
> This looks like it's copying logic from DropRelFileNodeBuffers() etc, but there
> the situation is different: We haven't done a buffer mapping lookup, and we
> don't hold a partition lock!

That's because the buffer partition lock is released immediately after the hash table has been looked up.  As an aside,
InvalidateBuffer()requires the caller to hold the buffer header spinlock and doesn't hold the buffer partition lock. 



Regards
Takayuki Tsunakawa




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Should we document IS [NOT] OF?
Next
From: Pavel Borisov
Date:
Subject: Re: Is postgres ready for 2038?