Home > mailing lists

RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From	tsunakawa.takay@fujitsu.com
Subject	RE: [Patch] Optimize dropping of relation buffers using dlist
Date	November 19, 2020 07:07:34
Msg-id	TYAPR01MB29907DEEB7D8F4D146295634FEE00@TYAPR01MB2990.jpnprd01.prod.outlook.com Whole thread Raw
In response to	Re: [Patch] Optimize dropping of relation buffers using dlist (Andres Freund <andres@anarazel.de>)
Responses	RE: [Patch] Optimize dropping of relation buffers using dlist Re: [Patch] Optimize dropping of relation buffers using dlist
List	pgsql-hackers

Tree view

From: Andres Freund <andres@anarazel.de>
> DropRelFileNodeBuffers() in recovery? The most common path is
> DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers(),
> which 3/4 doesn't address and 4/4 doesn't mention.
>
> 4/4 seems to address DropRelationFiles(), but only talks about TRUNCATE?

Yes.  DropRelationFiles() is used in the following two paths:

[Replay of TRUNCATE during recovery]
xact_redo_commit/abort() -> DropRelationFiles()
 -> smgrdounlinkall() -> DropRelFileNodesAllBuffers()

[COMMIT/ROLLBACK PREPARED]
FinishPreparedTransaction() -> DropRelationFiles()
 -> smgrdounlinkall() -> DropRelFileNodesAllBuffers()



> I also don't get why 4/4 would be a good idea on its own. It uses
> BUF_DROP_FULL_SCAN_THRESHOLD to guard
> FindAndDropRelFileNodeBuffers() on a per relation basis. But since
> DropRelFileNodesAllBuffers() can be used for many relations at once, this
> could end up doing BUF_DROP_FULL_SCAN_THRESHOLD - 1 lookups a lot of
> times, once for each of nnodes relations?

So, the threshold value should be compared with the total number of blocks of all target relations, not each relation.
Youseem to be right, got it. 


> Also, how is 4/4 safe - this is outside of recovery too?

It seems that DropRelFileNodesAllBuffers() should trigger the new optimization path only when InRecovery == true,
becauseit intentionally doesn't check the "accurate" value returned from smgrnblocks(). 


> Smaller comment:
>
> +static void
> +FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber *forkNum,
> int nforks,
> +                              BlockNumber
> *nForkBlocks, BlockNumber *firstDelBlock)
> ...
> +            /* Check that it is in the buffer pool. If not, do nothing.
> */
> +            LWLockAcquire(bufPartitionLock, LW_SHARED);
> +            buf_id = BufTableLookup(&bufTag, bufHash);
> ...
> +            bufHdr = GetBufferDescriptor(buf_id);
> +
> +            buf_state = LockBufHdr(bufHdr);
> +
> +            if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
> +                bufHdr->tag.forkNum == forkNum[i] &&
> +                bufHdr->tag.blockNum >= firstDelBlock[i])
> +                InvalidateBuffer(bufHdr);    /* releases
> spinlock */
> +            else
> +                UnlockBufHdr(bufHdr, buf_state);
>
> I'm a bit confused about the check here. We hold a buffer partition lock, and
> have done a lookup in the mapping table. Why are we then rechecking the
> relfilenode/fork/blocknum? And why are we doing so holding the buffer header
> lock, which is essentially a spinlock, so should only ever be held for very short
> portions?
>
> This looks like it's copying logic from DropRelFileNodeBuffers() etc, but there
> the situation is different: We haven't done a buffer mapping lookup, and we
> don't hold a partition lock!

That's because the buffer partition lock is released immediately after the hash table has been looked up.  As an aside,
InvalidateBuffer()requires the caller to hold the buffer header spinlock and doesn't hold the buffer partition lock. 



Regards
Takayuki Tsunakawa

pgsql-hackers by date:

From: Tom Lane
Date: 19 November 2020, 07:03:59
Subject: Re: Should we document IS [NOT] OF?

From: Pavel Borisov
Date: 19 November 2020, 07:22:08
Subject: Re: Is postgres ready for 2038?

RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

Previous

Next