Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id 20200929.103439.403667376111258587.horikyota.ntt@gmail.com
Whole thread Raw
In response to RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
List pgsql-hackers
At Mon, 28 Sep 2020 08:57:36 +0000, "k.jamison@fujitsu.com" <k.jamison@fujitsu.com> wrote in 
> On Monday, September 28, 2020 5:08 PM, Tsunakawa-san wrote:
> 
> >     From: Jamison, Kirk/ジャミソン カーク <k.jamison@fujitsu.com>
> > > Is my understanding above correct?
> > 
> > No.  I simply meant DropRelFileNodeBuffers() calls the following function,
> > and avoids the optimization if it returns InvalidBlockNumber.
> > 
> > 
> > BlockNumber
> > smgrcachednblocks(SMgrRelation reln, ForkNumber forknum) {
> >     return reln->smgr_cached_nblocks[forknum];
> > }
> 
> Thank you for clarifying. 

FWIW, I (and maybe Amit) am thinking that the property we need here is
not it is cached or not but the accuracy of the returned file length,
and that the "cached" property should be hidden behind the API.

Another reason for not adding this function is the cached value is not
really reliable on non-recovery environment.

> So in the new function, it goes something like:
>     if (InRecovery)
>     {
>         if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
>             return reln->smgr_cached_nblocks[forknum];
>         else
>             return InvalidBlockNumber;
>     }

If we add the new function, it should reutrn InvalidBlockNumber
without consulting smgr_nblocks().

> I've revised the patch and added the new function accordingly in the attached file.
> I also did not remove the duplicate code from smgrnblocks because Amit-san mentioned
> that when the caching for non-recovery cases is implemented, we can use it
> for non-recovery cases as well.
> 
> Although I am not sure if the way it's written in DropRelFileNodeBuffers is okay.
> BlockNumberIsValid(nTotalBlocks)
>  
>             nTotalBlocks = smgrcachednblocks(smgr_reln, forkNum[j]);
>             nBlocksToInvalidate = nTotalBlocks - firstDelBlock[j];
> 
>             if (BlockNumberIsValid(nTotalBlocks) &&
>                 nBlocksToInvalidate < BUF_DROP_FULLSCAN_THRESHOLD)
>             {
>                 //enter optimization loop
>             }
>             else
>             {
>                 //full scan for each fork  
>             }

Hmm. The current loop in DropRelFileNodeBuffers looks like this:

    if (InRecovery)
       for (for each forks)
          if (the fork meets the criteria)
             <optimized dropping>
          else
             <full scan>

I think this is somewhat different from the current
discussion. Whether we sum-up the number of blcoks for all forks or
just use that of the main fork, we should take full scan if we failed
to know the accurate size for any one of the forks. (In other words,
it is stupid that we run a full scan for more than one fork at a
drop.)

Come to think of that, we can naturally sum-up all forks' blocks since
anyway we need to call smgrnblocks for all forks to know the
optimzation is usable.

So that block would be something like this:

    for (forks of the rel)
      /* the function returns InvalidBlockNumber if !InRecovery */
      if (smgrnblocks returned InvalidBlockNumber) 
         total_blocks = InvalidBlockNumber;
         break;
      total_blocks += nbloks of this fork

    /* <we could rely on the fact that InvalidBlockNumber is zero> */
    if (total_blocks != InvalidBlockNumber && total_blocks < threshold)
          for (forks of the rel)
          for (blocks of the fork)
             <try dropping the buffer for the block>
    else
       <full scan dropping>

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Load TIME fields - proposed performance improvement
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist