Home > mailing lists

Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [Patch] Optimize dropping of relation buffers using dlist
Date	November 5, 2020 12:48:40
Msg-id	CAA4eK1Jyu9GZzuKuapHC4aeBfsRm3sCUEiDbndQvw4rRWz10wQ@mail.gmail.com Whole thread Raw
In response to	Re: [Patch] Optimize dropping of relation buffers using dlist (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses	Re: [Patch] Optimize dropping of relation buffers using dlist (Thomas Munro <thomas.munro@gmail.com>)
List	pgsql-hackers

Tree view

On Thu, Nov 5, 2020 at 1:59 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 5 Nov 2020 11:07:21 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > On Thu, Nov 5, 2020 at 8:26 AM k.jamison@fujitsu.com
> > <k.jamison@fujitsu.com> wrote:
> > > > > Few comments on patches:
> > > > > ======================
> > > > > v29-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks
> > > > > ----------------------------------------------------------------------
> > > > > -------------
> > > > > 1.
> > > > > -smgrnblocks(SMgrRelation reln, ForkNumber forknum)
> > > > > +smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *accurate)
> > > > >  {
> > > > >   BlockNumber result;
> > > > >
> > > > >   /*
> > > > >   * For now, we only use cached values in recovery due to lack of a
> > > > > shared
> > > > > - * invalidation mechanism for changes in file size.
> > > > > + * invalidation mechanism for changes in file size.  The cached
> > > > > + values
> > > > > + * could be smaller than the actual number of existing buffers of the file.
> > > > > + * This is caused by lseek of buggy Linux kernels that might not have
> > > > > + * accounted for the recent write.
> > > > >   */
> > > > >   if (InRecovery && reln->smgr_cached_nblocks[forknum] !=
> > > > > InvalidBlockNumber)
> > > > > + {
> > > > > + if (accurate != NULL)
> > > > > + *accurate = true;
> > > > > +
> > > > >
> > > > > I don't understand this comment. Few emails back, I think we have
> > > > > discussed that cached value can't be less than the number of buffers
> > > > > during recovery. If that happens to be true then we have some problem.
> > > > > If you want to explain 'accurate' variable then you can do the same
> > > > > atop of function. Would it be better to name this variable as
> > > > > 'cached'?
> > > >
> > > > (I agree that the comment needs to be fixed.)
> > > >
> > > > FWIW I don't think 'cached' suggests the characteristics of the returned value
> > > > on its interface.  It was introduced to reduce fseek() calls, and after that we
> > > > have found that it can be regarded as the authoritative source of the file size.
> > > > The "accurate" means that it is guaranteed that we don't have a buffer for the
> > > > file blocks further than that number.  I don't come up with a more proper
> > > > word than "accurate" but also I don't think "cached" is proper here.
> > >
> >
> > Sure but that is not the guarantee this API gives. It has to be
> > guaranteed by the logic else-where, so not sure if it is a good idea
> > to try to reflect the same here. The comments in the caller where we
> > use this should explain why it is safe to use this value.
>
> Isn't it already guaranteed by the bugmgr code that we don't have
> buffers for nonexistent file blocks?  What is needed here is, yeah,
> the returned value from smgrblocks is "reliable".  If "reliable" is
> still not proper, I give up and agree to "cached".
>


I still feel 'cached' is a better name.

>
> > > I am not sure if the patch should cover this or should be a separate thread altogether since
> > > a number of functions also rely on the smgrnblocks(). But I'll take it into consideration.
> > >
> > >
> > > > > v29-0003-Optimize-DropRelFileNodeBuffers-during-recovery
> > > > > ----------------------------------------------------------------------
> > > > > ------------
> > > > > 2.
> > > > > + /* Check that it is in the buffer pool. If not, do nothing. */
> > > > > + LWLockAcquire(bufPartitionLock, LW_SHARED); buf_id =
> > > > > + BufTableLookup(&bufTag, bufHash); LWLockRelease(bufPartitionLock);
> > > > > +
> > > > > + if (buf_id < 0)
> > > > > + continue;
> > > > > +
> > > > > + bufHdr = GetBufferDescriptor(buf_id);
> > > > > +
> > > > > + buf_state = LockBufHdr(bufHdr);
> > > > > +
> > > > > + if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> > > > >
> > > > > I think a pre-check for RelFileNode might be better before LockBufHdr
> > > > > for the reasons mentioned in this function few lines down.
> > > >
> > > > The equivalent check is already done by BufTableLookup(). The last line in
> > > > the above is not a precheck but the final check.
> > >
> >
> > Which check in that API you are talking about? Are you telling because
> > we are trying to use a hash value corresponding to rnode.node to find
> > the block then I don't think it is equivalent because there is a
> > difference in actual values. But even if we want to rely on that, a
> > comment is required but I guess we can do the check as well because it
> > shouldn't be a costly pre-check.
>
> I think the only problematic case is that BufTableLookup wrongly
> misses buffers actually to be dropped. (And the case of too-many
> false-positives, not critical though.)  If omission is the case, we
> cannot adopt this optimization at all.  And if the false-positive is
> the case, maybe we need to adopt more restrictive prechecking, but
> RelFileNodeEquals is *not* more restrictive than BufTableLookup in the
> first place.
>
> What case do you think is problematic when considering
> BufTableLookup() as the prechecking?
>

I was slightly worried about false-positives but on again thinking
about it, I think we don't need any additional pre-check here.

-- 
With Regards,
Amit Kapila.

pgsql-hackers by date:

From: Amit Kapila
Date: 05 November 2020, 12:20:38
Subject: Re: Some doubious code in pgstat.c

From: Daniel Gustafsson
Date: 05 November 2020, 12:49:45
Subject: Re: Move OpenSSL random under USE_OPENSSL_RANDOM

Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

Previous

Next