Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id 20200916.123222.724387222397793701.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: [Patch] Optimize dropping of relation buffers using dlist  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [Patch] Optimize dropping of relation buffers using dlist  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
At Wed, 16 Sep 2020 08:33:06 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > Is this means lseek(SEEK_END) doesn't count blocks that are
> > write(2)'ed (by smgrextend) but not yet flushed? (I don't think so,
> > for clarity.) The nblocks cache is added just to reduce the number of
> > lseek()s and expected to always have the same value with what lseek()
> > is expected to return.
> >
> 
> See comments in ReadBuffer_common() which indicates such a possibility
> ("Unfortunately, we have also seen this case occurring because of
> buggy Linux kernels that sometimes return an lseek(SEEK_END) result
> that doesn't account for a recent write."). Also, refer my previous
> email [1] on this and another email link in that email which has a
> discussion on this point.
>
> > The reason it is reliable only during recovery
> > is that the cache is not shared but the startup process is the only
> > process that changes the relation size during recovery.
> >
> 
> Yes, that is why we are planning to do this optimization for recovery path.
> 
> > If any other process can extend the relation while smgrtruncate is
> > running, the current DropRelFileNodeBuffers should have the chance
> > that a new buffer for extended area is allocated at a buffer location
> > where the function already have passed by, which is a disaster.
> >
> 
> The relation might have extended before smgrtruncate but the newly
> added pages can be flushed by checkpointer during smgrtruncate.
> 
> [1] - https://www.postgresql.org/message-id/CAA4eK1LH2uQWznwtonD%2Bnch76kqzemdTQAnfB06z_LXa6NTFtQ%40mail.gmail.com

Ah! I understood that! The reason we can rely on the cahce is that the
cached value is *not* what lseek returned but how far we intended to
extend. Thank you for the explanation.

By the way I'm not sure that actually happens, but if one smgrextend
call exnteded the relation by two or more blocks, the cache is
invalidated and succeeding smgrnblocks returns lseek()'s result. Don't
we need to guarantee the cache to be valid while recovery?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Ashutosh Sharma
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Next
From: Tom Lane
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."