Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id CAA4eK1J5Zxnm7+dCJqqZy+z+SpV2HexAdfgkOJoFa9icj9R6Cg@mail.gmail.com
Whole thread Raw
In response to RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses Re: [Patch] Optimize dropping of relation buffers using dlist
List pgsql-hackers
On Mon, Dec 7, 2020 at 12:32 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> On Friday, December 4, 2020 8:27 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Nov 27, 2020 at 11:36 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Fri, 27 Nov 2020 02:19:57 +0000, "k.jamison@fujitsu.com"
> > > <k.jamison@fujitsu.com> wrote in
> > > > > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Hello, Kirk.
> > > > > Thank you for the new version.
> > > >
> > > > Hi, Horiguchi-san. Thank you for your very helpful feedback.
> > > > I'm updating the patches addressing those.
> > > >
> > > > > +                   if (!smgrexists(rels[i], j))
> > > > > +                           continue;
> > > > > +
> > > > > +                   /* Get the number of blocks for a relation's fork */
> > > > > +                   blocks[i][numForks] = smgrnblocks(rels[i], j,
> > > > > NULL);
> > > > >
> > > > > If we see a fork which its size is not cached we must give up this
> > > > > optimization for all target relations.
> > > >
> > > > I did not use the "cached" flag in DropRelFileNodesAllBuffers and
> > > > use InRecovery when deciding for optimization because of the following
> > reasons:
> > > > XLogReadBufferExtended() calls smgrnblocks() to apply changes to
> > > > relation page contents. So in DropRelFileNodeBuffers(),
> > > > XLogReadBufferExtended() is called during VACUUM replay because
> > VACUUM changes the page content.
> > > > OTOH, TRUNCATE doesn't change the relation content, it just
> > > > truncates relation pages without changing the page contents. So
> > > > XLogReadBufferExtended() is not called, and the "cached" flag will
> > > > always return false. I tested with "cached" flags before, and this
> > >
> > > A bit different from the point, but if some tuples have been inserted
> > > to the truncated table, XLogReadBufferExtended() is called for the
> > > table and the length is cached.
> > >
> > > > always return false, at least in DropRelFileNodesAllBuffers. Due to
> > > > this, we cannot use the cached flag in DropRelFileNodesAllBuffers().
> > > > However, I think we can still rely on smgrnblocks to get the file
> > > > size as long as we're InRecovery. That cached nblocks is still guaranteed
> > to be the maximum in the shared buffer.
> > > > Thoughts?
> > >
> > > That means that we always think as if smgrnblocks returns "cached" (or
> > > "safe") value during recovery, which is out of our current consensus.
> > > If we go on that side, we don't need to consult the "cached" returned
> > > from smgrnblocks at all and it's enough to see only InRecovery.
> > >
> > > I got confused..
> > >
> > > We are relying on the "fact" that the first lseek() call of a
> > > (startup) process tells the truth.  We added an assertion so that we
> > > make sure that the cached value won't be cleared during recovery.  A
> > > possible remaining danger would be closing of an smgr object of a live
> > > relation just after a file extension failure. I think we are thinking
> > > that that doesn't happen during recovery.  Although it seems to me
> > > true, I'm not confident.
> > >
> >
> > Yeah, I also think it might not be worth depending upon whether smgr close
> > has been done before or not. I feel the current idea of using 'cached'
> > parameter is relatively solid and we should rely on that.
> > Also, which means that in DropRelFileNodesAllBuffers() we should rely on
> > the same, I think doing things differently in this regard will lead to confusion. I
> > agree in some cases we might not get benefits but it is more important to be
> > correct and keep the code consistent to avoid introducing bugs now or in the
> > future.
> >
> Hi,
> I have reported before that it is not always the case that the "cached" flag of
> srnblocks() return true. So when I checked the truncate test case used in my
> patch, it does not enter the optimization path despite doing INSERT before
> truncation of table.
> The reason for that is because in TRUNCATE, a new RelFileNode is assigned
> to the relation when creating a new file. In recovery, XLogReadBufferExtended()
> always opens the RelFileNode and calls smgrnblocks() for that RelFileNode for the
> first time. And for recovery processing, different RelFileNodes are used for the
> INSERTs to the table and TRUNCATE to the same table.
>

Hmm, how is it possible if Insert is done before Truncate? The insert
should happen in old RelFileNode only. I have verified by adding a
break-in (while (1), so that it stops there) heap_xlog_insert and
DropRelFileNodesAllBuffers(), and both get the same (old) RelFileNode.
How have you verified what you are saying?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Parallel Inserts in CREATE TABLE AS
Next
From: Amit Kapila
Date:
Subject: Re: Parallel Inserts in CREATE TABLE AS