RE: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From k.jamison@fujitsu.com
Subject RE: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id OSBPR01MB2341EB559E240818E2BFA79FEF200@OSBPR01MB2341.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [Patch] Optimize dropping of relation buffers using dlist  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
List pgsql-hackers
On Tuesday, September 8, 2020 1:02 PM, Amit Kapila wrote:
Hello,
> On Mon, Sep 7, 2020 at 1:33 PM k.jamison@fujitsu.com
> <k.jamison@fujitsu.com> wrote:
> >
> > On Wednesday, September 2, 2020 5:49 PM, Amit Kapila wrote:
> > > On Wed, Sep 2, 2020 at 9:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >
> > > > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > > > Even if the relation is locked, background processes like
> > > > > checkpointer can still touch the relation which might cause
> > > > > problems. Consider a case where we extend the relation but
> > > > > didn't flush the newly added pages. Now during truncate
> > > > > operation, checkpointer can still flush those pages which can
> > > > > cause trouble for truncate. But, I think in the recovery path
> > > > > such cases won't cause a
> > > problem.
> > > >
> > > > I wouldn't count on that staying true ...
> > > >
> > > >
> > >
> https://www.postgresql.org/message-id/CA+hUKGJ8NRsqgkZEnsnRc2MFR
> > > OBV-jC
> > > > nacbYvtpptK2A9YYp9Q@mail.gmail.com
> > > >
> > >
> > > I don't think that proposal will matter after commit c5315f4f44
> > > because we are caching the size/blocks for recovery while doing
> > > extend (smgrextend). In the above scenario, we would have cached the
> > > blocks which will be used at later point of time.
> > >
> >
> > I'm guessing we can still pursue this idea of improving the recovery path
> first.
> >
> 
> I think so.

Alright, so I've updated the patch which passes the regression and TAP tests.
It compiles and builds as intended.

> > I'm working on an updated patch version, because the CFBot's telling
> > that postgres fails to build (one of the recovery TAP tests fails).
> > I'm still working on refactoring my patch, but have yet to find a proper
> solution at the moment.
> > So I'm going to continue my investigation.
> >
> > Attached is an updated WIP patch.
> > I'd appreciate if you could take a look at the patch as well.
> >
> 
> So, I see the below log as one of the problems:
> 2020-09-07 06:20:33.918 UTC [10914] LOG:  redo starts at 0/15FFEC0
> 2020-09-07 06:20:33.919 UTC [10914] FATAL:  unexpected data beyond EOF
> in block 1 of relation base/13743/24581
> 
> This indicates that we missed invalidating some buffer which should have
> been invalidated. If you are able to reproduce this locally then I suggest to first
> write a simple patch without the check of the threshold, basically in recovery
> always try to use the new way to invalidate the buffer. That will reduce the
> scope of the code that can create a problem. Let us know if the problem still
> exists and share the logs. BTW, I think I see one problem in the code:
> 
> if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> + bufHdr->tag.forkNum == forkNum[j] && tag.blockNum >= 
> + bufHdr->firstDelBlock[j])
> 
> Here, I think you need to use 'i' not 'j' for forkNum and 
> firstDelBlock as those are arrays w.r.t forks. That might fix the 
> problem but I am not sure as I haven't tried to reproduce it.

Thanks for advice. Right, that seems to be the cause of error,
and fixing that (using fork) solved the case.
I also followed the advice of Tsunakawa-san of using more meaningful iterator
Instead of using "i" & "j" for readability.

I also added a new function when relation fork is bigger than the threshold
    If (nblocks > BUF_DROP_FULLSCAN_THRESHOLD)
(DropRelFileNodeBuffersOfFork) Perhaps there's a better name for that function.
However, as expected in the previous discussions, this is a bit slower than the
standard buffer invalidation process, because the whole shared buffers are scanned nfork times.
Currently, I set the threshold to (NBuffers / 500)

Feedback on the patch/testing are very much welcome.

Best regards,
Kirk Jamison


Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Next
From: Peter Geoghegan
Date:
Subject: Re: logtape.c stats don't account for unused "prefetched" block numbers