
From: Amit Kapila
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date:
Msg-id: CAA4eK1KxUSrTYDFrB4KiCMMchH+Av4OqKwJC94y8M_1_tqLzhQ@mail.gmail.com
In response to: RE: [Patch] Optimize dropping of relation buffers using dlist ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses: RE: [Patch] Optimize dropping of relation buffers using dlist ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
           RE: [Patch] Optimize dropping of relation buffers using dlist ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
List: pgsql-hackers
On Mon, Sep 7, 2020 at 1:33 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
>
> On Wednesday, September 2, 2020 5:49 PM, Amit Kapila wrote:
> > On Wed, Sep 2, 2020 at 9:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > > Even if the relation is locked, background processes like
> > > > checkpointer can still touch the relation which might cause
> > > > problems. Consider a case where we extend the relation but didn't
> > > > flush the newly added pages. Now during truncate operation,
> > > > checkpointer can still flush those pages which can cause trouble for
> > > > truncate. But, I think in the recovery path such cases won't cause a
> > problem.
> > >
> > > I wouldn't count on that staying true ...
> > >
> > >
> > > https://www.postgresql.org/message-id/CA+hUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q@mail.gmail.com
> > >
> >
> > I don't think that proposal will matter after commit c5315f4f44 because we are
> > caching the size/blocks for recovery while doing extend (smgrextend). In the
> > above scenario, we would have cached the blocks, which will be used
> > at a later point in time.
> >
>
> I'm guessing we can still pursue this idea of improving the recovery path first.
>

I think so.
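
For context on the caching mentioned above, here is a rough sketch of
what commit c5315f4f44 does in smgrnblocks(); a paraphrase for
illustration, not the verbatim source:

BlockNumber
smgrnblocks(SMgrRelation reln, ForkNumber forknum)
{
    BlockNumber result;

    /*
     * In recovery, trust the size cached by a previous smgrextend() or
     * smgrnblocks() call and skip the lseek(SEEK_END) on the file.
     * Outside recovery the cache cannot be trusted, since there is no
     * shared invalidation mechanism for file-size changes.
     */
    if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
        return reln->smgr_cached_nblocks[forknum];

    result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);

    reln->smgr_cached_nblocks[forknum] = result;

    return result;
}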

> I'm working on an updated patch version, because the CFBot reports
> that the postgres build fails (one of the recovery TAP tests fails).
> I'm still refactoring my patch but have yet to find a proper solution,
> so I'm going to continue my investigation.
>
> Attached is an updated WIP patch.
> I'd appreciate if you could take a look at the patch as well.
>

So, I see the below log as one of the problems:
2020-09-07 06:20:33.918 UTC [10914] LOG:  redo starts at 0/15FFEC0
2020-09-07 06:20:33.919 UTC [10914] FATAL:  unexpected data beyond EOF in block 1 of relation base/13743/24581

This indicates that we missed invalidating a buffer that should have
been invalidated. If you are able to reproduce this locally, then I
suggest first writing a simple patch without the threshold check,
i.e., during recovery always try to use the new way to invalidate
buffers. That will narrow down the code that can be causing the
problem. Let us know whether the problem still exists and share the
logs. BTW, I think I see one problem in the code:

if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+ bufHdr->tag.forkNum == forkNum[j] &&
+ bufHdr->tag.blockNum >= firstDelBlock[j])

Here, I think you need to use 'i', not 'j', for forkNum and
firstDelBlock, as those arrays are indexed per fork. That might fix
the problem, but I am not sure as I haven't tried to reproduce it.
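
To make that concrete, a minimal sketch of the corrected check, with
the surrounding loop guessed from the fragment above (variable names
hypothetical, following the patch excerpt):

/*
 * 'i' walks the per-fork arrays passed to the drop routine, so they
 * must be indexed with 'i'; 'j' belongs to the inner scan.
 */
for (i = 0; i < nforks; i++)
{
    /* ... locate a candidate buffer and take its header spinlock ... */

    if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
        bufHdr->tag.forkNum == forkNum[i] &&      /* was forkNum[j] */
        bufHdr->tag.blockNum >= firstDelBlock[i]) /* was firstDelBlock[j] */
        InvalidateBuffer(bufHdr);                 /* releases spinlock */
}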

-- 
With Regards,
Amit Kapila.


