Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date
Msg-id CAA4eK1LJFXUPTiFgXKypnugVgRhzmUURu-dnkAK=eVDae+GVeQ@mail.gmail.com
In response to Re: [Patch] Optimize dropping of relation buffers using dlist  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: [Patch] Optimize dropping of relation buffers using dlist  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Tue, Nov 10, 2020 at 10:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Nov 7, 2020 at 12:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I think one of the problems is returning fewer rows, and that too
> > without any warning or error, so maybe that is a bigger problem, but
> > we seem to be okay with it as it is already a known thing, though I
> > think it is not documented anywhere.
>
> I'm not OK with it, and I'm not sure it's widely known or understood,
>

Yeah, that is quite possible, but maybe because we don't see many
field reports, nobody has thought of doing anything about it.

> though I think we've made some progress in this thread.  Perhaps, as a
> separate project, we need to solve several related problems with a
> shmem table of relation sizes from not-yet-synced files so that
> smgrnblocks() is fast and always sees all preceding smgrextend()
> calls.  If we're going to need something like that anyway, and if we
> can come up with a simple way to detect and report this type of
> failure in the meantime, maybe this fast DROP project should just go
> ahead and use the existing smgrnblocks() function without the weird
> caching bandaid that only works in recovery?
>

I am not sure it would be easy to detect all such failures, and we
might end up opening another can of worms, but if there is some
simpler way then sure, we can consider it. OTOH, until we have a
shared cache of relation sizes (which I think would be good for
multiple things), relying on the cache during recovery seems the safe
way to proceed. And it is not that we can't change this once we have
a shared relation-size solution.
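To illustrate why a trusted block count matters here, below is a toy,
self-contained model (all names and structures are invented for
illustration; this is not PostgreSQL's actual buffer manager). It
contrasts a full buffer-pool scan, as DropRelFileNodeBuffers does
today, with the targeted per-block probe that becomes possible when a
reliable nblocks value is available:

```c
#include <assert.h>

/* Invented miniature buffer pool: each slot is tagged with a
 * relation and block number. */
#define NBUFFERS 1024

typedef struct { int relnode; int blockno; int valid; } BufSlot;
static BufSlot pool[NBUFFERS];

/* Put nblocks buffers of one relation into consecutive slots. */
static void fill_pool(int relnode, int nblocks, int start_slot)
{
    for (int b = 0; b < nblocks; b++) {
        pool[start_slot + b].relnode = relnode;
        pool[start_slot + b].blockno = b;
        pool[start_slot + b].valid = 1;
    }
}

/* Today's approach: scan every buffer header, O(NBuffers). */
static int drop_by_full_scan(int relnode)
{
    int dropped = 0;
    for (int i = 0; i < NBUFFERS; i++)
        if (pool[i].valid && pool[i].relnode == relnode) {
            pool[i].valid = 0;
            dropped++;
        }
    return dropped;
}

/* Optimized approach: probe for each block 0..nblocks-1 directly.
 * In the real system this would be O(nblocks) hash-table lookups;
 * a linear probe stands in for the lookup here. It is correct only
 * if nblocks is a reliable upper bound on the relation's size,
 * which is exactly what the cached smgrnblocks() value during
 * recovery is meant to guarantee. */
static int drop_by_block_probe(int relnode, int nblocks)
{
    int dropped = 0;
    for (int blk = 0; blk < nblocks; blk++)
        for (int i = 0; i < NBUFFERS; i++)
            if (pool[i].valid && pool[i].relnode == relnode &&
                pool[i].blockno == blk) {
                pool[i].valid = 0;
                dropped++;
                break;
            }
    return dropped;
}
```

The probe variant silently misses any buffer at or beyond nblocks,
which is precisely why an untrusted (possibly stale-low) block count
leads to the "fewer rows without warning" class of failure discussed
above.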

> > > The main argument I can think of against the idea of using plain old
> > > smgrnblocks() is that the current error messages on smgrwrite()
> > > failure for stray blocks would be indistinguishable from cases where
> > > an external actor unlinked the file.  I don't mind getting an error
> > > that prevents checkpointing -- your system is in big trouble! -- but
> > > it'd be nice to be able to detect that *we* unlinked the file,
> > > implying the filesystem and bufferpool are out of sync, and spit out a
> > > special diagnostic message.  I suppose if it's the checkpointer doing
> > > the writing, it could check if the relfilenode is on the
> > > queued-up-for-delete-after-the-checkpoint list, and if so, it could
> > > produce a different error message just for this edge case.
> > > Unfortunately that's not a general solution, because any backend might
> > > try to write a buffer out and they aren't synchronised with
> > > checkpoints.
> >
> > Yeah, but I am not sure if we can consider manual (external actor)
> > tinkering with the files the same as something that happened due to
> > the database server relying on the wrong information.
>
> Here's a rough idea I thought of to detect this case; I'm not sure if
> it has holes.  When unlinking a relation, currently we truncate
> segment 0 and unlink all the rest of the segments, and tell the
> checkpointer to unlink segment 0 after the next checkpoint.
>

Do we always truncate all the blocks? What if vacuum has cleaned just
the last N (say 100) blocks; how do we handle that case?
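For reference, the drop protocol you describe could be modeled like
this (a toy sketch with invented names; segment sizes stand in for
files, with -1 meaning the file is absent). As I understand it, the
empty segment 0 is kept around so the relfilenode cannot be reused
before the next checkpoint:

```c
#include <assert.h>

#define MAX_SEGS 4

/* seg_size[i] is the segment's size in blocks, or -1 if unlinked. */
static int seg_size[MAX_SEGS] = { -1, -1, -1, -1 };
static int seg0_unlink_pending = 0;

static void create_relation(int nsegs, int blocks_per_seg)
{
    for (int i = 0; i < nsegs; i++)
        seg_size[i] = blocks_per_seg;
}

/* DROP: truncate segment 0 to zero length, unlink the higher
 * segments immediately, and defer segment 0's unlink to the
 * checkpointer. */
static void drop_relation(void)
{
    seg_size[0] = 0;
    for (int i = 1; i < MAX_SEGS; i++)
        seg_size[i] = -1;
    seg0_unlink_pending = 1;
}

/* After the next checkpoint, segment 0 itself is removed. */
static void checkpoint(void)
{
    if (seg0_unlink_pending) {
        seg_size[0] = -1;
        seg0_unlink_pending = 0;
    }
}
```

The vacuum question above is about the other path: a partial
truncation shrinks an existing segment in place rather than going
through this drop sequence, so any detection scheme keyed on the
pending-unlink list would not see it.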

-- 
With Regards,
Amit Kapila.


