Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id 51419.1632856781@antos
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> > On Tue, Sep 21, 2021 at 10:07:55AM +0200, Dmitry Dolgov wrote:
> > On Tue, 21 Sep 2021 09:00 Antonin Houska, <ah@cybertec.at> wrote:
> >
> > > Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > >
> > > > Yep, makes sense, thanks. I have few more questions:
> > > >
> > > > * The use case with orphaned files is working somewhat differently after
> > > >   the rebase on the latest master, do you observe it as well? The
> > > >   difference is ApplyPendingUndo -> SyncPostCheckpoint doesn't clean up
> > > >   an orphaned relation file immediately (only later on checkpoint)
> > > >   because of empty pendingUnlinks. I haven't investigated more yet, but
> > > >   seems like after this commit:
> > > >
> > > >     commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
> > > >     Author: Thomas Munro <tmunro@postgresql.org>
> > > >     Date:   Mon Aug 2 17:32:20 2021 +1200
> > > >
> > > >         Run checkpointer and bgwriter in crash recovery.
> > > >
> > > >         Start up the checkpointer and bgwriter during crash recovery
> > > (except in
> > > >         --single mode), as we do for replication.  This wasn't done back
> > > in
> > > >         commit cdd46c76 out of caution.  Now it seems like a better idea
> > > to make
> > > >         the environment as similar as possible in both cases.  There may
> > > also be
> > > >         some performance advantages.
> > > >
> > > >   something has to be updated (pendingOps are empty right now, so no
> > > >   unlink request is remembered).
> > >
> > > I haven't been debugging that part recently, but yes, this commit is
> > > relevant,
> > > thanks for pointing that out! Attached is a patch that should fix it. I'll
> > > include it in the next version of the patch series, unless you tell me that
> > > something is still wrong.
> > >
> >
> > Sure, but  I can take a look only in a couple of days.
>
> Thanks for the patch.
>
> Hm, maybe there is some misunderstanding. My question above was about
> the changed behaviour, when orphaned files (e.g. created relation files
> after the backend was killed) are removed only by checkpointer when it
> kicks in. As far as I understand, the original intention was to do this
> job right away, that's why SyncPre/PostCheckpoint was invoked. But the
> recent changes around checkpointer make the current implementation
> insufficient.

> The patch you've proposed removes invokation of SyncPre/PostCheckpoint,
> do I see it correctly? In this sense it doesn't change anything, except
> removing non-functioning code of course.

Yes, it sounds like a misundeerstanding. I thought you complain about code
which is no longer needed.

The original intention was to make sure that the files are ever unlinked. IIRC
before the commit 7ff23c6d27 the calls SyncPre/PostCheckpoint were necessary
because the checkpointer wasn't runnig that early during the startup. Without
these calls the startup process would exit without doing anything. Sorry, I
see now that the comment incorrectly says "... it seems simpler ...", but in
fact it was necessary.

> But the question, probably
> reformulated from the more design point of view, stays the same — when
> and by which process such orphaned files have to be removed? I've
> assumed by removing right away the previous version was trying to avoid
> any kind of thunder effects of removing too many at once, but maybe I'm
> mistaken here.

I'm just trying to use the existing infrastructure: the effect of DROP TABLE
also appear to be performed by the checkpointer. However I don't know why the
unlinks need to be performed by the checkpointer.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fixing WAL instability in various TAP tests
Next
From: Tom Lane
Date:
Subject: Re: statement_timeout vs DECLARE CURSOR