Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1L5bVTHkCQtZauAEk3CfSZy9Z9eUeBDCkJNkE3uEhJirQ@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Antonin Houska <ah@cybertec.at>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
On Wed, Nov 18, 2020 at 4:03 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > On Fri, Nov 13, 2020 at 6:02 PM Antonin Houska <ah@cybertec.at> wrote:
> > >
> > > Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska <ah@cybertec.at> wrote:
> > > > >
> > > > >
> > > > > No background undo
> > > > > ------------------
> > > > >
> > > > > Reduced complexity of the patch seems to be the priority at the moment. Amit
> > > > > suggested that cleanup of an orphaned relation file is simple enough to be
> > > > > done on foreground and I agree.
> > > > >
> > > >
> > > > Yeah, I think we should try and see if we can make it work but I
> > > > noticed that there are few places like AbortOutOfAnyTransaction where
> > > > we have the assumption that undo will be executed in the background.
> > > > We need to deal with it.
> > >
> > > I think this is o.k. if we always check for unapplied undo during startup.
> > >
> >
> > Hmm, how it is ok to leave undo (and rely on startup) unless it is a
> > PANIC error. IIRC, this path is invoked in non-panic errors as well.
> > Basically, we won't be able to discard such an undo which doesn't seem
> > like a good idea.
>
> Since failure to apply leaves unconsistent data, I assume it should always
> cause PANIC, shouldn't it?
>

But how can we ensure that AbortOutOfAnyTransaction will be called
only in that scenario?

> (Thomas seems to assume the same in [1].)
>
> > > > > "undo worker" is still there, but it only processes undo requests after server
> > > > > restart because relation data can only be changed in a transaction - it seems
> > > > > cleaner to launch a background worker for this than to hack the startup
> > > > > process.
> > > > >
> > > >
> > > > But, I see there are still multiple undoworkers that are getting
> > > > launched and I am not sure if that works correctly because a
> > > > particular undoworker is connected to a database and then it starts
> > > > processing all the pending undo.
> > >
> > > Each undo worker applies only transactions for its own database, see
> > > ProcessExistingUndoRequests():
> > >
> > >         /* We can only process undo of the database we are connected to. */
> > >         if (xact_hdr.dboid != MyDatabaseId)
> > >                 continue;
> > >
> > > Nevertheless, as I've just mentioned in my response to Thomas, I admit that we
> > > should try to live w/o the undo worker altogether.
> > >
> >
> > Okay, but keep in mind that there could be a large amount of undo
> > (unlike redo which has some limit as we can replay it from the last
> > checkpoint) which needs to be processed but it might be okay to live
> > with that for now.
>
> Yes, the information to remove relation file does not take much space in the
> undo log.
>
> > Another thing is that it seems we need to connect to the database to perform
> > it which might appear a bit odd that we don't allow users to connect to the
> > database but internally we are connecting it.
>
> I think the implementation will need to follow the outcome of the part of the
> discussion that starts at [2], but I see your concern. I'm thinking why
> database connection is not needed to apply WAL but is needed for UNDO. I think
> locks make the difference.
>

Yeah, it would be probably a good idea to see if we can make undo
apply work without db-connection especially if we want to do before
allowing connections. The other possibility could be to let discard
worker do this work lazily after allowing connections.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
Next
From: Michael Paquier
Date:
Subject: Re: Move OpenSSL random under USE_OPENSSL_RANDOM