Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | 27476.1632752108@antos |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: POC: Cleaning up orphaned files using undo logs |
List | pgsql-hackers |
Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Fri, Sep 24, 2021 at 4:44 PM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > On Mon, Sep 20, 2021 at 10:24 AM Antonin Houska <ah@cybertec.at> wrote:
> > > >
> > > > Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > > On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > > > > >
> > > > > > > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > > > > >
> > > > > > * What happened with the idea of abandoning discard worker for the sake
> > > > > > of simplicity? From what I see limiting everything to foreground undo
> > > > > > could reduce the core of the patch series to the first four patches
> > > > > > (forgetting about test and docs, but I guess it would be enough at
> > > > > > least for the design review), which is already less overwhelming.
> > > >
> > > > What we can miss, at least for the cleanup of the orphaned files, is the *undo
> > > > worker*. In this patch series the cleanup is handled by the startup process.
> > > >
> > >
> > > Okay, I think various people at different point of times has suggested
> > > that idea. I think one thing we might need to consider is what to do
> > > in case of a FATAL error? In case of FATAL error, it won't be
> > > advisable to execute undo immediately, so would we upgrade the error
> > > to PANIC in such cases. I remember vaguely that for clean up of
> > > orphaned files that can happen rarely and someone has suggested
> > > upgrading the error to PANIC in such a case but I don't remember the
> > > exact details.
> >
> > Do you mean FATAL error during normal operation?
> >
>
> Yes.
>
> > As far as I understand, even
> > zheap does not rely on immediate UNDO execution (otherwise it'd never
> > introduce the undo worker), so FATAL only means that the undo needs to be
> > applied later so it can be discarded.
> >
>
> Yeah, zheap either applies undo later via background worker or next
> time before dml operation if there is a need.
>
> > As for the orphaned files cleanup feature with no undo worker, we might need
> > PANIC to ensure that the undo is applied during restart and that it can be
> > discarded, otherwise the unapplied undo log would stay there until the next
> > (regular) restart and it would block discarding. However upgrading FATAL to
> > PANIC just because the current transaction created a table seems quite
> > rude.
> >
>
> True, I guess but we can once see in what all scenarios it can
> generate FATAL during that operation.

Do you mean "CREATE TABLE" by "that operation"?

It's not about FATAL during CREATE TABLE; rather, it's about FATAL at any time
during a transaction. Whichever operation caused the FATAL error, we'd need to
upgrade it to PANIC as long as the transaction has some undo.

Although the postgres core probably does not raise FATAL errors too often (OOM
conditions seem to be the typical cause), I'm still not enthusiastic about the
idea that the undo feature turns such errors into PANIC.

I wonder what the reason is for not undoing the transaction on FATAL. If it's
about the possibly long duration of the undo execution, deletion of orphaned
files (relations or whole databases) via undo shouldn't make things worse,
because currently FATAL also triggers this sort of cleanup immediately; it's
just implemented differently.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
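The options discussed in this thread for a transaction that fails while it has undo can be summarized as a single decision. The following is a minimal C sketch, not code from the patch series: `ErrLevel`, `UndoAction`, `UndoContext`, `undo_on_fatal` and `choose_undo_action()` are hypothetical names (PostgreSQL itself uses the `ERROR`, `FATAL` and `PANIC` elevels). It assumes that on ERROR the aborting backend executes its undo itself ("foreground undo") and that after a PANIC restart the startup process applies the pending undo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ERROR, FATAL and PANIC elevels. */
typedef enum
{
    LVL_ERROR,
    LVL_FATAL,
    LVL_PANIC
} ErrLevel;

/* Hypothetical: what happens to the failing transaction's undo. */
typedef enum
{
    UNDO_NONE,            /* transaction wrote no undo: nothing to do */
    UNDO_APPLY_NOW,       /* the failing backend executes the undo itself */
    UNDO_DEFER_TO_WORKER, /* a background undo worker applies it later (zheap) */
    UNDO_AT_RESTART       /* startup process applies it after a PANIC restart */
} UndoAction;

/* Hypothetical inputs to the decision. */
typedef struct
{
    bool has_undo;      /* current transaction has written undo records */
    bool undo_worker;   /* a background undo worker is available */
    bool undo_on_fatal; /* policy: execute undo before exiting on FATAL */
} UndoContext;

/*
 * Decide how the transaction's undo gets applied and, if needed, raise
 * *elevel. Only FATAL is ever escalated, and only when undo is pending.
 */
UndoAction
choose_undo_action(ErrLevel *elevel, const UndoContext *ctx)
{
    assert(elevel != NULL && ctx != NULL);

    if (!ctx->has_undo)
        return UNDO_NONE;       /* no undo: the error level stays as is */

    switch (*elevel)
    {
        case LVL_ERROR:
            return UNDO_APPLY_NOW;  /* ordinary abort: foreground undo */
        case LVL_PANIC:
            return UNDO_AT_RESTART; /* the restart runs the startup process */
        case LVL_FATAL:
            break;                  /* handled below */
    }

    /* FATAL with pending undo. */
    if (ctx->undo_worker)
        return UNDO_DEFER_TO_WORKER;
    if (ctx->undo_on_fatal)
        return UNDO_APPLY_NOW;

    /* No worker and no undo on FATAL: escalate so the restart can discard it. */
    *elevel = LVL_PANIC;
    return UNDO_AT_RESTART;
}
```

With neither a worker nor `undo_on_fatal`, the sketch turns any FATAL into PANIC once the transaction has undo, which is the behavior objected to in the message. Setting `undo_on_fatal` models the alternative raised there: the failing backend removes the orphaned files itself, much as FATAL already triggers that cleanup today.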