Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date:
Msg-id: CAA4eK1Kfm5YQd6SWodHBr8J+YqH=Y0pVNzCox_44w88m1kkUoQ@mail.gmail.com
In response to: Re: POC: Cleaning up orphaned files using undo logs (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
On Mon, Aug 5, 2019 at 12:09 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 05/08/2019 07:23, Thomas Munro wrote:
> > On Mon, Aug 5, 2019 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> On Sun, Aug 4, 2019 at 2:46 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>> Could we leave out the UNDO and discard worker processes for now?
> >>> Execute all UNDO actions immediately at rollback, and after crash
> >>> recovery. That would be fine for cleaning up orphaned files,
> >>
> >> Even if we execute all the undo actions on rollback, we need discard
> >> worker to discard undo at regular intervals.  Also, what if we get an
> >> error while applying undo actions during rollback?  Right now, we have
> >> a mechanism to push such a request to background worker and allow the
> >> session to continue.  Instead, we might want to Panic in such cases if
> >> we don't want to have background undo workers.
> >>
> >>> and it
> >>> would cut down the size of the patch to review.
> >>
> >> If we can find some way to handle all cases and everyone agrees to it,
> >> that would be good. In fact, we can try to get the basic stuff
> >> committed first and then try to get the rest (undo-worker machinery)
> >> done.
> >
> > I think it's definitely worth exploring.
>
> Yeah. For cleaning up orphaned files, if unlink() fails, we can just log
> the error and move on. That's what we do in the main codepath, too. For
> any other error, PANIC seems ok. We're not expecting any errors during
> undo processing, so it doesn't seem safe to continue running.
>
> Hmm. Since applying the undo record is WAL-logged, you could run out of
> disk space while creating the WAL record. That seems unpleasant.
>

We might get away with some minimal error handling for the orphaned-file
cleanup patch, but this facility was supposed to be generic.  Even
assuming we all agree on the error-handling approach, I think we still
can't avoid the requirement for a discard worker to discard the logs.

> >>> Can this race condition happen: Transaction A creates a table and an
> >>> UNDO record to remember it. The transaction is rolled back, and the file
> >>> is removed. Another transaction, B, creates a different table, and
> >>> chooses the same relfilenode. It loads the table with data, and commits.
> >>> Then the system crashes. After crash recovery, the UNDO record for the
> >>> first transaction is applied, and it removes the file that belongs to
> >>> the second table, created by transaction B.
> >>
> >> I don't think such a race exists, but we should verify it once.
> >> Basically, once the rollback is complete, we mark the transaction
> >> rollback as complete in the transaction header in undo and write a WAL
> >> for it.  After crash-recovery, we will skip such a transaction.  Isn't
> >> that sufficient to prevent such a race condition?
>
> Ok, I didn't realize there's a flag in the undo record to mark it as
> applied. Yeah, that fixes it. Seems a bit heavy-weight, but I guess it's
> fine. Do you do something different in zheap? I presume writing a WAL
> record for every applied undo record would be too heavy there.
>

For zheap, we collect all the undo records for a page, apply them
together, and then WAL-log the entire page.  The progress of a
transaction is updated either at transaction end (rollback complete) or
after processing some threshold of undo records.  So, generally, there
won't be a WAL record for each undo record applied.

> This needs some performance testing. We're creating one extra WAL record
> and one UNDO record for every file creation, and another WAL record on
> abort. It's probably cheap compared to all the other work done during
> table creation, but we should still get some numbers on it.
>

makes sense.





-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


