Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id 87363.1611941415@antos
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Antonin Houska <ah@cybertec.at>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote:

> Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> 
> > Thanks for the updated patch. As I've mentioned off the list I'm slowly
> > looking through it with the intent to concentrate on undo progress
> > tracking. But before I will post anything I want to mention couple of
> > strange issues I see, otherwise I will forget for sure. Maybe it's
> > already known, but running several times 'make installcheck' against a
> > freshly build postgres with the patch applied from time to time I
> > observe various errors.
> > 
> > This one happens on a crash recovery, seems like
> > UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
> > the replay process:
> > 
> >     TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line:
300)
> >     postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
> >     postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
> >     postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
> >     postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
> >     postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]
> > 
> > This one is somewhat similar:
> > 
> >     TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
> >     postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
> >     postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
> >     postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
> >     postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]
> 
> Well, on repeated run of the test I could also hit the first one. I could fix
> it and will post a new version of the patch (along with some other small
> changes) this week.

Attached is the next version. Changes done:

  * Removed the progress tracking and implemented undo discarding in a simpler
    way. Now, instead of maintaining the pointer to the last record applied,
    only a boolean field in the chunk header is set when ROLLBACK is
    done. This helps to determine whether the undo of a non-committed
    transaction can be discarded.

  * Removed the "undo worker" that the previous version only used to apply the
    undo after crash recovery. The startup process does the work now.

  * Umplemented cleanup after crashed CREATE DATABASE and ALTER DATABASE ... SET TABLESPACE.

    BTW, I wonder if this change allows these commands to be executed in a
    transaction block. I think the reason to prohibit that is to minimize the
    window between creation of the files and transaction commit - if the
    server crashes in that window, the new database files survive but the
    catalog changes don't. But maybe there are other reasons. (I don't claim
    it's terribly useful to create database in a transaction block though
    because the client cannot connect to it w/o leaving the current
    transaction.)

  * Reordered the diffs, i.e. moved the discarding in front of the actual
    features.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Dumping/restoring fails on inherited generated column
Next
From: Alexey Kondratov
Date:
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly