Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | 87363.1611941415@antos |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Antonin Houska <ah@cybertec.at>) |
Responses | Re: POC: Cleaning up orphaned files using undo logs |
List | pgsql-hackers |
Antonin Houska <ah@cybertec.at> wrote:

> Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > Thanks for the updated patch. As I've mentioned off the list I'm slowly
> > looking through it with the intent to concentrate on undo progress
> > tracking. But before I will post anything I want to mention couple of
> > strange issues I see, otherwise I will forget for sure. Maybe it's
> > already known, but running several times 'make installcheck' against a
> > freshly build postgres with the patch applied from time to time I
> > observe various errors.
> >
> > This one happens on a crash recovery, seems like
> > UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
> > the replay process:
> >
> > TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line: 300)
> > postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
> > postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
> > postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
> > postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
> > postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]
> >
> > This one is somewhat similar:
> >
> > TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
> > postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
> > postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
> > postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
> > postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]
>
> Well, on repeated run of the test I could also hit the first one. I could fix
> it and will post a new version of the patch (along with some other small
> changes) this week.
Attached is the next version. Changes done:

* Removed the progress tracking and implemented undo discarding in a simpler
  way. Now, instead of maintaining the pointer to the last record applied,
  only a boolean field in the chunk header is set when ROLLBACK is done. This
  helps to determine whether the undo of a non-committed transaction can be
  discarded.

* Removed the "undo worker" that the previous version only used to apply the
  undo after crash recovery. The startup process does the work now.

* Implemented cleanup after crashed CREATE DATABASE and ALTER DATABASE ... SET
  TABLESPACE.

  BTW, I wonder if this change allows these commands to be executed in a
  transaction block. I think the reason to prohibit that is to minimize the
  window between creation of the files and transaction commit: if the server
  crashes in that window, the new database files survive but the catalog
  changes don't. But maybe there are other reasons. (I don't claim it's
  terribly useful to create a database in a transaction block though, because
  the client cannot connect to it without leaving the current transaction.)

* Reordered the diffs, i.e. moved the discarding in front of the actual
  features.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
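For readers following the first change, here is a minimal sketch of how a discard decision based on a single per-chunk flag might look. It is not taken from the patch: the type and function names (SketchChunkHeader, SketchXactState, sketch_chunk_can_be_discarded) are hypothetical, and the rules for committed and in-progress transactions are assumptions added only to make the example complete. The message itself only says that the flag is set when ROLLBACK is done and that it helps decide whether the undo of a non-committed transaction can be discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transaction states; the real patch uses PostgreSQL's own
 * transaction status machinery, which is not reproduced here. */
typedef enum SketchXactState
{
    SKETCH_XACT_IN_PROGRESS,
    SKETCH_XACT_COMMITTED,
    SKETCH_XACT_ABORTED
} SketchXactState;

/* Hypothetical, heavily simplified chunk header. The only field modelled is
 * the boolean that replaces the "last record applied" pointer. */
typedef struct SketchChunkHeader
{
    bool undo_applied;      /* set once ROLLBACK has executed the undo */
} SketchChunkHeader;

/*
 * Decide whether the undo log chunk may be discarded.
 *
 * Assumed rules (not confirmed by the message):
 *  - committed transaction: its undo is no longer needed;
 *  - in-progress transaction: its undo may still be needed;
 *  - aborted transaction: discardable only once the undo has been applied,
 *    which is what the boolean flag records.
 */
static bool
sketch_chunk_can_be_discarded(const SketchChunkHeader *hdr,
                              SketchXactState state)
{
    assert(hdr != NULL);

    switch (state)
    {
        case SKETCH_XACT_COMMITTED:
            return true;
        case SKETCH_XACT_ABORTED:
            return hdr->undo_applied;
        case SKETCH_XACT_IN_PROGRESS:
            return false;
    }
    return false;           /* unknown state: keep the undo */
}
```

The point of the sketch is that a single flag is enough when undo is applied all-or-nothing per chunk, so no position inside the chunk has to be tracked or overwritten in place.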