Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id 46514.1610975746@antos
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> Thanks for the updated patch. As I've mentioned off the list I'm slowly
> looking through it with the intent to concentrate on undo progress
> tracking. But before I will post anything I want to mention couple of
> strange issues I see, otherwise I will forget for sure. Maybe it's
> already known, but running several times 'make installcheck' against a
> freshly build postgres with the patch applied from time to time I
> observe various errors.
>
> This one happens on a crash recovery, seems like
> UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
> the replay process:
>
>     TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line: 300)
>     postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
>     postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
>     postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
>     postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
>     postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]
>
> This one is somewhat similar:
>
>     TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
>     postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
>     postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
>     postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
>     postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]

Well, on repeated run of the test I could also hit the first one. I could fix
it and will post a new version of the patch (along with some other small
changes) this week.

> There are also here and there messages about not found undo files:
>
>     ERROR:  cannot open undo segment file 'base/undo/000008.0000020000': No such file or directory
>     WARNING:  failed to undo transaction

I don't see this one in the log so far, will try again.

Thanks for the report!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: Tomas Vondra
Date:
Subject: Re: list of extended statistics on psql