Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | 20210816150018.o7xbd74s6dzuhdtt@localhost Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
|
List | pgsql-hackers |
> On Wed, Jun 30, 2021 at 07:41:16PM +0200, Antonin Houska wrote: > Antonin Houska <ah@cybertec.at> wrote: > > > tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > > > > > I'm crawling like a snail to read the patch set. Below are my first set of review comments, which are all minor. > > > > Thanks. > > I've added the patch to the upcoming CF [1], so it possibly gets more review > and makes some progress. I've marked myself as the author so it's clear who > will try to respond to the reviews. It's clear that other people did much more > work on the feature than I did so far - they are welcome to add themselves to > the author list. > > [1] https://commitfest.postgresql.org/33/3228/ Hi, I'm crawling through the patch set like even slower creature than a snail, sorry for long absence. I'm reading the latest version posted here and, although it's hard to give any high level design comments on it yet, I thought it could be useful to post a few findings and questions in the meantime. * One question about the functionality: > On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote: > Attached is the next version. Changes done: > > * Removed the progress tracking and implemented undo discarding in a simpler > way. Now, instead of maintaining the pointer to the last record applied, > only a boolean field in the chunk header is set when ROLLBACK is > done. This helps to determine whether the undo of a non-committed > transaction can be discarded. Just to clarify, the whole feature was removed for the sake of simplicity, right? * By throwing at the patchset `make installcheck` I'm getting from time to time and error on the restart: TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)", File: "undorecordset.c", Line: 1098, PID: 6055) From what I see XLogReadBufferForRedoExtended finds an invalid buffer and returns BLK_NOTFOUND. The commentary says: If the block was not found, then it must be discarded later in the WAL. and continues with skip = false, but fails to get a page from an invalid buffer few lines later. It seems that the skip flag is supposed to be used this situation, should it also guard the BufferGetPage part? * Another interesting issue I've found happened inside DropUndoLogsInTablespace, when the process got SIGTERM. It seems processing stuck on: slist_foreach_modify(iter, &UndoLogShared->shared_free_lists[i]) iterating on the same element over and over. My guess is clear_private_free_lists was called and caused such unexpected outcome, should the access to shared_free_lists be somehow protected? * I also wonder about the segments in base/undo, the commentary in pg_undodump says: Since the UNDO log is a continuous stream of changes, any hole terminates processing. It looks like it's relatively easy to end up with such holes, and pg_undodump ends up with a message (found is added by me and contains a found offset which do not match the expected value): pg_undodump: error: segment 0000000000 missing in log 2, found 0000100000 This seems to be not causing any real issues, but it's not clear for me if such situation with gaps is fine or is it a problem? Other than that one more time thank you for this tremendous work, I find that the topic is of extreme importance.
pgsql-hackers by date: