Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id 20210816150018.o7xbd74s6dzuhdtt@localhost
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Antonin Houska <ah@cybertec.at>)
Responses Re: POC: Cleaning up orphaned files using undo logs
List pgsql-hackers
> On Wed, Jun 30, 2021 at 07:41:16PM +0200, Antonin Houska wrote:
> Antonin Houska <ah@cybertec.at> wrote:
>
> > tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> >
> > > I'm crawling like a snail to read the patch set.  Below are my first set of review comments, which are all
minor.
> >
> > Thanks.
>
> I've added the patch to the upcoming CF [1], so it possibly gets more review
> and makes some progress. I've marked myself as the author so it's clear who
> will try to respond to the reviews. It's clear that other people did much more
> work on the feature than I did so far - they are welcome to add themselves to
> the author list.
>
> [1] https://commitfest.postgresql.org/33/3228/

Hi,

I'm crawling through the patch set like even slower creature than a snail,
sorry for long absence. I'm reading the latest version posted here and,
although it's hard to give any high level design comments on it yet, I thought
it could be useful to post a few findings and questions in the meantime.

* One question about the functionality:

  > On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote:
  > Attached is the next version. Changes done:
  >
  >   * Removed the progress tracking and implemented undo discarding in a simpler
  >     way. Now, instead of maintaining the pointer to the last record applied,
  >     only a boolean field in the chunk header is set when ROLLBACK is
  >     done. This helps to determine whether the undo of a non-committed
  >     transaction can be discarded.

  Just to clarify, the whole feature was removed for the sake of
  simplicity, right?

* By throwing at the patchset `make installcheck` I'm getting from time to time
  and error on the restart:

    TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)",
    File: "undorecordset.c", Line: 1098, PID: 6055)

  From what I see XLogReadBufferForRedoExtended finds an invalid buffer and
  returns BLK_NOTFOUND. The commentary says:

     If the block was not found, then it must be discarded later in
     the WAL.

  and continues with skip = false, but fails to get a page from an invalid
  buffer few lines later. It seems that the skip flag is supposed to be used
  this situation, should it also guard the BufferGetPage part?

* Another interesting issue I've found happened inside
  DropUndoLogsInTablespace, when the process got SIGTERM. It seems processing
  stuck on:

    slist_foreach_modify(iter, &UndoLogShared->shared_free_lists[i])

  iterating on the same element over and over. My guess is
  clear_private_free_lists was called and caused such unexpected outcome,
  should the access to shared_free_lists be somehow protected?

* I also wonder about the segments in base/undo, the commentary in pg_undodump
  says:

     Since the UNDO log is a continuous stream of changes, any hole
     terminates processing.

  It looks like it's relatively easy to end up with such holes, and pg_undodump
  ends up with a message (found is added by me and contains a found offset
  which do not match the expected value):

    pg_undodump: error: segment 0000000000 missing in log 2, found 0000100000

  This seems to be not causing any real issues, but it's not clear for me if
  such situation with gaps is fine or is it a problem?

Other than that one more time thank you for this tremendous work, I find that
the topic is of extreme importance.



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
Next
From: Ranier Vilela
Date:
Subject: Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode