Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Kuntal Ghosh
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAGz5QCJ2LHXPMdkVx=GHanWcU7BeKhMjxzvhmeU7D15UGtYxaw@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: POC: Cleaning up orphaned files using undo logs  (Robert Haas <robertmhaas@gmail.com>)
Re: POC: Cleaning up orphaned files using undo logs  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hello Thomas,

On Mon, Sep 16, 2019 at 11:23 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> > 1. In UndoLogAllocateInRecovery, when we find the current log number
> > from the list of registered blocks, we don't check whether the
> > block->in_use flag is true or not. In XLogResetInsertion, we just
> > reset in_use flag without reseting the blocks[]->rnode information.
> > So, if we don't check the in_use flag, it's possible that we'll
> > consult some block information from the previous WAL record. IMHO,
> > just adding an in_use check in UndoLogAllocateInRecovery will solve
> > the problem.
>
> Agreed.  I added a line to break out of that loop if !block->in_use.
>
I think we should skip the block if !block->in_use. Because, the undo
buffer can be registered in a subsequent block as well. For different
operations, we can use different block_id to register the undo buffer
in the redo record.

> BTW I am planning to simplify that code considerably, based on a plan
> to introduce a new rule: there can be only one undo record and
> therefore only one undo allocation per WAL record.
>
Okay. In that case, we need to rethink the cases for multi-inserts and
non-inlace updates both of which currently inserts multiple undo
record corresponding to a single WAL record. For multi-inserts, it can
be solved easily by moving all the offset information in the payload.
But, for non-inlace updates, we insert one undo record for the update
and one for the insert. Wondering whether we've to insert two WAL
records - one for update and one for the new insert.

> > 2. A transaction, inserts one undo record and generated a WAL record
> > for the same, say at WAL location 0/2000A000. Next, the undo record
> > gets discarded and WAL is generated to update the meta.discard pointer
> > at location 0/2000B000  At the same time, an ongoing checkpoint with
> > checkpoint.redo at 0/20000000 flushes the latest meta.discard pointer.
> > Now, the system crashes.
> > Now, the recovery starts from the location 0/20000000. When the
> > recovery of 0/2000A000 happens, it sees the undo record that it's
> > about to insert, is already discarded as per meta.discard (flushed by
> > checkpoint). In this case, should we just skip inserting the undo
> > record?
>
> I see two options:
>
> 1.  We make it so that if you're allocating in recovery and discard >
> insert, we'll just set discard = insert so you can proceed.  The code
> in undofile_get_segment_file() already copes with missing files during
> recovery.
>
Interesting. This should work.

>
> > 3. Currently, we create a backup image of the unlogged part of the
> > undo log's metadata only when some backend allocates some space from
> > the undo log (in UndoLogAllocate). This helps us restore the unlogged
> > meta part after a checkpoint.
> > When we perform an undo action, we also update the undo action
> > progress and emit an WAL record. The same operation can performed by
> > the undo worker which doesn't allocate any space from the undo log.
> > So, if an undo worker emits an WAL record to update undo action
> > progress after a checkpoint, it'll not be able to WAL log the backup
> > image of the meta unlogged part. IMHO, this breaks the recovery logic
> > of unlogged part of undo meta.
>
> I thought that was OK because those undo data updates don't depend on
> the insert pointer.  But I see what you mean: the next modification of
> the page that DOES depend on the insert pointer might not log the
> meta-data if it's not the first WAL record to touch it after a
> checkpoint.  Rats.  I'll have to think about that some more.
Cool.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: Define jsonpath functions as stable
Next
From: Jeff Janes
Date:
Subject: Re: Default JIT setting in V12