Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: POC: Cleaning up orphaned files using undo logs
Date
Msg-id CAA4eK1L_8RJHTooM4prJTaJFQCbRTpaPKOtgLRcqRj3m0fqeQQ@mail.gmail.com
Whole thread Raw
In response to Re: POC: Cleaning up orphaned files using undo logs  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Fri, Jun 14, 2019 at 8:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> * current versions of the record and worker code discussed upthread by
> Amit and others
>

Thanks for posting the complete patchset.

Last time, I mentioned the remaining work in undo-processing patchset,
the status of which is as follows:
1. Enhance uur_progress so that it updates undo action apply progress
at regular intervals.  This has been done.  The idea is that we update
the transaction's undo apply progress at regular intervals so that
after a crash we can skip already applied undo.   The undo apply
progress is updated in terms of the number of blocks processed.
I think it is better to change the name of uur_progress to something
like uur_apply_progress.  Any suggestions?

2. Enhance to support oldestXidHavingUnappliedUndo, more on that
later.  This has been done.  The idea here is that we register all the
undo apply (transaction abort) requests in the hash table (referred to
as Rollback Hash Table in the patch) and we have a hard limit (after
that we won't allow new transactions to write undo) on how many such
requests can be pending.  So scanning this table gives us the value of
oldestXidHavingUnappliedUndo (actually the value for this will be
smallest of 'xid having pending undo' and 'oldestXmin').  As this
rollback hash table is not persistent, after start, we need to take a
pass over undo logs to register all the pending abort requests in the
rollback hash table.  There are two main purposes which this value
serves (a) Any Xid below this is all-visible, so it can help in
visibility checks, (b) it can help us implementing the rule that "No
aborted XID with an age >2^31 can have unapplied undo.".  This part
helps us to decide to truncate the clog because we can't truncate the
clog for transactions having undo.

3. Split the patch.  The patch is split into five patches.  I will
give a brief description of each patch which to a good extent is
mentioned in the commit message for each patch as well:
0010-Extend-binary-heap-functionality -  This patch adds the routines
to allocate binary heap in shared memory and to remove nth element
from binary heap.  These routines will be used by a later patch that
will allow an efficient way to process the pending rollback requests.

0011-Infrastructure-to-register-and-fetch-undo-action-req - This patch
provides an infrastructure to register and fetch undo action requests.
This infrastructure provides a way to allow execution of undo actions.
One might think that we can always execute undo actions on error or
explicit rollback by the user, however, there are cases when that is
not possible.  For example, (a) if the system crash while doing the
operation, then after startup, we need a way to perform undo actions;
(b) If we get an error while
performing undo actions.

Apart from this, when there are large rollback requests, then it is
quite inefficient to perform all the undo actions and then return
control to the user.

0012-Infrastructure-to-execute-pending-undo-actions - This provides an
infrastructure to execute pending undo actions.  To apply the undo
actions, we collect the undo records in bulk and try to process them
together.  We ensure to update the transaction's progress at regular
intervals so that after a crash we can skip already applied undo.
This needs some more work to generalize the processing of undo records
so that this infrastructure can be used by other AM's as well.

0013-Allow-foreground-transactions-to-perform-undo-action - This patch
allows foreground transactions to perform undo actions on abort.  We
always perform rollback actions after cleaning up the current
(sub)transaction.  This will ensure that we perform the actions
immediately after an error (and release the locks) rather than when
the user issues Rollback command at some later point of time.  We are
releasing the locks after the undo actions are applied.  The reason to
delay lock release is that if we release locks before applying undo
actions, then the parallel session can acquire the lock before us
which can lead to deadlock.

0014-Allow-execution-and-discard-of-undo-by-background-wo-  - This
patch allows execution and discard of undo by background workers.
Undo launcher is responsible for launching the workers iff there is
some work available in one of the work queues and there are more
workers available.  The worker is launched to handle requests for a
particular database.

The discard worker is responsible for discarding the undo log of
transactions that are committed and all-visible or are rolled-back.
It also registers the request for aborted transactions in the work
queues.  It iterates through all the active logs one-by-one and tries
to discard the transactions that are old enough to matter.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: pg_upgrade: Improve invalid option handling
Next
From: "张文杰"
Date:
Subject: Check the number of potential synchronous standbys