Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAA4eK1L_8RJHTooM4prJTaJFQCbRTpaPKOtgLRcqRj3m0fqeQQ@mail.gmail.com |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
On Fri, Jun 14, 2019 at 8:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> * current versions of the record and worker code discussed upthread by
> Amit and others
>

Thanks for posting the complete patchset.

Last time, I mentioned the remaining work in the undo-processing patchset, the status of which is as follows:

1. Enhance uur_progress so that it updates undo action apply progress at regular intervals. This has been done. The idea is that we update the transaction's undo apply progress at regular intervals so that after a crash we can skip already applied undo. The undo apply progress is updated in terms of the number of blocks processed. I think it is better to change the name of uur_progress to something like uur_apply_progress. Any suggestions?

2. Enhance to support oldestXidHavingUnappliedUndo, more on that later. This has been done. The idea here is that we register all the undo apply (transaction abort) requests in the hash table (referred to as Rollback Hash Table in the patch) and we have a hard limit (after that we won't allow new transactions to write undo) on how many such requests can be pending. So scanning this table gives us the value of oldestXidHavingUnappliedUndo (actually the value for this will be the smallest of 'xid having pending undo' and 'oldestXmin'). As this rollback hash table is not persistent, after startup, we need to take a pass over the undo logs to register all the pending abort requests in the rollback hash table. There are two main purposes which this value serves: (a) any Xid below this is all-visible, so it can help in visibility checks; (b) it can help us implement the rule that "No aborted XID with an age >2^31 can have unapplied undo.". This part helps us to decide when to truncate the clog, because we can't truncate the clog for transactions having undo.

3. Split the patch. The patch is split into five patches.
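To make point 2 a bit more concrete, here is a minimal standalone sketch of how the value could be derived. It is not code from the patch: xid_t, xid_precedes() and oldest_xid_having_unapplied_undo() are hypothetical names, a plain array stands in for the Rollback Hash Table, and special (non-normal) XIDs are ignored. It only illustrates taking the smallest of 'xid having pending undo' and 'oldestXmin' with a wraparound-aware comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id (assumption). */
typedef uint32_t xid_t;

/*
 * Wraparound-aware "a is older than b". Only meaningful while the two
 * XIDs are less than 2^31 apart; special XIDs are not handled here.
 */
static bool
xid_precedes(xid_t a, xid_t b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Return the oldest of oldest_xmin and every XID that still has pending
 * undo. pending[] models the entries of the rollback hash table.
 */
static xid_t
oldest_xid_having_unapplied_undo(const xid_t *pending, size_t npending,
                                 xid_t oldest_xmin)
{
    xid_t result = oldest_xmin;

    for (size_t i = 0; i < npending; i++)
    {
        if (xid_precedes(pending[i], result))
            result = pending[i];
    }
    return result;
}
```

With no pending requests the function simply returns oldest_xmin, which matches the description above: in that case nothing older than oldestXmin is holding back the value.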
I will give a brief description of each patch, which to a good extent is mentioned in the commit message for each patch as well:

0010-Extend-binary-heap-functionality - This patch adds the routines to allocate a binary heap in shared memory and to remove the nth element from a binary heap. These routines will be used by a later patch that will allow an efficient way to process the pending rollback requests.

0011-Infrastructure-to-register-and-fetch-undo-action-req - This patch provides an infrastructure to register and fetch undo action requests. This infrastructure provides a way to allow execution of undo actions. One might think that we can always execute undo actions on error or explicit rollback by the user; however, there are cases when that is not possible. For example, (a) if the system crashes while doing the operation, then after startup, we need a way to perform undo actions; (b) if we get an error while performing undo actions. Apart from this, when there are large rollback requests, it is quite inefficient to perform all the undo actions and then return control to the user.

0012-Infrastructure-to-execute-pending-undo-actions - This provides an infrastructure to execute pending undo actions. To apply the undo actions, we collect the undo records in bulk and try to process them together. We make sure to update the transaction's progress at regular intervals so that after a crash we can skip already applied undo. This needs some more work to generalize the processing of undo records so that this infrastructure can be used by other AMs as well.

0013-Allow-foreground-transactions-to-perform-undo-action - This patch allows foreground transactions to perform undo actions on abort. We always perform rollback actions after cleaning up the current (sub)transaction. This will ensure that we perform the actions immediately after an error (and release the locks) rather than when the user issues a Rollback command at some later point in time.
We are releasing the locks after the undo actions are applied. The reason to delay lock release is that if we release locks before applying undo actions, then the parallel session can acquire the lock before us which can lead to deadlock.

0014-Allow-execution-and-discard-of-undo-by-background-wo- - This patch allows execution and discard of undo by background workers. Undo launcher is responsible for launching the workers iff there is some work available in one of the work queues and there are more workers available. The worker is launched to handle requests for a particular database. The discard worker is responsible for discarding the undo log of transactions that are committed and all-visible or are rolled-back. It also registers the request for aborted transactions in the work queues. It iterates through all the active logs one-by-one and tries to discard the transactions that are old enough to matter.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com