Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAJ3gD9d+No_Y_2EHMeYTDbC=9f68RBq2fuMEqeMsZD3v23qctQ@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
|
List | pgsql-hackers |
On Fri, 26 Jul 2019 at 12:25, Amit Kapila <amit.kapila16@gmail.com> wrote: > I agree with all your other comments. Thanks for addressing the comments. Below is the continuation of my comments from 0014-Allow-execution-and-discard-of-undo-by-background-wo.patch : + * Perform rollback request. We need to connect to the database for first + * request and that is required because we access system tables while for first request and that is required => for the first request. This is required --------------- +UndoLauncherShmemSize(void) +{ + Size size; + + /* + * Need the fixed struct and the array of LogicalRepWorker. + */ + size = sizeof(UndoApplyCtxStruct); The fixed structure size should be offsetof(UndoApplyCtxStruct, workers) rather than sizeof(UndoApplyCtxStruct) --------------- In UndoWorkerCleanup(), we set individual fields of the UndoApplyWorker structure, whereas in UndoLauncherShmemInit(), for all the UndoApplyWorker array elements, we just memset all the UndoApplyWorker structure elements to 0. I think we should be consistent for the two cases. I guess we can just memset to 0 as you do in UndoLauncherShmemInit(), but this will cause the worker->undo_worker_queue to be 0 i.e. XID_QUEUE , whereas in UndoWorkerCleanup(), it is set to -1. Is the -1 value essential, or we can just set it to XID_QUEUE initially ? Also, if we just use memset in UndoWorkerCleanup(), we need to first save generation into a temp variable, and then after memset(), restore it back. That brought me to another point : We already have a macro ResetUndoRequestInfo(), so UndoWorkerCleanup() can just call ResetUndoRequestInfo(). ------------ + bool allow_peek; + + CHECK_FOR_INTERRUPTS(); + + allow_peek = !TimestampDifferenceExceeds(started_at, Some comments would be good about what is allow_peek used for. Something like : "Arrange to prevent the worker from restarting quickly to switch databases" ----------------- +++ b/src/backend/access/undo/README.UndoProcessing ----------------- +worker then start reading from one of the queues the requests for that start=>starts --------------- +work, it lingers for UNDO_WORKER_LINGER_MS (10s as default). This avoids As per the latest definition, it is 20s. IMHO, there's no need to mention the default value in the readme. --------------- +++ b/src/backend/access/undo/discardworker.c --------------- + * portion of transaction that is overflowed into a separate log can be processed 80-col crossed. +#include "access/undodiscard.h" +#include "access/discardworker.h" Not in alphabetical order +++ b/src/backend/access/undo/undodiscard.c --------------- + next_insert = UndoLogGetNextInsertPtr(logno); I checked UndoLogGetNextInsertPtr() definition. It calls find_undo_log_slot() to get back the slot from logno. Why not make it accept slot as against logno ? At all other places, the slot->logno is passed, so it is convenient to just pass the slot there. And in UndoDiscardOneLog(), first call find_undo_log_slot() just before the above line (or call it at the end of the do-while loop). This way, during each of the UndoLogGetNextInsertPtr() calls in undorequest.c, we will have one less find_undo_log_slot() call. My suggestion is of course valid only under the assumption that when you call UndoLogGetNextInsertPtr(fooslot->logno), then inside UndoLogGetNextInsertPtr(), find_undo_log_slot() will return back the same fooslot. ------------- In UndoDiscardOneLog(), there are at least 2 variable declarations that can be moved inside the do-while loop : uur and next_insert. I am not sure about the other variables viz : undofxid and latest_discardxid. Values of these variables in one iteration continue across to the second iteration. For latest_discardxid, it looks like we do want its value to be carried forward, but is it also true for undofxid ? + /* If we reach here, this means there is something to discard. */ + need_discard = true; + } while (true); Also, about need_discard; there is no place where need_discard is set to false. That means, from 2nd iteration onwards, it will never be false. So even if the code that explicitly sets need_discard to true does not get run, still the undolog will be discarded. Is this expected ? ------------- + if (request_rollback && dbid_exists(uur->uur_txn->urec_dbid)) + { + (void) RegisterRollbackReq(InvalidUndoRecPtr, + undo_recptr, + uur->uur_txn->urec_dbid, + uur->uur_fxid); + + pending_abort = true; + } We can get rid of request_rollback variable. Whatever the "if" block above is doing, do it in this upper condition : if (!IsXactApplyProgressCompleted(uur->uur_txn->urec_progress)) Something like this : if (!IsXactApplyProgressCompleted(uur->uur_txn->urec_progress)) { if (dbid_exists(uur->uur_txn->urec_dbid)) { (void) RegisterRollbackReq(InvalidUndoRecPtr, undo_recptr, uur->uur_txn->urec_dbid, uur->uur_fxid); pending_abort = true; } } ------------- + UndoRecordRelease(uur); + uur = NULL; + } ..... ..... + Assert(uur == NULL); + + /* If we reach here, this means there is something to discard. */ + need_discard = true; + } while (true); Looks like it is neither necessary to set uur to NULL, nor is it necessary to have the Assert(uur == NULL). At the start of each iteration uur is anyway assigned a fresh value, which may or may not be NULL. ------------- + * over undo logs is complete, new undo can is allowed to be written in the new undo can is allowed => new undo is allowed + * hash table size. So before start allowing any new transaction to write the before start allowing => before allowing any new transactions to start writing the ------------- + /* Get the smallest of 'xid having pending undo' and 'oldestXmin' */ + oldestXidHavingUndo = RollbackHTGetOldestFullXid(oldestXidHavingUndo); + .... + .... + if (FullTransactionIdIsValid(oldestXidHavingUndo)) + pg_atomic_write_u64(&ProcGlobal->oldestFullXidHavingUnappliedUndo, + U64FromFullTransactionId(oldestXidHavingUndo)); Is it possible that the FullTransactionId returned by RollbackHTGetOldestFullXid() could be invalid ? If not, then the if condition above can be changed to an Assert(). ------------- + * If the log is already discarded, then we are done. It is important + * to first check this to ensure that tablespace containing this log + * doesn't get dropped concurrently. + */ + LWLockAcquire(&slot->mutex, LW_SHARED); + /* + * We don't have to worry about slot recycling and check the logno + * here, since we don't care about the identity of this slot, we're + * visiting all of them. I guess, it's accidental that the LWLockAcquire() call is *between* the two comments ? ----------- + if (UndoRecPtrGetCategory(undo_recptr) == UNDO_SHARED) + { + /* + * For the "shared" category, we only discard when the + * rm_undo_status callback tells us we can. + */ + status = RmgrTable[uur->uur_rmid].rm_undo_status(uur, &wait_xid); status variable could be declared in this block itself. ------------- Some variable declaration alignments and comments spacing need changes as per pgindent. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: