Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAA4eK1K_6wQPd=f4JeeWeaJcKU-YopVd90WUouuKVodGRKFOAQ@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Amit Khandekar <amitdkhan.pg@gmail.com>) |
List | pgsql-hackers |
On Tue, Jul 23, 2019 at 8:12 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > -------------------- > > Some further review comments for undoworker.c : > > > +/* Sets the worker's lingering status. */ > +static void > +UndoWorkerIsLingering(bool sleep) > The function name sounds like "is the worker lingering ?". Can we > rename it to something like "UndoWorkerSetLingering" ? > makes sense, changed as per suggestion. > ------------- > > + errmsg("undo worker slot %d is empty, cannot attach", > + slot))); > > + } > + > + if (MyUndoWorker->proc) > + { > + LWLockRelease(UndoWorkerLock); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("undo worker slot %d is already used by " > + "another worker, cannot attach", slot))); > > These two error messages can have a common error message "could not > attach to worker slot", with errdetail separate for each of them : > slot %d is empty. > slot %d is already used by another worker. > > -------------- > Changed as per suggestion. > +static int > +IsUndoWorkerAvailable(void) > +{ > + int i; > + int alive_workers = 0; > + > + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE); > > Should have bool return value. > > Also, why is it keeping track of number of alive workers ? Sounds like > earlier it used to return number of alive workers ? If it indeed needs > to just return true/false, we can do away with alive_workers. > > Also, *exclusive* lock is unnecessary. > > -------------- > Changed as per suggestion. Additionally, I changed the name of the function to UndoWorkerIsAvailable(), so that it is similar to other functions in the file. > +if (UndoGetWork(false, false, &urinfo, NULL) && > + IsUndoWorkerAvailable()) > + UndoWorkerLaunch(urinfo); > > There is no lock acquired between IsUndoWorkerAvailable() and > UndoWorkerLaunch(); that means even though IsUndoWorkerAvailable() > returns true, there is a small window where UndoWorkerLaunch() does > not find any worker slot with in_use false, causing assertion failure > for (worker != NULL). > -------------- > I have removed the assert and instead added a warning. I have also added a comment from the place where we call UndoWorkerLaunch to mention the race condition. > +UndoWorkerGetSlotInfo(int slot, UndoRequestInfo *urinfo) > +{ > + /* Block concurrent access. */ > + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE); > *Exclusive* lock is unnecessary. > ------------- > Right, changed to share lock. > + LWLockRelease(UndoWorkerLock); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("undo worker slot %d is empty", > + slot))); > I believe there is no need to explicitly release an lwlock before > raising an error, since the lwlocks get released during error > recovery. Please check all other places where this is done. > ------------- > Fixed. > + * Start new undo apply background worker, if possible otherwise return false. > worker, if possible otherwise => worker if possible, otherwise > ------------- > > +static bool > +UndoWorkerLaunch(UndoRequestInfo urinfo) > We don't check UndoWorkerLaunch() return value. Can't we make it's > return value type void ? > Now, the function returns void and accordingly I have adjusted the comment which should address both the above comments. > Also, it would be better to have urinfo as pointer to UndoRequestInfo > rather than UndoRequestInfo, so as to avoid structure copy. > ------------- > Okay, changed as per suggestion. > +{ > + BackgroundWorker bgw; > + BackgroundWorkerHandle *bgw_handle; > + uint16 generation; > + int i; > + int slot = 0; > We can remove variable i, and use slot variable in place of i. > ----------- > > + snprintf(bgw.bgw_name, BGW_MAXLEN, "undo apply worker"); > I think it would be trivial to also append the worker->generation in > the bgw_name. > ------------- > I am not sure if adding 'generation' is any useful. It might be better to add database id as each worker can work on a particular database, so that could be useful information. > > + if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle)) > + { > + /* Failed to start worker, so clean up the worker slot. */ > + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE); > + UndoWorkerCleanup(worker); > + LWLockRelease(UndoWorkerLock); > + > + return false; > + } > > Is it intentional that there is no (warning?) message logged when we > can't register a bg worker ? > ------------- > Added a warning in that code path. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: