Preventing hangups in bgworker start/stop during DB shutdown - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Preventing hangups in bgworker start/stop during DB shutdown |
Date | |
Msg-id | 661570.1608673226@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Preventing hangups in bgworker start/stop during DB shutdown
Re: Preventing hangups in bgworker start/stop during DB shutdown |
List | pgsql-hackers |
Here's an attempt at closing the race condition discussed in [1] (and in some earlier threads, though I'm too lazy to find them). The core problem is that the bgworker management APIs were designed without any thought for exception conditions, notably "we're not gonna launch any more workers because we're shutting down the database". A process waiting for a worker in WaitForBackgroundWorkerStartup or WaitForBackgroundWorkerShutdown will wait forever, so that the database fails to shut down without manual intervention. I'd supposed that we would need some incompatible changes in those APIs in order to fix this, but after further study it seems like we could hack things by just acting as though a request that won't be serviced has already run to completion. I'm not terribly satisfied with that as a long-term solution --- it seems to me that callers should be told that there was a failure. But this looks to be enough to solve the lockup condition for existing callers, and it seems like it'd be okay to backpatch. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/16785-c0207d8c67fb5f25%40postgresql.org diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 7ef6259eb5..b3ab8b0fa3 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -231,13 +231,15 @@ FindRegisteredWorkerBySlotNumber(int slotno) } /* - * Notice changes to shared memory made by other backends. This code - * runs in the postmaster, so we must be very careful not to assume that - * shared memory contents are sane. Otherwise, a rogue backend could take - * out the postmaster. + * Notice changes to shared memory made by other backends. + * Accept new dynamic worker requests only if allow_new_workers is true. + * + * This code runs in the postmaster, so we must be very careful not to assume + * that shared memory contents are sane. Otherwise, a rogue backend could + * take out the postmaster. */ void -BackgroundWorkerStateChange(void) +BackgroundWorkerStateChange(bool allow_new_workers) { int slotno; @@ -297,6 +299,15 @@ BackgroundWorkerStateChange(void) continue; } + /* + * If this is a dynamic worker request, and we aren't allowing new + * dynamic workers, then immediately mark it for termination; the next + * stanza will take care of cleaning it up. + */ + if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART && + !allow_new_workers) + slot->terminate = true; + /* * If the worker is marked for termination, we don't need to add it to * the registered workers list; we can just free the slot. However, if @@ -503,12 +514,54 @@ BackgroundWorkerStopNotifications(pid_t pid) } } +/* + * Cancel any not-yet-started worker requests that have BGW_NEVER_RESTART. + * + * This is called during a normal ("smart" or "fast") database shutdown. + * After this point, no new background workers will be started, so any + * dynamic worker requests should be killed off, allowing anything that + * might be waiting for them to clean up and exit. + * + * This function should only be called from the postmaster. + */ +void +ForgetUnstartedBackgroundWorkers(void) +{ + slist_mutable_iter iter; + + slist_foreach_modify(iter, &BackgroundWorkerList) + { + RegisteredBgWorker *rw; + BackgroundWorkerSlot *slot; + + rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur); + Assert(rw->rw_shmem_slot < max_worker_processes); + slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot]; + + /* If it's not yet started, and it's a dynamic worker ... */ + if (slot->pid == InvalidPid && + rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART) + { + /* ... then zap it, and notify anything that was waiting */ + int notify_pid = rw->rw_worker.bgw_notify_pid; + + ForgetBackgroundWorker(&iter); + if (notify_pid != 0) + kill(notify_pid, SIGUSR1); + } + } +} + /* * Reset background worker crash state. * * We assume that, after a crash-and-restart cycle, background workers without * the never-restart flag should be restarted immediately, instead of waiting - * for bgw_restart_time to elapse. + * for bgw_restart_time to elapse. On the other hand, workers with that flag + * should be forgotten, because they are dynamic requests from processes that + * no longer exist. + * + * This function should only be called from the postmaster. */ void ResetBackgroundWorkerCrashTimes(void) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 5d09822c81..fa35bf4369 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -3796,6 +3796,13 @@ PostmasterStateMachine(void) */ if (pmState == PM_STOP_BACKENDS) { + /* + * Forget any pending requests for dynamic background workers, since + * we're no longer willing to launch any new workers. (If additional + * requests arrive, BackgroundWorkerStateChange will reject them.) + */ + ForgetUnstartedBackgroundWorkers(); + /* Signal all backend children except walsenders */ SignalSomeChildren(SIGTERM, BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND); @@ -5153,7 +5160,8 @@ sigusr1_handler(SIGNAL_ARGS) /* Process background worker state change. */ if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE)) { - BackgroundWorkerStateChange(); + /* Accept new dynamic worker requests only if not stopping. */ + BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS); StartWorkerNeeded = true; } diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h index f7e24664d5..dd6cabc45b 100644 --- a/src/include/postmaster/bgworker_internals.h +++ b/src/include/postmaster/bgworker_internals.h @@ -46,11 +46,12 @@ extern slist_head BackgroundWorkerList; extern Size BackgroundWorkerShmemSize(void); extern void BackgroundWorkerShmemInit(void); -extern void BackgroundWorkerStateChange(void); +extern void BackgroundWorkerStateChange(bool allow_new_workers); extern void ForgetBackgroundWorker(slist_mutable_iter *cur); extern void ReportBackgroundWorkerPID(RegisteredBgWorker *); extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur); extern void BackgroundWorkerStopNotifications(pid_t pid); +extern void ForgetUnstartedBackgroundWorkers(void); extern void ResetBackgroundWorkerCrashTimes(void); /* Function to start a background worker, called from postmaster.c */
pgsql-hackers by date: