Re: Preventing hangups in bgworker start/stop during DB shutdown - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Preventing hangups in bgworker start/stop during DB shutdown |
Date | |
Msg-id | CALj2ACWWiqC1Ci3dr4nTbpxH78z=fGO14uk6NEaT14zpbutKnA@mail.gmail.com Whole thread Raw |
In response to | Preventing hangups in bgworker start/stop during DB shutdown (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Preventing hangups in bgworker start/stop during DB shutdown
|
List | pgsql-hackers |
On Wed, Dec 23, 2020 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Here's an attempt at closing the race condition discussed in [1] > (and in some earlier threads, though I'm too lazy to find them). > > The core problem is that the bgworker management APIs were designed > without any thought for exception conditions, notably "we're not > gonna launch any more workers because we're shutting down the database". > A process waiting for a worker in WaitForBackgroundWorkerStartup or > WaitForBackgroundWorkerShutdown will wait forever, so that the database > fails to shut down without manual intervention. > > I'd supposed that we would need some incompatible changes in those APIs > in order to fix this, but after further study it seems like we could > hack things by just acting as though a request that won't be serviced > has already run to completion. I'm not terribly satisfied with that > as a long-term solution --- it seems to me that callers should be told > that there was a failure. But this looks to be enough to solve the > lockup condition for existing callers, and it seems like it'd be okay > to backpatch. > > Thoughts? > > [1] https://www.postgresql.org/message-id/flat/16785-c0207d8c67fb5f25%40postgresql.org 1) Yeah, the postmaster will not be able to start the bg workers in following cases, when bgworker_should_start_now returns false. So we might encounter the hang issue. static bool bgworker_should_start_now(BgWorkerStartTime start_time) { switch (pmState) { case PM_NO_CHILDREN: case PM_WAIT_DEAD_END: case PM_SHUTDOWN_2: case PM_SHUTDOWN: case PM_WAIT_BACKENDS: case PM_STOP_BACKENDS: break; 2) What if postmaster enters pmState >= PM_STOP_BACKENDS state after it calls BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS)? First of all, is it possible? I think yes, because we are in sigusr1_handler(), and I don't see we blocking the signal that sets pmState >= PM_STOP_BACKENDS either in sigusr1_handler or in BackgroundWorkerStateChange(). Though it's a small window, we might get into the hangup issue? If yes, can we check the pmState in the for loop in BackgroundWorkerStateChange()? if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE)) { - BackgroundWorkerStateChange(); + /* Accept new dynamic worker requests only if not stopping. */ + BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS); StartWorkerNeeded = true; } 3) Can we always say that if bgw_restart_time is BGW_NEVER_RESTART, then it's a dynamic bg worker? I think we can also have normal bgworkers with BGW_NEVER_RESTART flag(I see one in worker_spi.c _PG_init()), will there be any problem? Or do we need some comment tweaking? + /* + * If this is a dynamic worker request, and we aren't allowing new + * dynamic workers, then immediately mark it for termination; the next + * stanza will take care of cleaning it up. + */ + if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART && + !allow_new_workers) + slot->terminate = true; 4) IIUC, in the patch we mark slot->terminate = true only for BGW_NEVER_RESTART kind bg workers, what happens if a bg worker has bgw_restart_time seconds and don't we hit the hanging issue(that we are trying to solve here) for those bg workers? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: