Re: Preventing hangups in bgworker start/stop during DB shutdown - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Preventing hangups in bgworker start/stop during DB shutdown
Date
Msg-id CAGRY4nxAD8NS6t9ExLKksnqxrgc7fV-wFrJytsPPPq4p3TC-Xw@mail.gmail.com
Whole thread Raw
In response to Preventing hangups in bgworker start/stop during DB shutdown  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 23 Dec 2020 at 05:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Here's an attempt at closing the race condition discussed in [1]
(and in some earlier threads, though I'm too lazy to find them).

The core problem is that the bgworker management APIs were designed
without any thought for exception conditions, notably "we're not
gonna launch any more workers because we're shutting down the database".
A process waiting for a worker in WaitForBackgroundWorkerStartup or
WaitForBackgroundWorkerShutdown will wait forever, so that the database
fails to shut down without manual intervention.

I'd supposed that we would need some incompatible changes in those APIs
in order to fix this, but after further study it seems like we could
hack things by just acting as though a request that won't be serviced
has already run to completion.  I'm not terribly satisfied with that
as a long-term solution --- it seems to me that callers should be told
that there was a failure.  But this looks to be enough to solve the
lockup condition for existing callers, and it seems like it'd be okay
to backpatch.

Thoughts?

Callers who launch bgworkers already have to cope with conditions such as the worker failing immediately after launch, or before attaching to the shmem segment used for worker management by whatever extension is launching it.

So I think it's reasonable to lie and say we launched it. The caller must already cope with this case to behave correctly.

Patch specifics:

> This function should only be called from the postmaster

It'd be good to

    Assert(IsPostmasterEnvironment && !IsUnderPostmaster)

in these functions.

Otherwise at first read the patch and rationale looks sensible to me.

(When it comes to the bgw APIs in general I have a laundry list of things I'd like to change or improve around signal handling, error trapping and recovery, and lots more, but that's for another thread.)

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Next
From: Michael Paquier
Date:
Subject: Cleanup some -I$(libpq_srcdir) in makefiles