Re: Proposals for making it easier to write correct bgworkers - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Proposals for making it easier to write correct bgworkers
Msg-id CABUevEy6TY9KjYMDm4=+z1AnDOZL_iroQSBgTROF-QCLasGMcQ@mail.gmail.com
In response to Proposals for making it easier to write correct bgworkers  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers


On Thu, Sep 10, 2020 at 5:02 AM Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all

As I've gained experience working on background workers, it's become increasingly clear that they're a bit too different to normal backends for many nontrivial uses.

<snip> a lot of proposals I agree with.



PROPOSED GENERALISED WORKER MANAGEMENT
----

Finally I'm wondering if there's any interest in generalizing the logical rep worker management for other bgworkers. I've done a ton of work with worker management and it's something I'm sure I could take on but I don't want to write it without knowing there's some chance of acceptance.

The general idea is to provide a way for bgworkers to start up managers for pools / sets of workers. They launch them and have a function they can call in their mainloop that watches their child worker states, invoking callbacks when they fail to launch, launch successfully, exit cleanly after finishing their work, or die with an error. Workers are tracked in a shmem seg where the start of the seg must be a key struct (akin to how the hash API works). We would provide calls to look up a worker shmem struct by key, signal a worker by key, wait for a worker to exit (up to timeout), etc. Like in the logical rep code, access to the worker registration shmem would be controlled by LWLock. The extension code can put whatever it wants in the worker shmem entries after the key, including various unions or whatever - the worker management API won't care.
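A minimal, single-process sketch of the registry described above may make the shape of the proposal more concrete. It assumes a fixed array of entries that each begin with a key struct, one state per slot, and a poll function the manager calls from its main loop to dispatch callbacks. Every name here (WorkerKey, WorkerRegistry, registry_lookup, registry_add, registry_set_state, registry_poll) is hypothetical and is not an existing PostgreSQL API. The sketch leaves out shared memory, LWLock locking, actual worker launching, signalling by key, and waiting with a timeout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key struct: every registry entry must begin with one,
 * the way dynahash entries begin with their key. The extension puts
 * whatever payload it likes after it. */
typedef struct WorkerKey {
    unsigned int id;
} WorkerKey;

typedef enum WorkerState {
    WS_UNUSED,        /* slot is free */
    WS_STARTING,      /* launch requested, not yet confirmed */
    WS_RUNNING,       /* worker launched successfully */
    WS_LAUNCH_FAILED, /* worker could not be started */
    WS_EXITED_CLEAN,  /* worker finished its work and exited */
    WS_DIED_ERROR     /* worker exited with an error */
} WorkerState;

/* Callbacks supplied by the managing worker; any of them may be NULL. */
typedef struct WorkerCallbacks {
    void (*launch_failed)(WorkerKey *entry);
    void (*started)(WorkerKey *entry);
    void (*exited_clean)(WorkerKey *entry);
    void (*died_error)(WorkerKey *entry);
} WorkerCallbacks;

/* In PostgreSQL this would live in shared memory behind an LWLock;
 * this single-process sketch does no locking. */
typedef struct WorkerRegistry {
    char *entries;      /* nslots entries of entrysize bytes each */
    size_t entrysize;   /* sizeof the extension's entry struct */
    WorkerState *state; /* latest state reported for each slot */
    WorkerState *seen;  /* state last delivered to a callback */
    int nslots;
} WorkerRegistry;

static WorkerKey *slot_entry(WorkerRegistry *r, int i) {
    return (WorkerKey *) (r->entries + (size_t) i * r->entrysize);
}

/* Return the entry registered under id, or NULL if there is none. */
WorkerKey *registry_lookup(WorkerRegistry *r, unsigned int id) {
    for (int i = 0; i < r->nslots; i++)
        if (r->state[i] != WS_UNUSED && slot_entry(r, i)->id == id)
            return slot_entry(r, i);
    return NULL;
}

/* Claim a free slot for a new worker, zeroing the payload after the key.
 * Returns NULL when every slot is in use. */
WorkerKey *registry_add(WorkerRegistry *r, unsigned int id) {
    for (int i = 0; i < r->nslots; i++) {
        if (r->state[i] == WS_UNUSED) {
            WorkerKey *e = slot_entry(r, i);
            memset(e, 0, r->entrysize);
            e->id = id;
            r->state[i] = r->seen[i] = WS_STARTING;
            return e;
        }
    }
    return NULL;
}

/* Record a state change, standing in for whatever the postmaster or the
 * worker itself would report in a real implementation. */
bool registry_set_state(WorkerRegistry *r, unsigned int id, WorkerState s) {
    WorkerKey *e = registry_lookup(r, id);
    if (e == NULL)
        return false;
    r->state[((char *) e - r->entries) / r->entrysize] = s;
    return true;
}

/* Call from the manager's main loop: for each slot whose state changed
 * since the last poll, fire the matching callback, then release slots
 * whose worker has finished (or never started). */
void registry_poll(WorkerRegistry *r, const WorkerCallbacks *cb) {
    for (int i = 0; i < r->nslots; i++) {
        WorkerState s = r->state[i];
        void (*fn)(WorkerKey *) = NULL;

        if (s == r->seen[i])
            continue;
        if (s == WS_RUNNING) fn = cb->started;
        else if (s == WS_LAUNCH_FAILED) fn = cb->launch_failed;
        else if (s == WS_EXITED_CLEAN) fn = cb->exited_clean;
        else if (s == WS_DIED_ERROR) fn = cb->died_error;
        if (fn != NULL)
            fn(slot_entry(r, i));
        r->seen[i] = s;
        if (s != WS_RUNNING && s != WS_STARTING)
            r->state[i] = r->seen[i] = WS_UNUSED;
    }
}
```

An extension would declare its own entry type, e.g. `struct { WorkerKey key; int rows_done; }`, pass `sizeof` that struct as `entrysize`, and cast the returned `WorkerKey *` back to its own type. The callback runs before the slot is released, so it can still read the payload. One design consequence worth noting: a poll only sees the latest state, so a worker that starts and exits between two polls produces just the exit callback, not a "started" one.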

This abstracts all the low level mess away from bgworker implementations and lets them focus on writing the code they want to run.

I'd probably suggest doing so by extracting the logical rep worker management and making the logical rep code use the generalized worker management. That way it'd be proven, and have in-core users.

Yes, there is definitely a lot of interest in this.

It would also be good to somehow generalize away the difference between static bgworkers and dynamic ones. That's something that really annoyed us with the work on the "online checksums" patch, and I've also run into that issue in other cases. I think finding a way to launch a dynamic worker out of the postmaster would be a way to do that -- I haven't looked into the detail, but if we're looking at generalizing the worker management this is definitely something we should include in the consideration.

I haven't looked at the different places we could in theory extract the management out of and reuse, but it makes sense that the logical replication one would be the most appropriate since it's the newest one (vs. autovacuum, which is the other one that can at least do similar things). And yes, it definitely makes sense to have a generalized set of code for this, because it's certainly a fairly complicated pattern that we shouldn't be reinventing over and over again with slightly different bugs.
