Thread: Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?

Hi all

While working on BDR, I've run into a situation that I think highlights
a limitation of the dynamic bgworker API that should be fixed.
Specifically, when the postmaster crashes and restarts shared memory
gets cleared but dynamic bgworkers don't get unregistered, and that's a
mess.

I have a few suggestions and would like your thoughts / preferences. If
I don't hear anything I'll post a patch to add a
BGW_UNREGISTER_ON_POSTMASTER_RESTART flag soon.


Details:


I think we need one of:

* An API to enumerate registered BackgroundWorker s and get their
BackgroundWorkerHandle s so a restarting extension can clean up its old
workers from before the restart by killing and unregistering them using
their handles;

* Make background workers unconditionally unregistered on postmaster
restart. This would probably have been the correct design, but I think
it's too late to change as an unconditional default now.

* Add a BGW_UNREGISTER_ON_POSTMASTER_CRASH (or whatever) flag that lets
extensions tell the postmaster to discard a dynamic bgworker when it
restarts and clears shmem. There was some earlier discussion (see below)
on that when unregistration on exit code 0 got added, but it didn't make
it in the final patch.

An API to enumerate bgworkers seems pretty easy to add, and generally
useful. The worker lib + funcname + argument + name should be sufficient
to identify a worker usefully. Worth having? When it came up before it
was bounced because we were too close to releasing 9.4 (heh) for proper
API discussion and review.

Overall I'd prefer to have a simple flag to unregister workers on
postmaster restart, but I think an enumeration API could be generally a
useful thing.

Note that in the prior thread related to this:

http://www.postgresql.org/message-id/534E250F.2060705@2ndquadrant.com

I was at that time passing pointers into postmaster-allocated memory
directly to bgworkers. That's no longer the case. The same problem
exists when you pass an offset into a shared memory array if you can't
guarantee that the array is always initialized with the same entries in
the same order when the postmaster restarts, though.

See in particular:

http://www.postgresql.org/message-id/20140416112144.GD17874@awork2.anarazel.de

--- Explanation ---

The latest BDR extension has a single static bgworker registered at
shared_preload_libraries time. This worker launches one dynamic bgworker
per database. Those dynamic bgworkers are in turn responsible for
launching workers for each connection to another node in the mesh
topology (and for various other tasks). They communicate via shared
memory blocks, where each worker has an offset into a shared memory array.

That's all fine until the postmaster crashes and restarts, zeroing
shared memory. The dynamic background workers are killed by the
postmaster, but *not* unregistered. Workers only get unregistered if
they exit with exit code 0, which isn't the case when they're killed, or
when their restart interval is BGW_NO_RESTART .

This means that when the workers start they try to access their shared
memory blocks via the offsets passed as arguments in the
BackgroundWorker struct and find that their shared memory block is
zeroed out, so they can do nothing but exit. Or worse, their shared
memory block might've since been initialized with data for some other
unrelated worker launched after the postmaster restart.

Meanwhile the static worker that manages BDR has been restarted and has
to set up shared memory and register workers. It doesn't have any way of
knowing which workers from before the postmaster restart were registered
or what their offsets into shared memory are. So it has to launch a new
batch of workers, but it has no way to stop the old workers from seeing
the new workers' shared memory blocks as their own.

To work around this, at Andres's suggestion I added a "worker generation
number" as a postmaster var that gets copied into the BDR shared memory
control segment at startup. The generation number is passed to workers
as the high 16 bits of their int32 worker argument, with the low 16 bits
being their offset into the shared memory array.

Workers check the generation number before trying to access their shared
memory blocks and proc_exit(0) if they see they're from a previous
generation or if the global generation number is zero (indicating shmem
got zeroed and hasn't been set up again yet).

This works, but it's ugly, especially the need to jam the generation
number into the worker argument. I think this problem will be common  as
adoption of dynamic bgworkers increases and should be fixed in 9.5 if
possible.

So: comments, preferences?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



On Sat, Dec 13, 2014 at 4:13 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> While working on BDR, I've run into a situation that I think highlights
> a limitation of the dynamic bgworker API that should be fixed.
> Specifically, when the postmaster crashes and restarts shared memory
> gets cleared but dynamic bgworkers don't get unregistered, and that's a
> mess.

I've noticed this as well.  What I was thinking of proposing is that
we change things so that a BGW_NEVER_RESTART worker is unregistered
when a crash-and-restart cycle happens, but workers with any other
restart time are retained. What's happened to me a few times is that
the database crashes after registering BGW_NO_RESTART workers but
before the postmaster launches them; the postmaster fires them up
after completing the crash-and-restart cycle, but by then the dynamic
shared memory segments they are supposed to map are gone, so they just
start up uselessly and then die.

> The latest BDR extension has a single static bgworker registered at
> shared_preload_libraries time. This worker launches one dynamic bgworker
> per database. Those dynamic bgworkers are in turn responsible for
> launching workers for each connection to another node in the mesh
> topology (and for various other tasks). They communicate via shared
> memory blocks, where each worker has an offset into a shared memory array.
>
> That's all fine until the postmaster crashes and restarts, zeroing
> shared memory. The dynamic background workers are killed by the
> postmaster, but *not* unregistered. Workers only get unregistered if
> they exit with exit code 0, which isn't the case when they're killed, or
> when their restart interval is BGW_NO_RESTART .

Maybe it would be best to make the per-database workers BGW_NO_RESTART
and have the static bgworker, rather than the postmaster, be
responsible for starting them.  Then the fix mentioned above would
suffice.

If that's not good for some reason, my second choice is adding a
BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
less cumbersome than your other proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 12/16/2014 12:12 AM, Robert Haas wrote:
> On Sat, Dec 13, 2014 at 4:13 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> While working on BDR, I've run into a situation that I think highlights
>> a limitation of the dynamic bgworker API that should be fixed.
>> Specifically, when the postmaster crashes and restarts shared memory
>> gets cleared but dynamic bgworkers don't get unregistered, and that's a
>> mess.
> 
> I've noticed this as well.  What I was thinking of proposing is that
> we change things so that a BGW_NEVER_RESTART worker is unregistered
> when a crash-and-restart cycle happens, but workers with any other
> restart time are retained.

Personally I need workers that get restarted, but are discarded on
crash. They access shared memory, so when shmem is cleared I need them
to be discarded too, but otherwise I wish them to be persistent
until/unless they're intentionally unregistered.

If I have to use BGW_NO_RESTART then I land up having to implement
monitoring of worker status and manual re-registration in a supervisor
static worker. Which is a pain, since it's duplicating work the
postmaster would otherwise just be doing for me.

I'd really rather a separate flag.

> Maybe it would be best to make the per-database workers BGW_NO_RESTART
> and have the static bgworker, rather than the postmaster, be
> responsible for starting them.  Then the fix mentioned above would
> suffice.

Yeah... it would, but it involves a bunch of code that duplicates
process management the postmaster already does.

More importantly, if the supervisor worker crashes / is killed it loses
its handles to the other workers and the signals they send no longer go
to the right worker. So it's not robust.

> If that's not good for some reason, my second choice is adding a
> BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
> less cumbersome than your other proposal.

That'd be my preference.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



On Mon, Dec 15, 2014 at 11:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> If that's not good for some reason, my second choice is adding a
>> BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
>> less cumbersome than your other proposal.
>
> That'd be my preference.

OK, let's do that, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 12/16/2014 12:31 AM, Robert Haas wrote:
> On Mon, Dec 15, 2014 at 11:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> >> If that's not good for some reason, my second choice is adding a
>>> >> BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
>>> >> less cumbersome than your other proposal.
>> >
>> > That'd be my preference.
> OK, let's do that, then.

Right-o.

I had an earlier patch that added unregistration on exit(0) and also
added a flag like this. Only the first part got committed. I'll
resurrect it and rebase it on top of master.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services