Re: Making BackgroundWorkerHandle a complete type or offering a worker enumeration API? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?
Date
Msg-id CA+TgmoYa+RCe9KA6mt0jhpc7kWUdCYB5DwVQ3NbZnFwe6DNcWg@mail.gmail.com
In response to Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Making BackgroundWorkerHandle a complete type or offering a worker enumeration API?  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Sat, Dec 13, 2014 at 4:13 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> While working on BDR, I've run into a situation that I think highlights
> a limitation of the dynamic bgworker API that should be fixed.
> Specifically, when the postmaster crashes and restarts shared memory
> gets cleared but dynamic bgworkers don't get unregistered, and that's a
> mess.

I've noticed this as well.  What I was thinking of proposing is that
we change things so that a BGW_NEVER_RESTART worker is unregistered
when a crash-and-restart cycle happens, but workers with any other
restart time are retained. What's happened to me a few times is that
the database crashes after registering BGW_NEVER_RESTART workers but
before the postmaster launches them; the postmaster fires them up
after completing the crash-and-restart cycle, but by then the dynamic
shared memory segments they are supposed to map are gone, so they just
start up uselessly and then die.
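
To make the proposed rule concrete, here is a minimal standalone sketch of
it. This is a toy model, not postmaster code: SketchWorker,
SKETCH_NEVER_RESTART, and sketch_prune_after_crash are hypothetical names
invented for illustration, and the array stands in for whatever list the
postmaster actually keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "never restart" restart-time value. */
#define SKETCH_NEVER_RESTART (-1)

/* Hypothetical, simplified registration record for one worker. */
typedef struct SketchWorker
{
    const char *name;
    int         restart_time;   /* seconds, or SKETCH_NEVER_RESTART */
} SketchWorker;

/*
 * Model of the proposed policy: after a crash-and-restart cycle, forget
 * every worker registered as never-restart and retain all others.
 * Compacts the array in place and returns the number of workers kept.
 */
size_t
sketch_prune_after_crash(SketchWorker *workers, size_t n)
{
    size_t      kept = 0;

    assert(workers != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (workers[i].restart_time == SKETCH_NEVER_RESTART)
            continue;           /* unregistered: it will not be launched */
        workers[kept++] = workers[i];
    }
    return kept;
}
```

Under this model, a never-restart worker that was registered but not yet
launched when the crash hit is simply dropped, so it never starts up
looking for a dynamic shared memory segment that no longer exists.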

> The latest BDR extension has a single static bgworker registered at
> shared_preload_libraries time. This worker launches one dynamic bgworker
> per database. Those dynamic bgworkers are in turn responsible for
> launching workers for each connection to another node in the mesh
> topology (and for various other tasks). They communicate via shared
> memory blocks, where each worker has an offset into a shared memory array.
>
> That's all fine until the postmaster crashes and restarts, zeroing
> shared memory. The dynamic background workers are killed by the
> postmaster, but *not* unregistered. Workers only get unregistered if
> they exit with exit code 0, which isn't the case when they're killed, or
> when their restart interval is BGW_NEVER_RESTART.

Maybe it would be best to make the per-database workers BGW_NEVER_RESTART
and have the static bgworker, rather than the postmaster, be
responsible for starting them.  Then the fix mentioned above would
suffice.

If that's not good for some reason, my second choice is adding a
BGWORKER_UNREGISTER_AFTER_CRASH flag.  That seems much simpler and
less cumbersome than your other proposal.
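
A rough sketch of how the flag-based variant might behave, again as a
standalone toy model rather than real postmaster code. Only the flag's name
comes from the proposal above; the bit value, the SketchFlaggedWorker
struct, and sketch_prune_flagged are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value; only the flag's name has been proposed. */
#define SKETCH_UNREGISTER_AFTER_CRASH 0x0100u

/* Hypothetical, simplified registration record for one worker. */
typedef struct SketchFlaggedWorker
{
    const char *name;
    uint32_t    flags;
} SketchFlaggedWorker;

/*
 * Model of the opt-in policy: after a crash-and-restart cycle, forget only
 * the workers that asked for it via the flag, whatever their restart time.
 * Compacts the array in place and returns the number of workers kept.
 */
size_t
sketch_prune_flagged(SketchFlaggedWorker *workers, size_t n)
{
    size_t      kept = 0;

    assert(workers != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (workers[i].flags & SKETCH_UNREGISTER_AFTER_CRASH)
            continue;           /* worker opted in to being forgotten */
        workers[kept++] = workers[i];
    }
    return kept;
}
```

The difference from keying on the restart time is that each worker opts in
explicitly, so a worker with a finite restart interval could also ask to be
forgotten after a crash.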

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
