On Wed, Mar 18, 2015 at 2:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Can you try this:
>>
>> diff --git a/src/backend/postmaster/bgworker.c
>> b/src/backend/postmaster/bgworker.c
>> index f80141a..39b919f 100644
>> --- a/src/backend/postmaster/bgworker.c
>> +++ b/src/backend/postmaster/bgworker.c
>> @@ -244,6 +244,8 @@ BackgroundWorkerStateChange(void)
>> rw->rw_terminate = true;
>> if (rw->rw_pid != 0)
>> kill(rw->rw_pid, SIGTERM);
>> + else
>> + ReportBackgroundWorkerPID(rw);
>> }
>> continue;
>> }
>>
>
> It didn't fix the problem. IIUC, you have done this to ensure that
> if worker is not already started, then update it's pid, so that we
> can get the required status in WaitForBackgroundWorkerShutdown().
> As this is a timing issue, it can so happen that before Postmaster
> gets a chance to report the pid, backend has already started waiting
> on WaitLatch().
I think I figured out the problem. That fix only helps in the case
where the postmaster noticed the new registration previously but
didn't start the worker, and then later notices the termination.
What's much more likely to happen is that the worker is started and
terminated so quickly that both happen before we create a
RegisteredBgWorker for it. The attached patch fixes that case, too.
Assuming this actually fixes the problem, I think we should back-patch
it into 9.4. To recap, the problem is that, at present, if you
register a worker and then terminate it before it's launched,
GetBackgroundWorkerPid() will still return BGWH_NOT_YET_STARTED, which
it makes it seem like we're still waiting for it to start. But when
or if the slot is reused for an unrelated registration, then
GetBackgroundWorkerPid() will switch to returning BGWH_STOPPED. It's
hard to believe that's the behavior anyone wants. With this patch,
the return value will always be BGWH_STOPPED in this situation. That
has the virtue of being consistent, and practically speaking I think
it's the behavior that everyone will want, because the case where this
matters is when you are waiting for workers to start or waiting for
worker to stop, and in either case you will want to treat a worker
that was marked for termination before the postmaster actually started
it as already-stopped rather than not-yet-started.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company