On Fri, Mar 13, 2015 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 13, 2015 at 8:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > We can't directly call DestroyParallelContext() to terminate workers as
> > it can so happen that by that time some of the workers are still not
> > started.
>
> That shouldn't be a problem. TerminateBackgroundWorker() not only
> kills an existing worker if there is one, but also tells the
> postmaster that if it hasn't started the worker yet, it should not
> bother. So at the conclusion of the first loop inside
> DestroyParallelContext(), every running worker will have received
> SIGTERM and no more workers will be started.
>
The problem occurs in the second loop inside DestroyParallelContext(),
where it calls WaitForBackgroundWorkerShutdown().
WaitForBackgroundWorkerShutdown() returns only when the status is
BGWH_STOPPED; see the code below from the parallel-mode patch:

+    status = GetBackgroundWorkerPid(handle, &pid);
+    if (status == BGWH_STOPPED)
+        return status;

So if the status returned here is BGWH_NOT_YET_STARTED, it will go
into WaitLatch and wait there forever.
I think the fix is to return the status when it is either BGWH_STOPPED
or BGWH_NOT_YET_STARTED.
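
To make the proposal concrete, here is a rough standalone model of the
wait loop, not the patch code itself. The BGWH_* names mirror the ones
in bgworker.h, but MockHandle, mock_get_status() and
wait_for_shutdown_sketch() are hypothetical stand-ins I made up so the
sketch compiles on its own. The max_waits cap is also only there so the
"wait forever" case can be observed instead of actually hanging.

```c
#include <assert.h>
#include <stddef.h>

/* Same names as in bgworker.h, redefined so this sketch stands alone. */
typedef enum BgwHandleStatus
{
    BGWH_STARTED,
    BGWH_NOT_YET_STARTED,
    BGWH_STOPPED,
    BGWH_POSTMASTER_DIED
} BgwHandleStatus;

/* Hypothetical stand-in for a worker handle: a scripted list of statuses. */
typedef struct MockHandle
{
    const BgwHandleStatus *script;
    size_t      len;
    size_t      pos;
} MockHandle;

/*
 * Hypothetical stand-in for GetBackgroundWorkerPid(): returns the next
 * scripted status and keeps repeating the last one once the script ends.
 */
static BgwHandleStatus
mock_get_status(MockHandle *h)
{
    BgwHandleStatus s = h->script[h->pos];

    if (h->pos + 1 < h->len)
        h->pos++;
    return s;
}

/*
 * Model of the wait loop.  Returns the number of latch waits performed
 * before returning, or -1 if max_waits was reached (the case where the
 * real loop would sit in WaitLatch indefinitely).
 */
static int
wait_for_shutdown_sketch(MockHandle *h, int stop_on_not_yet_started,
                         int max_waits, BgwHandleStatus *final_status)
{
    int         waits = 0;

    assert(h != NULL && h->len > 0);

    for (;;)
    {
        BgwHandleStatus status = mock_get_status(h);

        /* Proposed change: also return when the worker never started. */
        if (status == BGWH_STOPPED ||
            (stop_on_not_yet_started && status == BGWH_NOT_YET_STARTED))
        {
            *final_status = status;
            return waits;
        }

        if (waits >= max_waits)
            return -1;          /* would block forever in the real loop */
        waits++;                /* the real code would WaitLatch() here */
    }
}
```

With stop_on_not_yet_started = 0 (the current check) a handle that stays
at BGWH_NOT_YET_STARTED never gets out of the loop; with it set to 1 the
function returns immediately with that status.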
What do you say?