Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1L16BT1RAc-N8E03oE9xtATM3Xip4hcv1oo1g1t3pkacQ@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan
List pgsql-hackers
On Fri, Mar 13, 2015 at 7:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 13, 2015 at 8:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > We can't directly call DestroyParallelContext() to terminate workers as
> > it can so happen that by that time some of the workers are still not
> > started.
>
> That shouldn't be a problem.  TerminateBackgroundWorker() not only
> kills an existing worker if there is one, but also tells the
> postmaster that if it hasn't started the worker yet, it should not
> bother.  So at the conclusion of the first loop inside
> DestroyParallelContext(), every running worker will have received
> SIGTERM and no more workers will be started.
>

The problem occurs in second loop inside DestroyParallelContext()
where it calls WaitForBackgroundWorkerShutdown().  Basically
WaitForBackgroundWorkerShutdown() just checks for BGWH_STOPPED 
status, refer below code in parallel-mode patch:

+ status = GetBackgroundWorkerPid(handle, &pid);
+ if (status == BGWH_STOPPED)
+ return status;

So if the status here returned is BGWH_NOT_YET_STARTED, then it
will go for WaitLatch and will there forever.

I think fix is to check if status is BGWH_STOPPED or  BGWH_NOT_YET_STARTED,
then just return the status.

What do you say?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Amit Khandekar
Date:
Subject: Resetting crash time of background worker
Next
From: Craig Ringer
Date:
Subject: Re: Question about TEMP tables