Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CA+TgmoYxoEFe6T_RwUv-b4S=5piqrVAWsw97aeycbh4=afXH9w@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel Seq Scan
|
List | pgsql-hackers |
On Tue, Oct 20, 2015 at 3:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > I have rebased the partial seq scan patch based on the above committed > patches. Now for rescanning it reuses the dsm and to achieve that we > need to ensure that workers have been completely shutdown and then > reinitializes the dsm. To ensure complete shutdown of workers, current > function WaitForParallelWorkersToFinish is not sufficient as that just > waits for the last message to receive from worker backend, so I have > written a new function WaitForParallelWorkersToDie. Also on receiving > 'X' message in HandleParallelMessage, it just frees the worker handle > without ensuring if the worker is died due to which later it will be > difficult > to even find whether worker is died or not, so I have removed that code > from HandleParallelMessage. Another change is that after receiving last > tuple in Gather node, it just shutdown down the workers without > destroying the dsm. + /* + * We can't finish transaction commit or abort until all of the + * workers are dead. This means, in particular, that we can't respond + * to interrupts at this stage. + */ + HOLD_INTERRUPTS(); + status = WaitForBackgroundWorkerShutdown(pcxt->worker[i].bgwhandle); + RESUME_INTERRUPTS(); These comments are correct when this code is called from DestroyParallelContext(), but they're flat wrong when called from ReinitializeParallelDSM(). I suggest moving the comment back to DestroyParallelContext and following it with this: HOLD_INTERRUPTS(); WaitForParallelWorkersToDie(); RESUME_INTERRUPTS(); Then ditch the HOLD/RESUME interrupts in WaitForParallelWorkersToDie() itself. This hunk is a problem: case 'X': /* Terminate, indicating clean exit */ { - pfree(pcxt->worker[i].bgwhandle); pfree(pcxt->worker[i].error_mqh); - pcxt->worker[i].bgwhandle = NULL; pcxt->worker[i].error_mqh= NULL; break; } If you do that on receipt of the 'X' message, then DestroyParallelContext() might SIGTERM a worker that has supposedly exited cleanly. That seems bad. I think maybe the solution is to make DestroyParallelContext() terminate the worker only if pcxt->worker[i].error_mqh != NULL. So make error_mqh == NULL mean a clean loss of a worker: either we couldn't register it, or it exited cleanly. And bgwhandle == NULL would mean it's actually gone. It makes sense to have ExecShutdownGather and ExecShutdownGatherWorkers, but couldn't the former call the latter instead of duplicating the code? I think ReInitialize should be capitalized as Reinitialize throughout. ExecParallelReInitializeTupleQueues is almost a cut-and-paste duplicate of ExecParallelSetupTupleQueues. Please refactor this to avoid duplication - e.g. change ExecParallelSetupTupleQueues(ParallelContext *pcxt) to take a second argument bool reinit. ExecParallelReInitializeTupleQueues can just do ExecParallelSetupTupleQueues(pxct, true). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: