Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start

From: Tom Lane
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Msg-id: 3793541.1731088615@sss.pgh.pa.us
In response to: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
I wrote:
> Here's a proposed patch along that line. I left the test case from
> ac04aa84a alone, since it works perfectly well to test this way too.

I'd modeled that on the existing recovery code for DSM segment creation
failure, just below. But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work very
well.

To test, I lobotomized InitializeParallelDSM to always force
pcxt->nworkers = 0. That results in a bunch of unsurprising regression
test diffs, plus a couple of

+ERROR: could not find key 4 in shm TOC at 0x229f138

which turns out to be the fault of ExecHashJoinReInitializeDSM: it's not
accounting for the possibility that we didn't really start a parallel
hash join.

I'm also not happy about ReinitializeParallelWorkers'

    Assert(pcxt->nworkers >= nworkers_to_launch);

The one existing caller manages not to trigger that because it's careful
to reduce its request based on pcxt->nworkers, but it doesn't seem to me
that callers should be expected to have to.

So I end with the attached. There might still be some more issues that
the regression tests don't reach, but I think this is the best we can do
for today.

			regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4a2e352d57..a10bf02ccf 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -230,6 +230,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
     shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
     shm_toc_estimate_keys(&pcxt->estimator, 1);
 
+    /*
+     * If we manage to reach here while non-interruptible, it's unsafe to
+     * launch any workers: we would fail to process interrupts sent by them.
+     * We can deal with that edge case by pretending no workers were
+     * requested.
+     */
+    if (!INTERRUPTS_CAN_BE_PROCESSED())
+        pcxt->nworkers = 0;
+
     /*
      * Normally, the user will have requested at least one worker process, but
      * if by chance they have not, we can skip a bunch of things here.
@@ -476,6 +485,9 @@ InitializeParallelDSM(ParallelContext *pcxt)
         shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENTRYPOINT, entrypointstate);
     }
 
+    /* Update nworkers_to_launch, in case we changed nworkers above. */
+    pcxt->nworkers_to_launch = pcxt->nworkers;
+
     /* Restore previous memory context. */
     MemoryContextSwitchTo(oldcontext);
 }
@@ -539,10 +551,11 @@ ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
 {
     /*
      * The number of workers that need to be launched must be less than the
-     * number of workers with which the parallel context is initialized.
+     * number of workers with which the parallel context is initialized. But
+     * the caller might not know that InitializeParallelDSM reduced nworkers,
+     * so just silently trim the request.
      */
-    Assert(pcxt->nworkers >= nworkers_to_launch);
-    pcxt->nworkers_to_launch = nworkers_to_launch;
+    pcxt->nworkers_to_launch = Min(pcxt->nworkers, nworkers_to_launch);
 }
 
 /*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2f7170604d..6c3009fba0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1713,8 +1713,13 @@ void
 ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
 {
     int         plan_node_id = state->js.ps.plan->plan_node_id;
-    ParallelHashJoinState *pstate =
-        shm_toc_lookup(pcxt->toc, plan_node_id, false);
+    ParallelHashJoinState *pstate;
+
+    /* Nothing to do if we failed to create a DSM segment. */
+    if (pcxt->seg == NULL)
+        return;
+
+    pstate = shm_toc_lookup(pcxt->toc, plan_node_id, false);
 
     /*
      * It would be possible to reuse the shared hash table in single-batch
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e3f0d486a..1f78dc3d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,11 +342,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
      * we want to allow parallel inserts in general; updates and deletes have
      * additional problems especially around combo CIDs.)
      *
-     * We don't try to use parallel mode unless interruptible. The leader
-     * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-     * Even if we called HandleParallelMessages() another way, starting a
-     * parallel worker is too delay-prone to be prudent when uncancellable.
-     *
      * For now, we don't try to use parallel mode if we're running inside a
      * parallel worker. We might eventually be able to relax this
      * restriction, but for now it seems best not to have parallel workers
@@ -357,7 +352,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         parse->commandType == CMD_SELECT &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
-        INTERRUPTS_CAN_BE_PROCESSED() &&
         !IsParallelWorker())
     {
         /* all the cheap tests pass, so scan the query tree */
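For readers less familiar with the parallel-context API, the reason forcing
pcxt->nworkers to zero is a workable fallback is that callers of this
machinery are already expected to cope with getting fewer workers than they
requested, possibly none, and to let the leader do the work itself. Below is
a minimal caller-side sketch of that contract; it is illustrative only and
not part of the attached patch, and "my_extension", "my_worker_main", and
do_leader_work() are hypothetical names.

/*
 * Illustrative sketch only (not part of the patch above): the caller-side
 * contract that makes "pretend no workers were requested" a safe recovery
 * strategy.  "my_worker_main" and do_leader_work() are hypothetical.
 */
#include "postgres.h"

#include "access/parallel.h"
#include "access/xact.h"

static void
do_leader_work(void)
{
    /* hypothetical: the leader's share of the task would go here */
}

static void
run_task_maybe_in_parallel(int nworkers_requested)
{
    ParallelContext *pcxt;

    EnterParallelMode();

    pcxt = CreateParallelContext("my_extension", "my_worker_main",
                                 nworkers_requested);

    /*
     * Set up the DSM segment and TOC.  With the patch, this may silently
     * reduce pcxt->nworkers to zero if we are currently non-interruptible.
     */
    InitializeParallelDSM(pcxt);

    /* Launches at most pcxt->nworkers_to_launch workers; possibly none. */
    LaunchParallelWorkers(pcxt);

    /* The leader always participates, so zero workers still gets the job done. */
    do_leader_work();

    /* Harmless no-op if pcxt->nworkers_launched is zero. */
    WaitForParallelWorkersToFinish(pcxt);

    DestroyParallelContext(pcxt);
    ExitParallelMode();
}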