Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start - Mailing list pgsql-bugs
| From | Tom Lane |
|---|---|
| Subject | Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start |
| Date | |
| Msg-id | 3793541.1731088615@sss.pgh.pa.us |
| In response to | Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start |
| List | pgsql-bugs |
I wrote:
> Here's a proposed patch along that line. I left the test case from
> ac04aa84a alone, since it works perfectly well to test this way too.
I'd modeled that on the existing recovery code for DSM segment creation
failure, just below. But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work
very well. To test, I lobotomized InitializeParallelDSM to always
force pcxt->nworkers = 0. That results in a bunch of unsurprising
regression test diffs, plus a couple of
+ERROR: could not find key 4 in shm TOC at 0x229f138
which turns out to be the fault of ExecHashJoinReInitializeDSM:
it's not accounting for the possibility that we didn't really
start a parallel hash join.
I'm also not happy about ReinitializeParallelWorkers'
Assert(pcxt->nworkers >= nworkers_to_launch);
The one existing caller manages not to trigger that because it's
careful to reduce its request based on pcxt->nworkers, but it
doesn't seem to me that callers should be expected to have to.
So I end with the attached. There might still be some more issues
that the regression tests don't reach, but I think this is the
best we can do for today.
regards, tom lane
```diff
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4a2e352d57..a10bf02ccf 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -230,6 +230,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
+	/*
+	 * If we manage to reach here while non-interruptible, it's unsafe to
+	 * launch any workers: we would fail to process interrupts sent by them.
+	 * We can deal with that edge case by pretending no workers were
+	 * requested.
+	 */
+	if (!INTERRUPTS_CAN_BE_PROCESSED())
+		pcxt->nworkers = 0;
+
 	/*
 	 * Normally, the user will have requested at least one worker process, but
 	 * if by chance they have not, we can skip a bunch of things here.
@@ -476,6 +485,9 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENTRYPOINT, entrypointstate);
 	}
 
+	/* Update nworkers_to_launch, in case we changed nworkers above. */
+	pcxt->nworkers_to_launch = pcxt->nworkers;
+
 	/* Restore previous memory context. */
 	MemoryContextSwitchTo(oldcontext);
 }
@@ -539,10 +551,11 @@ ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
 {
 	/*
 	 * The number of workers that need to be launched must be less than the
-	 * number of workers with which the parallel context is initialized.
+	 * number of workers with which the parallel context is initialized. But
+	 * the caller might not know that InitializeParallelDSM reduced nworkers,
+	 * so just silently trim the request.
 	 */
-	Assert(pcxt->nworkers >= nworkers_to_launch);
-	pcxt->nworkers_to_launch = nworkers_to_launch;
+	pcxt->nworkers_to_launch = Min(pcxt->nworkers, nworkers_to_launch);
 }
 
 /*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2f7170604d..6c3009fba0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1713,8 +1713,13 @@ void
 ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
 {
 	int			plan_node_id = state->js.ps.plan->plan_node_id;
-	ParallelHashJoinState *pstate =
-		shm_toc_lookup(pcxt->toc, plan_node_id, false);
+	ParallelHashJoinState *pstate;
+
+	/* Nothing to do if we failed to create a DSM segment. */
+	if (pcxt->seg == NULL)
+		return;
+
+	pstate = shm_toc_lookup(pcxt->toc, plan_node_id, false);
 
 	/*
 	 * It would be possible to reuse the shared hash table in single-batch
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e3f0d486a..1f78dc3d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,11 +342,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	 * we want to allow parallel inserts in general; updates and deletes have
 	 * additional problems especially around combo CIDs.)
 	 *
-	 * We don't try to use parallel mode unless interruptible.  The leader
-	 * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-	 * Even if we called HandleParallelMessages() another way, starting a
-	 * parallel worker is too delay-prone to be prudent when uncancellable.
-	 *
 	 * For now, we don't try to use parallel mode if we're running inside a
 	 * parallel worker.  We might eventually be able to relax this
 	 * restriction, but for now it seems best not to have parallel workers
@@ -357,7 +352,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 		parse->commandType == CMD_SELECT &&
 		!parse->hasModifyingCTE &&
 		max_parallel_workers_per_gather > 0 &&
-		INTERRUPTS_CAN_BE_PROCESSED() &&
 		!IsParallelWorker())
 	{
 		/* all the cheap tests pass, so scan the query tree */
```