From: Tom Lane
Subject: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Msg-id: 3793541.1731088615@sss.pgh.pa.us
In response to: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
I wrote:
> Here's a proposed patch along that line.  I left the test case from
> ac04aa84a alone, since it works perfectly well to test this way too.

I'd modeled that on the existing recovery code for DSM segment creation
failure, just below.  But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work
very well.  To test, I lobotomized InitializeParallelDSM to always
force pcxt->nworkers = 0.  That results in a bunch of unsurprising
regression test diffs, plus a couple of

+ERROR:  could not find key 4 in shm TOC at 0x229f138

which turns out to be the fault of ExecHashJoinReInitializeDSM:
it's not accounting for the possibility that we didn't really
start a parallel hash join.
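
For reference, that lobotomy amounts to a one-liner of roughly this
shape, dropped into InitializeParallelDSM ahead of its nworkers checks
(a test-only hack, not part of the attached patch):

    /* Test-only hack: unconditionally take the zero-worker path. */
    pcxt->nworkers = 0;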

I'm also not happy about ReinitializeParallelWorkers'

    Assert(pcxt->nworkers >= nworkers_to_launch);

The one existing caller manages not to trigger that because it's
careful to reduce its request based on pcxt->nworkers, but it
doesn't seem to me that callers should be expected to have to.
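
Schematically, the dance that caller has to do today looks about like
this (the variable names are illustrative, not taken from the actual
caller):

    /* clamp the relaunch request to what the context was built with */
    nworkers_to_launch = Min(pcxt->nworkers, nworkers_wanted);
    ReinitializeParallelWorkers(pcxt, nworkers_to_launch);

With the attached patch the clamping happens inside
ReinitializeParallelWorkers itself, so a caller can simply pass the
number of workers it would like to have.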

So I end up with the attached.  There might still be some more issues
that the regression tests don't reach, but I think this is the
best we can do for today.

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4a2e352d57..a10bf02ccf 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -230,6 +230,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
     shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
     shm_toc_estimate_keys(&pcxt->estimator, 1);

+    /*
+     * If we manage to reach here while non-interruptible, it's unsafe to
+     * launch any workers: we would fail to process interrupts sent by them.
+     * We can deal with that edge case by pretending no workers were
+     * requested.
+     */
+    if (!INTERRUPTS_CAN_BE_PROCESSED())
+        pcxt->nworkers = 0;
+
     /*
      * Normally, the user will have requested at least one worker process, but
      * if by chance they have not, we can skip a bunch of things here.
@@ -476,6 +485,9 @@ InitializeParallelDSM(ParallelContext *pcxt)
         shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENTRYPOINT, entrypointstate);
     }

+    /* Update nworkers_to_launch, in case we changed nworkers above. */
+    pcxt->nworkers_to_launch = pcxt->nworkers;
+
     /* Restore previous memory context. */
     MemoryContextSwitchTo(oldcontext);
 }
@@ -539,10 +551,11 @@ ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
 {
     /*
      * The number of workers that need to be launched must be less than the
-     * number of workers with which the parallel context is initialized.
+     * number of workers with which the parallel context is initialized.  But
+     * the caller might not know that InitializeParallelDSM reduced nworkers,
+     * so just silently trim the request.
      */
-    Assert(pcxt->nworkers >= nworkers_to_launch);
-    pcxt->nworkers_to_launch = nworkers_to_launch;
+    pcxt->nworkers_to_launch = Min(pcxt->nworkers, nworkers_to_launch);
 }

 /*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2f7170604d..6c3009fba0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1713,8 +1713,13 @@ void
 ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
 {
     int            plan_node_id = state->js.ps.plan->plan_node_id;
-    ParallelHashJoinState *pstate =
-        shm_toc_lookup(pcxt->toc, plan_node_id, false);
+    ParallelHashJoinState *pstate;
+
+    /* Nothing to do if we failed to create a DSM segment. */
+    if (pcxt->seg == NULL)
+        return;
+
+    pstate = shm_toc_lookup(pcxt->toc, plan_node_id, false);

     /*
      * It would be possible to reuse the shared hash table in single-batch
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e3f0d486a..1f78dc3d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,11 +342,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
      * we want to allow parallel inserts in general; updates and deletes have
      * additional problems especially around combo CIDs.)
      *
-     * We don't try to use parallel mode unless interruptible.  The leader
-     * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-     * Even if we called HandleParallelMessages() another way, starting a
-     * parallel worker is too delay-prone to be prudent when uncancellable.
-     *
      * For now, we don't try to use parallel mode if we're running inside a
      * parallel worker.  We might eventually be able to relax this
      * restriction, but for now it seems best not to have parallel workers
@@ -357,7 +352,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         parse->commandType == CMD_SELECT &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
-        INTERRUPTS_CAN_BE_PROCESSED() &&
         !IsParallelWorker())
     {
         /* all the cheap tests pass, so scan the query tree */
