Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date
Msg-id 3694355.1731084396@sss.pgh.pa.us
Whole thread Raw
In response to Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start  (Noah Misch <noah@leadboat.com>)
Responses Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
List pgsql-bugs
Noah Misch <noah@leadboat.com> writes:
> On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
>> Ah, that could be a way out.  Stick an INTERRUPTS_CAN_BE_PROCESSED()
>> call somewhere in there?

> Exactly.  If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
> be launched.

>> That could even allow us to revert the
>> planner change, which would simplify testing of the executor change.

> True.

Here's a proposed patch along that line.  I left the test case from
ac04aa84a alone, since it works perfectly well to test this way too.

We could argue about whether or not to revert the planner change.
But I'd prefer to do so, because as things stand it creates a
hard-to-reason-about source of plan instability.

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4a2e352d57..0c86543d40 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -230,6 +230,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
     shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
     shm_toc_estimate_keys(&pcxt->estimator, 1);

+    /*
+     * If we manage to reach here while non-interruptible, it's unsafe to
+     * launch any workers: we would fail to process interrupts sent by them.
+     * We can deal with that edge case by pretending no workers were
+     * requested.
+     */
+    if (!INTERRUPTS_CAN_BE_PROCESSED())
+        pcxt->nworkers = 0;
+
     /*
      * Normally, the user will have requested at least one worker process, but
      * if by chance they have not, we can skip a bunch of things here.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e3f0d486a..1f78dc3d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,11 +342,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
      * we want to allow parallel inserts in general; updates and deletes have
      * additional problems especially around combo CIDs.)
      *
-     * We don't try to use parallel mode unless interruptible.  The leader
-     * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-     * Even if we called HandleParallelMessages() another way, starting a
-     * parallel worker is too delay-prone to be prudent when uncancellable.
-     *
      * For now, we don't try to use parallel mode if we're running inside a
      * parallel worker.  We might eventually be able to relax this
      * restriction, but for now it seems best not to have parallel workers
@@ -357,7 +352,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         parse->commandType == CMD_SELECT &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
-        INTERRUPTS_CAN_BE_PROCESSED() &&
         !IsParallelWorker())
     {
         /* all the cheap tests pass, so scan the query tree */

pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: HashAgg degenerate case
Next
From: Ľuboslav Špilák
Date:
Subject: Segmentation fault - PostgreSQL 17.0