Thread: Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start

On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> The problem appears to manifest when a backend is holding an LWLock and
> starting a query, and the planner chooses a parallel plan for the
> latter.

Thanks for the detailed report and for the fix.

> Potential fixes
> ---------------
> 
> As an experiment, we modified the planner code to consider the state of
> `InterruptHoldoffCount` when determining the value of
> `glob->parallelOK`: if `InterruptHoldoffCount` > 0, then `parallelOK`
> is set to false.
> 
> This ensures a sequential plan is executed if interrupts are being held
> on the leader backend, and the query completes normally.
> 
> The patch is attached as `no_parallel_on_interrupts_held.patch`.

Looks good.  An alternative would be something like the leader periodically
waking up to call HandleParallelMessages() outside of ProcessInterrupts().  I
like your patch better, though.  Parallel query is a lot of infrastructure to
be running while immune to statement_timeout, pg_cancel_backend(), etc.  I
opted to check INTERRUPTS_CAN_BE_PROCESSED(), since QueryCancelHoldoffCount!=0
doesn't cause the hang but still qualifies as a good reason to stay out of
parallel query.  Pushed that way:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ac04aa8

> Related issues
> ==============
> 
> -   Query stuck with wait event IPC / ParallelFinish
>     -
> https://www.postgresql.org/message-id/0f64b4c7fc200890f2055ce4d6650e9c2191fac2.camel@j-davis.com

This one didn't reproduce for me.  Like your test, it involves custom code
running inside an opclass.  I'm comfortable assuming it's the same problem.

> -   BUG #18586: Process (and transaction) is stuck in IPC when the DB
>     is under high load
>     -
> https://www.postgresql.org/message-id/flat/18586-03e1535b1b34db81%40postgresql.org

Here, I'm not seeing enough detail to judge if it's the same.  That's okay.



Noah Misch <noah@leadboat.com> writes:
> On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
>> The problem appears to manifest when a backend is holding an LWLock and
>> starting a query, and the planner chooses a parallel plan for the
>> latter.

> Thanks for the detailed report and for the fix.

I do not have much faith in this patch.  It assumes that the
condition "interrupts can be processed" is the same at plan time and
execution time.  For plans extracted from the plan cache, there seems
little reason to assume that must be true.  The proposed test case
cannot trigger that (today anyway) because SQL-language functions
don't deal in cached plans, but I suspect it could be broken with a
test case using a plpgsql function instead.

I actually think that the problem is somewhere upstream of here:
what in the world are we doing invoking planning and execution of
arbitrary queries in a context where interrupts are held off?
That seems certain to fail in multiple ways besides parallel
workers not working.  I think we need to fix whatever code
believes that that could be OK.  And maybe then insert
"Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
and executor start.

            regards, tom lane



On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Sep 16, 2024 at 09:35:13PM +0200, Francesco Degrassi wrote:
> >> The problem appears to manifest when a backend is holding an LWLock and
> >> starting a query, and the planner chooses a parallel plan for the
> >> latter.
> 
> > Thanks for the detailed report and for the fix.
> 
> I do not have much faith in this patch.  It assumes that the
> condition "interrupts can be processed" is the same at plan time and
> execution time.  For plans extracted from the plan cache, there seems
> little reason to assume that must be true.  The proposed test case
> cannot trigger that (today anyway) because SQL-language functions
> don't deal in cached plans, but I suspect it could be broken with a
> test case using a plpgsql function instead.

Good point.  I missed that.

> I actually think that the problem is somewhere upstream of here:
> what in the world are we doing invoking planning and execution of
> arbitrary queries in a context where interrupts are held off?
> That seems certain to fail in multiple ways besides parallel
> workers not working.  I think we need to fix whatever code
> believes that that could be OK.  And maybe then insert
> "Assert(INTERRUPTS_CAN_BE_PROCESSED())" at planner start
> and executor start.

It would be nice to never run planning or execution with interrupts held.  The
concrete examples so far involve btree operator classes with non-C support
functions.  How practical would it be to release buffer content locks before
running index support functions?  An alternative would be blocking non-C
support functions (and instructing C support function authors to not use
planner/executor).  Non-C support functions have been handy for testing if
nothing else, though.  Do non-bundled modules rely on non-C support functions?

Thanks,
nm



Noah Misch <noah@leadboat.com> writes:
> On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
>> I actually think that the problem is somewhere upstream of here:
>> what in the world are we doing invoking planning and execution of
>> arbitrary queries in a context where interrupts are held off?

> It would be nice to never run planning or execution with interrupts held.  The
> concrete examples so far involve btree operator classes with non-C support
> functions.  How practical would it be to release buffer content locks before
> running index support functions?

Ugh.  Probably not very.  We have to be able to perform comparison
operations within the scanning of a page.  Even if it could be made
to work correctly to release and re-acquire the buffer lock many
times per page, it sounds awful for performance.

> An alternative would be blocking non-C
> support functions (and instructing C support function authors to not use
> planner/executor).  Non-C support functions have been handy for testing if
> nothing else, though.  Do non-bundled modules rely on non-C support functions?

Dunno ... but the OP claimed this is a case that's seen in the
field, so maybe somebody is doing it.  On the whole I don't see
how a btree support function can be considered not to be a low-level
thing, so maybe restricting what it can do is the best answer.
I fear though that the restriction can't simply be to forbid
parallel sub-queries.

On the third hand: you can't implement a btree comparison function
in SQL and simultaneously claim with a straight face that you
expect premium performance.  Could we set things up so that
buffer release/reacquires happen only with non-C support functions?
This still requires that such behavior is safe in the face of
concurrent index activity, which I'm not sure is practical.
(Also, I'm not sure to what extent we'd still be testing what
we wish to with test comparison functions written in SQL.)

            regards, tom lane



On Wed, Sep 18, 2024 at 01:18:48AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
> >> I actually think that the problem is somewhere upstream of here:
> >> what in the world are we doing invoking planning and execution of
> >> arbitrary queries in a context where interrupts are held off?
> 
> > It would be nice to never run planning or execution with interrupts held.  The
> > concrete examples so far involve btree operator classes with non-C support
> > functions.  How practical would it be to release buffer content locks before
> > running index support functions?
> 
> Ugh.  Probably not very.  We have to be able to perform comparison
> operations within the scanning of a page.  Even if it could be made
> to work correctly to release and re-acquire the buffer lock many
> times per page, it sounds awful for performance.

Sounds that way.

> > An alternative would be blocking non-C
> > support functions (and instructing C support function authors to not use
> > planner/executor).  Non-C support functions have been handy for testing if
> > nothing else, though.  Do non-bundled modules rely on non-C support functions?
> 
> Dunno ... but the OP claimed this is a case that's seen in the
> field, so maybe somebody is doing it.  On the whole I don't see
> how a btree support function can be considered not to be a low-level
> thing, so maybe restricting what it can do is the best answer.
> I fear though that the restriction can't simply be to forbid
> parallel sub-queries.
> 
> On the third hand: you can't implement a btree comparison function
> in SQL and simultaneously claim with a straight face that you
> expect premium performance.  Could we set things up so that
> buffer release/reacquires happen only with non-C support functions?
> This still requires that such behavior is safe in the face of
> concurrent index activity, which I'm not sure is practical.
> (Also, I'm not sure to what extent we'd still be testing what
> we wish to with test comparison functions written in SQL.)

Not to rule it out yet, but your point about reduced test fidelity is cause
for hesitation.  Also, I'd regret even having that code bloating up a btree
hot path.

What if we just ignored the plancache when uninterruptible?  The new planner
check would then suffice.  If it slows anything folks care about, either find
a way to become interruptible or make a C function that bypasses planner &
executor.



Noah Misch <noah@leadboat.com> writes:
> What if we just ignored the plancache when uninterruptible?  The new planner
> check would then suffice.

Only if you believe that parallel-query is the only problem,
which is something I seriously doubt.  I fear that the
committed patch is just a band-aid over one symptom of the
general problem that we can't permit arbitrary operations
to be invoked from a btree comparison function.  It's late
here so I'm not sufficiently awake to think of examples,
but I'm sure there are some.

However ... clearly a maliciously-coded btree support function can
do arbitrarily bad stuff.  We restrict creation of opclasses to
superusers for exactly that reason.  If our ambitions are only
to prevent support functions from *accidentally* causing problems,
is disabling parallel query enough?  I'm still pretty uncomfortable
about it, but it's less obviously insufficient than in the general
case.

            regards, tom lane



On Wed, Sep 18, 2024 at 1:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dunno ... but the OP claimed this is a case that's seen in the
> field, so maybe somebody is doing it.  On the whole I don't see
> how a btree support function can be considered not to be a low-level
> thing, so maybe restricting what it can do is the best answer.

Making it possible to do arbitrarily complicated things from B-Tree
support functions seems out of the question. We're not going to
maintain parallel versions of the code that releases buffer locks
before calling (say) the opclass ORDER proc. Just for example, how
would such a scheme work with code like _bt_check_unique?

> I fear though that the restriction can't simply be to forbid
> parallel sub-queries.

Why not?

The test case provided was intended to be illustrative of a problem
that some foreign data wrapper ran into, when it used SPI. I don't
think that the true problem was in any way related to B-Tree indexes.

--
Peter Geoghegan



On Wed, Sep 18, 2024 at 01:45:31AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > What if we just ignored the plancache when uninterruptible?  The new planner
> > check would then suffice.
> 
> Only if you believe that parallel-query is the only problem,
> which is something I seriously doubt.  I fear that the
> committed patch is just a band-aid over one symptom of the
> general problem that we can't permit arbitrary operations
> to be invoked from a btree comparison function.  It's late
> here so I'm not sufficiently awake to think of examples,
> but I'm sure there are some.

I bet one can cause an (undetected) lwlock deadlock, but I don't know of a way
to do that "accidentally".

> However ... clearly a maliciously-coded btree support function can
> do arbitrarily bad stuff.  We restrict creation of opclasses to
> superusers for exactly that reason.  If our ambitions are only
> to prevent support functions from *accidentally* causing problems,
> is disabling parallel query enough?  I'm still pretty uncomfortable
> about it, but it's less obviously insufficient than in the general
> case.

I'm okay stopping at blocking some known accidents.  I'm not aware of an
approach that would block 100% of unwanted outcomes here without also removing
a feature folks would miss.  Nice things about blocking parallel query:

- Cheap to block
- Always hangs if attempted, so blocking it doesn't take away something of value
- Query still completes, without parallelism
- Writers of opclass functions probably aren't thinking of parallel query

For what it's worth, I tried making standard_ExecutorStart() warn if
!INTERRUPTS_CAN_BE_PROCESSED().  Only this thread's new test and
004_verify_nbtree_unique reached the warning.  (That's not a surprise.)

On Wed, Sep 18, 2024 at 08:59:22AM -0400, Peter Geoghegan wrote:
> The test case provided was intended to be illustrative of a problem
> that some foreign data wrapper ran into, when it used SPI.

Ideally, we'd block those or at least warn under assertions so FDW authors
don't accidentally run the executor with an LWLock held.  Unlike the opclass
case, we so far don't have a valid use case for holding an LWLock there.  In
other words, if the opclass use case is the only known-valid one, it would be
nice to have checks so new use cases don't creep in.



On Thu, 19 Sept 2024 at 04:53, Noah Misch <noah@leadboat.com> wrote:
> For what it's worth, I tried making standard_ExecutorStart() warn if
> !INTERRUPTS_CAN_BE_PROCESSED().  Only this thread's new test and
> 004_verify_nbtree_unique reached the warning.  (That's not a surprise.)

The reproducer I provided is actually a minimization of
004_verify_nbtree_unique, so it's just the one case.

> On Wed, Sep 18, 2024 at 08:59:22AM -0400, Peter Geoghegan wrote:
> > The test case provided was intended to be illustrative of a problem
> > that some foreign data wrapper ran into, when it used SPI.
>
> Ideally, we'd block those or at least warn under assertions so FDW authors
> don't accidentally run the executor with an LWLock held.  Unlike the opclass
> case, we so far don't have a valid use case for holding an LWLock there.  In
> other words, if the opclass use case is the only known-valid one, it would be
> nice to have checks so new use cases don't creep in.

My 2c as an FDW developer: a warning when calling into SPI with an LWLock
held would have allowed us to identify this issue months ago, well before
we stumbled into a parallel plan and a hang.

Best regards

--
Francesco



Noah Misch <noah@leadboat.com> writes:
> On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
>> I do not have much faith in this patch.  It assumes that the
>> condition "interrupts can be processed" is the same at plan time and
>> execution time.  For plans extracted from the plan cache, there seems
>> little reason to assume that must be true.  The proposed test case
>> cannot trigger that (today anyway) because SQL-language functions
>> don't deal in cached plans, but I suspect it could be broken with a
>> test case using a plpgsql function instead.

> Good point.  I missed that.

While working on the release notes, I noticed that nothing further
got done about this concern.  What do you think of adding a test
somewhere early in executor startup, to the effect of

    if (!INTERRUPTS_CAN_BE_PROCESSED())
        ereport(ERROR,
                (errmsg("cannot start a query with interrupts disabled")));

It's definitely a band-aid, but it seems better than leaving
things at the status quo.

            regards, tom lane



On Thu, Nov 07, 2024 at 12:17:21PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Sep 18, 2024 at 12:23:42AM -0400, Tom Lane wrote:
> >> I do not have much faith in this patch.  It assumes that the
> >> condition "interrupts can be processed" is the same at plan time and
> >> execution time.  For plans extracted from the plan cache, there seems
> >> little reason to assume that must be true.  The proposed test case
> >> cannot trigger that (today anyway) because SQL-language functions
> >> don't deal in cached plans, but I suspect it could be broken with a
> >> test case using a plpgsql function instead.
> 
> > Good point.  I missed that.
> 
> While working on the release notes, I noticed that nothing further
> got done about this concern.  What do you think of adding a test
> somewhere early in executor startup, to the effect of
> 
>     if (!INTERRUPTS_CAN_BE_PROCESSED())
>         ereport(ERROR,
>                 (errmsg("cannot start a query with interrupts disabled")));
> 
> It's definitely a band-aid, but it seems better than leaving
> things at the status quo.

That would fire in queries that don't error or hang today, so I think that
would be worse than the status quo.

This thread's previous commit just forced a serial plan.  The executor
counterpart would look like having the executor do what it does when there are
no free worker slots.



Noah Misch <noah@leadboat.com> writes:
> On Thu, Nov 07, 2024 at 12:17:21PM -0500, Tom Lane wrote:
>> While working on the release notes, I noticed that nothing further
>> got done about this concern.  What do you think of adding a test
>> somewhere early in executor startup, to the effect of
>>
>>     if (!INTERRUPTS_CAN_BE_PROCESSED())
>>         ereport(ERROR,
>>                 (errmsg("cannot start a query with interrupts disabled")));
>>
>> It's definitely a band-aid, but it seems better than leaving
>> things at the status quo.

> That would fire in queries that don't error or hang today, so I think that
> would be worse than the status quo.

Good point.

> This thread's previous commit just forced a serial plan.

My concern is that the previous commit forced new plans to be serial,
but did nothing about re-use of an existing plan.

> The executor
> counterpart would look like having the executor do what it does when there are
> no free worker slots.

Ah, that could be a way out.  Stick an INTERRUPTS_CAN_BE_PROCESSED()
call somewhere in there?  That could even allow us to revert the
planner change, which would simplify testing of the executor change.

            regards, tom lane



On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > This thread's previous commit just forced a serial plan.
> 
> My concern is that the previous commit forced new plans to be serial,
> but did nothing about re-use of an existing plan.

I agree.  It solved one way of reaching the problem, leaving at least one
unsolved.

> > The executor
> > counterpart would look like having the executor do what it does when there are
> > no free worker slots.
> 
> Ah, that could be a way out.  Stick an INTERRUPTS_CAN_BE_PROCESSED()
> call somewhere in there?

Exactly.  If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
be launched.

> That could even allow us to revert the
> planner change, which would simplify testing of the executor change.

True.



Noah Misch <noah@leadboat.com> writes:
> On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
>> Ah, that could be a way out.  Stick an INTERRUPTS_CAN_BE_PROCESSED()
>> call somewhere in there?

> Exactly.  If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
> be launched.

>> That could even allow us to revert the
>> planner change, which would simplify testing of the executor change.

> True.

Here's a proposed patch along that line.  I left the test case from
ac04aa84a alone, since it works perfectly well to test this way too.

We could argue about whether or not to revert the planner change.
But I'd prefer to do so, because as things stand it creates a
hard-to-reason-about source of plan instability.

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4a2e352d57..0c86543d40 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -230,6 +230,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
     shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
     shm_toc_estimate_keys(&pcxt->estimator, 1);

+    /*
+     * If we manage to reach here while non-interruptible, it's unsafe to
+     * launch any workers: we would fail to process interrupts sent by them.
+     * We can deal with that edge case by pretending no workers were
+     * requested.
+     */
+    if (!INTERRUPTS_CAN_BE_PROCESSED())
+        pcxt->nworkers = 0;
+
     /*
      * Normally, the user will have requested at least one worker process, but
      * if by chance they have not, we can skip a bunch of things here.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e3f0d486a..1f78dc3d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,11 +342,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
      * we want to allow parallel inserts in general; updates and deletes have
      * additional problems especially around combo CIDs.)
      *
-     * We don't try to use parallel mode unless interruptible.  The leader
-     * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-     * Even if we called HandleParallelMessages() another way, starting a
-     * parallel worker is too delay-prone to be prudent when uncancellable.
-     *
      * For now, we don't try to use parallel mode if we're running inside a
      * parallel worker.  We might eventually be able to relax this
      * restriction, but for now it seems best not to have parallel workers
@@ -357,7 +352,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         parse->commandType == CMD_SELECT &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
-        INTERRUPTS_CAN_BE_PROCESSED() &&
         !IsParallelWorker())
     {
         /* all the cheap tests pass, so scan the query tree */

On Fri, Nov 08, 2024 at 11:46:36AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Nov 07, 2024 at 02:29:19PM -0500, Tom Lane wrote:
> >> Ah, that could be a way out.  Stick an INTERRUPTS_CAN_BE_PROCESSED()
> >> call somewhere in there?
> 
> > Exactly.  If !INTERRUPTS_CAN_BE_PROCESSED(), proceed as though no workers can
> > be launched.
> 
> >> That could even allow us to revert the
> >> planner change, which would simplify testing of the executor change.
> 
> > True.
> 
> Here's a proposed patch along that line.  I left the test case from
> ac04aa84a alone, since it works perfectly well to test this way too.
> 
> We could argue about whether or not to revert the planner change.
> But I'd prefer to do so, because as things stand it creates a
> hard-to-reason-about source of plan instability.

Looks perfect.  Thank you.



I wrote:
> Here's a proposed patch along that line.  I left the test case from
> ac04aa84a alone, since it works perfectly well to test this way too.

I'd modeled that on the existing recovery code for DSM segment creation
failure, just below.  But a look at the code coverage report shows
(unsurprisingly) that that path is never exercised in our regression
tests, so I wondered if it actually works ... and it doesn't work
very well.  To test, I lobotomized InitializeParallelDSM to always
force pcxt->nworkers = 0.  That results in a bunch of unsurprising
regression test diffs, plus a couple of

+ERROR:  could not find key 4 in shm TOC at 0x229f138

which turns out to be the fault of ExecHashJoinReInitializeDSM:
it's not accounting for the possibility that we didn't really
start a parallel hash join.

I'm also not happy about ReinitializeParallelWorkers'

    Assert(pcxt->nworkers >= nworkers_to_launch);

The one existing caller manages not to trigger that because it's
careful to reduce its request based on pcxt->nworkers, but it
doesn't seem to me that callers should be expected to have to.

So I end with the attached.  There might still be some more issues
that the regression tests don't reach, but I think this is the
best we can do for today.

            regards, tom lane

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 4a2e352d57..a10bf02ccf 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -230,6 +230,15 @@ InitializeParallelDSM(ParallelContext *pcxt)
     shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FixedParallelState));
     shm_toc_estimate_keys(&pcxt->estimator, 1);

+    /*
+     * If we manage to reach here while non-interruptible, it's unsafe to
+     * launch any workers: we would fail to process interrupts sent by them.
+     * We can deal with that edge case by pretending no workers were
+     * requested.
+     */
+    if (!INTERRUPTS_CAN_BE_PROCESSED())
+        pcxt->nworkers = 0;
+
     /*
      * Normally, the user will have requested at least one worker process, but
      * if by chance they have not, we can skip a bunch of things here.
@@ -476,6 +485,9 @@ InitializeParallelDSM(ParallelContext *pcxt)
         shm_toc_insert(pcxt->toc, PARALLEL_KEY_ENTRYPOINT, entrypointstate);
     }

+    /* Update nworkers_to_launch, in case we changed nworkers above. */
+    pcxt->nworkers_to_launch = pcxt->nworkers;
+
     /* Restore previous memory context. */
     MemoryContextSwitchTo(oldcontext);
 }
@@ -539,10 +551,11 @@ ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
 {
     /*
      * The number of workers that need to be launched must be less than the
-     * number of workers with which the parallel context is initialized.
+     * number of workers with which the parallel context is initialized.  But
+     * the caller might not know that InitializeParallelDSM reduced nworkers,
+     * so just silently trim the request.
      */
-    Assert(pcxt->nworkers >= nworkers_to_launch);
-    pcxt->nworkers_to_launch = nworkers_to_launch;
+    pcxt->nworkers_to_launch = Min(pcxt->nworkers, nworkers_to_launch);
 }

 /*
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2f7170604d..6c3009fba0 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1713,8 +1713,13 @@ void
 ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
 {
     int            plan_node_id = state->js.ps.plan->plan_node_id;
-    ParallelHashJoinState *pstate =
-        shm_toc_lookup(pcxt->toc, plan_node_id, false);
+    ParallelHashJoinState *pstate;
+
+    /* Nothing to do if we failed to create a DSM segment. */
+    if (pcxt->seg == NULL)
+        return;
+
+    pstate = shm_toc_lookup(pcxt->toc, plan_node_id, false);

     /*
      * It would be possible to reuse the shared hash table in single-batch
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e3f0d486a..1f78dc3d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -342,11 +342,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
      * we want to allow parallel inserts in general; updates and deletes have
      * additional problems especially around combo CIDs.)
      *
-     * We don't try to use parallel mode unless interruptible.  The leader
-     * expects ProcessInterrupts() calls to reach HandleParallelMessages().
-     * Even if we called HandleParallelMessages() another way, starting a
-     * parallel worker is too delay-prone to be prudent when uncancellable.
-     *
      * For now, we don't try to use parallel mode if we're running inside a
      * parallel worker.  We might eventually be able to relax this
      * restriction, but for now it seems best not to have parallel workers
@@ -357,7 +352,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
         parse->commandType == CMD_SELECT &&
         !parse->hasModifyingCTE &&
         max_parallel_workers_per_gather > 0 &&
-        INTERRUPTS_CAN_BE_PROCESSED() &&
         !IsParallelWorker())
     {
         /* all the cheap tests pass, so scan the query tree */

On Fri, Nov 08, 2024 at 12:56:55PM -0500, Tom Lane wrote:
> I wrote:
> > Here's a proposed patch along that line.  I left the test case from
> > ac04aa84a alone, since it works perfectly well to test this way too.
> 
> I'd modeled that on the existing recovery code for DSM segment creation
> failure, just below.  But a look at the code coverage report shows
> (unsurprisingly) that that path is never exercised in our regression
> tests, so I wondered if it actually works ... and it doesn't work
> very well.  To test, I lobotomized InitializeParallelDSM to always
> force pcxt->nworkers = 0.  That results in a bunch of unsurprising
> regression test diffs, plus a couple of
> 
> +ERROR:  could not find key 4 in shm TOC at 0x229f138
> 
> which turns out to be the fault of ExecHashJoinReInitializeDSM:
> it's not accounting for the possibility that we didn't really
> start a parallel hash join.
> 
> I'm also not happy about ReinitializeParallelWorkers'
> 
>     Assert(pcxt->nworkers >= nworkers_to_launch);
> 
> The one existing caller manages not to trigger that because it's
> careful to reduce its request based on pcxt->nworkers, but it
> doesn't seem to me that callers should be expected to have to.
> 
> So I end with the attached.  There might still be some more issues
> that the regression tests don't reach, but I think this is the
> best we can do for today.

Looks good.