Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start - Mailing list pgsql-bugs

From Noah Misch
Subject Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
Date
Msg-id 20241108233649.01.nmisch@google.com
In response to Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Fri, Nov 08, 2024 at 12:56:55PM -0500, Tom Lane wrote:
> I wrote:
> > Here's a proposed patch along that line.  I left the test case from
> > ac04aa84a alone, since it works perfectly well to test this way too.
> 
> I'd modeled that on the existing recovery code for DSM segment creation
> failure, just below.  But a look at the code coverage report shows
> (unsurprisingly) that that path is never exercised in our regression
> tests, so I wondered if it actually works ... and it doesn't work
> very well.  To test, I lobotomized InitializeParallelDSM to always
> force pcxt->nworkers = 0.  That results in a bunch of unsurprising
> regression test diffs, plus a couple of
> 
> +ERROR:  could not find key 4 in shm TOC at 0x229f138
> 
> which turns out to be the fault of ExecHashJoinReInitializeDSM:
> it's not accounting for the possibility that we didn't really
> start a parallel hash join.
> 
> I'm also not happy about ReinitializeParallelWorkers'
> 
>     Assert(pcxt->nworkers >= nworkers_to_launch);
> 
> The one existing caller manages not to trigger that because it's
> careful to reduce its request based on pcxt->nworkers, but it
> doesn't seem to me that callers should be expected to have to.
> 
> So I end with the attached.  There might still be some more issues
> that the regression tests don't reach, but I think this is the
> best we can do for today.

Looks good.
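
For readers following along, the ReinitializeParallelWorkers point quoted above can be illustrated with a standalone sketch. This is not the PostgreSQL source and not necessarily what the attached patch does: the struct and function names below are hypothetical stand-ins, and the only idea taken from the thread is that the callee could cap the request at pcxt->nworkers itself instead of asserting that every caller already did so.

```c
#include <assert.h>

/* Hypothetical stand-in for a parallel context; not the PostgreSQL struct. */
typedef struct DemoParallelContext
{
    int nworkers;           /* workers the context was actually set up for */
    int nworkers_to_launch; /* workers the next launch will try to start */
} DemoParallelContext;

/*
 * Assumed behavior: cap the request at what the context can provide,
 * rather than asserting pcxt->nworkers >= nworkers_to_launch and relying
 * on each caller to reduce its request first.
 */
static void
demo_reinitialize_workers(DemoParallelContext *pcxt, int nworkers_to_launch)
{
    /* A negative request is still a caller bug in this sketch. */
    assert(nworkers_to_launch >= 0);

    pcxt->nworkers_to_launch =
        (nworkers_to_launch < pcxt->nworkers) ? nworkers_to_launch
                                              : pcxt->nworkers;
}
```

With this shape, a context whose nworkers was forced to 0 (as in the test described above) simply ends up with nothing to launch, and the caller does not need to know that.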