On 11/18/25 15:06, David Geier wrote:
> Hi Tomas!
>
> On 15.11.2025 00:00, Tomas Vondra wrote:
>> On 11/14/25 19:20, David Geier wrote:
>>>
>>> Ooops. That can likely be fixed.
>>>
>
> I'll take a look at why this happens in the next few days, if you think
> this approach generally has a chance to be accepted. See below.
>
>>>> And I very much doubt inventing a new ad hoc way to signal workers is
>>>> the right solution (even if there wasn't the InstrEndLoop issue).
>>>>
>>
>> Good point, I completely forgot about (2).
>>
>
> In that light, could you take another look at my patch?
>
> Some clarifications: I'm not inventing a new way to signal workers but
> I'm using the existing SendProcSignal() machinery to inform parallel
> workers to stop. I just added another signal PROCSIG_PARALLEL_STOP and
> the corresponding functions to handle it from ProcessInterrupts().
>
Sure, but I still don't quite see the need to do all this.

> What is "new" is how I'm stopping the parallel workers once they've
> received the stop signal: the challenge is that the workers need to
> actually jump out of whatever they are doing - even if they aren't
> producing any rows at this point; but e.g. are scanning a table
> somewhere deep down in ExecScan() / SeqNext().
>
> The only way I can see to make this work, without a huge patch that adds
> new code all over the place, is to instruct process termination from
> inside ProcessInterrupts(). I'm siglongjmp-ing out of the ExecutorRun()
> function so that all parallel worker cleanup code still runs as if the
> worker ran to completion. I've tried to end the process without that,
> but it caused all sorts of fallout (instrumentation not collected,
> postmaster thinking the process stopped unexpectedly, ...).
>
> Instead of siglongjmp-ing we could maybe call some parallel worker
> shutdown function but that would require access to the parallel worker
> state variables, which are currently not globally accessible.
>
But why? The leader and workers already share state - the parallel scan
state (for the parallel-aware scan on the "driving" table). Why couldn't
the leader set a flag in the scan, and force it to end in workers? Which
AFAICS should lead to workers terminating shortly after that.
All the code / handling is already in place. It will need a bit of new
code in the parallel scans, but not much I think.

regards

--
Tomas Vondra