Hi,
On 2022-05-25 14:47:41 +1200, Thomas Munro wrote:
> My question is: do we really need to suppress these non-ereporting
> interrupts in all the places we currently do HOLD_INTERRUPTS()?
Most of those should be fairly short / only block on lwlocks, small amounts of
IO. I'm not sure how much of an issue this is. Are there actually CFIs inside
those HOLD_INTERRUPT sections?
> The reason I'm wondering about this is because the new ProcSignalBarrier
> mechanism has to wait for any HOLD_INTERRUPTS() sections across all backends
> to complete, and that possibly includes long cleanup loops that perform disk
> I/O. While some future ProcSignalBarrier handler might indeed not be safe
> during eg cleanup (perhaps because it can ereport(ERROR)), it is free to
> return false to defer itself until the next CFI.
>
> Concretely, for example, where xact.c holds interrupts:
>
> /* Prevent cancel/die interrupt while cleaning up */
> HOLD_INTERRUPTS();
>
> ... or where dsm_detach does something similar, there is probably no
> reason we should have to delay a ProcSignalBarrier just to accomplish
> what the comment says. Presumably it really just wants to make sure
> it doesn't lose control of the program counter via non-local return.
I don't think that's quite it. There are elog(ERROR) reachable from within
HOLD_INTERRUPTS() sections (it's not a critical section after all). I think
it's more that there's no point in reacting to interrupts in those spots,
because e.g. processing ProcDiePending requires aborting the currently active
transaction.
Greetings,
Andres Freund