On Wed, Feb 26, 2025 at 09:08:53AM +0500, Andrey Borodin wrote:
> Looks like the oversight in 9d9b9d4. IMO the fix is correct.
if (pg_atomic_read_u32(&slot->pss_pid) != 0)
{
- SpinLockRelease(&slot->pss_mutex);
elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty",
MyProcPid, MyProcNumber);
}
This fix is not correct. No system function calls (well basically
most of them) or even more no PostgreSQL-specific calls should happen
while holding a spinlock. elog() is a good example of what not to do.
One example: imagine a palloc failure while holding this spinlock in
this elog().
The code should be restructured so as we read pss_pid while holding
the spinlock, release the spinlock, and then LOG. Based on the
structure of this routine, you could just assign a boolean to decide
if something should be logged and delay the LOG until the spinlock is
released, because we don't intend an early exit like in
CleanupProcSignalState(). In ~16, ProcSignalInit() is able to LOG
things early, but the ProcSignal is forced even if pss_pid is set, so
delaying the LOG to be generated after updating ProcSignal does not
matter.
--
Michael