Thread: Spinlock can be released twice in procsignal.c
Hi, it seems we can release spinlock twice in /src/backend/storage/ipc/procsignal.c file, method ProcSignalInit.
void
ProcSignalInit(bool cancel_key_valid, int32 cancel_key)
{
ProcSignalSlot *slot;
uint64 barrier_generation;
..............................................................................
slot = &ProcSignal->psh_slot[MyProcNumber];
/* sanity check */
SpinLockAcquire(&slot->pss_mutex);
if (pg_atomic_read_u32(&slot->pss_pid) != 0)
{
SpinLockRelease(&slot->pss_mutex);
elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty",
MyProcPid, MyProcNumber);
}
/* Clear out any leftover signal reasons */
MemSet(slot->pss_signalFlags, 0, NUM_PROCSIGNALS * sizeof(sig_atomic_t));
......................
slot->pss_cancel_key_valid = cancel_key_valid;
slot->pss_cancel_key = cancel_key;
pg_atomic_write_u32(&slot->pss_pid, MyProcPid);
SpinLockRelease(&slot->pss_mutex);
First in the if clause, second near the end of function. Such behavior can lead to unpredictable concurrent issues.
In applied patch I removed spinlock release in if clause.
Attachment
> On 26 Feb 2025, at 00:34, Maksim.Melnikov <m.melnikov@postgrespro.ru> wrote: > > In applied patch I removed spinlock release in if clause. Looks like the oversight in 9d9b9d4. IMO the fix is correct. Best regards, Andrey Borodin.
On Wed, Feb 26, 2025 at 09:08:53AM +0500, Andrey Borodin wrote: > Looks like the oversight in 9d9b9d4. IMO the fix is correct. if (pg_atomic_read_u32(&slot->pss_pid) != 0) { - SpinLockRelease(&slot->pss_mutex); elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty", MyProcPid, MyProcNumber); } This fix is not correct. No system function calls (well basically most of them) or even more no PostgreSQL-specific calls should happen while holding a spinlock. elog() is a good example of what not to do. One example: imagine a palloc failure while holding this spinlock in this elog(). The code should be restructured so as we read pss_pid while holding the spinlock, release the spinlock, and then LOG. Based on the structure of this routine, you could just assign a boolean to decide if something should be logged and delay the LOG until the spinlock is released, because we don't intend an early exit like in CleanupProcSignalState(). In ~16, ProcSignalInit() is able to LOG things early, but the ProcSignal is forced even if pss_pid is set, so delaying the LOG to be generated after updating ProcSignal does not matter. -- Michael
Attachment
This fix is not correct. No system function calls (well basically
most of them) or even more no PostgreSQL-specific calls should happen
while holding a spinlock. elog() is a good example of what not to do.
One example: imagine a palloc failure while holding this spinlock in
this elog().
--
Attachment
Michael, thanks a lot for the detailed explanation. > On 26 Feb 2025, at 11:57, Maxim Orlov <orlovmg@gmail.com> wrote: > > Indeed. PFA the correct one. > > <v2-0001-Avoid-double-spinlock-release.patch> I’d suggest to give some more descriptive name to “logit” and expand comment “/* sanity check */“. This comment was easierto understand when elog() was near, but now IMO we can have few words about what is going on. Best regards, Andrey Borodin.
I’d suggest to give some more descriptive name to “logit” and expand comment “/* sanity check */“. This comment was easier to understand when elog() was near, but now IMO we can have few words about what is going on.