Thread: Detect double-release of spinlock
Hi, On 2024-07-29 09:40:26 -0700, Andres Freund wrote: > On 2024-07-29 12:33:13 -0400, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > On 2024-07-29 11:31:56 -0400, Tom Lane wrote: > > >> There was some recent discussion about getting rid of > > >> --disable-spinlocks on the grounds that nobody would use > > >> hardware that lacked native spinlocks. But now I wonder > > >> if there is a testing/debugging reason to keep it. > > > > > Seems it'd be a lot more straightforward to just add an assertion to the > > > x86-64 spinlock implementation verifying that the spinlock isn't already free? > > FWIW, I quickly hacked that up, and it indeed quickly fails with 0393f542d72^ > and passes with 0393f542d72. Thought it'd be valuable to post a patch to go along with this, to -hackers. The thread started at [1] Other context from this discussion: > > I dunno, is that the only extra check that the --disable-spinlocks > > implementation is providing? > > I think it also provides the (valuable!) check that spinlocks were actually > initialized. But that again seems like something we'd be better off adding > more general infrastructure for - nobody runs --disable-spinlocks locally, we > shouldn't need to run this on the buildfarm to find problems like this. > > > > I'm kind of allergic to putting Asserts into spinlocked code segments, > > mostly on the grounds that it violates the straight-line-code precept. > > I suppose it's not really that bad for tests that you don't expect > > to fail, but still ... > > I don't think the spinlock implementation itself is really affected by that > rule - after all, the --disable-spinlocks implementation actually consists out > of several layers of external function calls (including syscalls in some > cases!). Greetings, Andres Freund [1] https://www.postgresql.org/message-id/E1sYSF2-001lEB-D1%40gemulon.postgresql.org
Attachment
On 29/07/2024 19:51, Andres Freund wrote: > On 2024-07-29 09:40:26 -0700, Andres Freund wrote: >> On 2024-07-29 12:33:13 -0400, Tom Lane wrote: >>> Andres Freund <andres@anarazel.de> writes: >>>> On 2024-07-29 11:31:56 -0400, Tom Lane wrote: >>>>> There was some recent discussion about getting rid of >>>>> --disable-spinlocks on the grounds that nobody would use >>>>> hardware that lacked native spinlocks. But now I wonder >>>>> if there is a testing/debugging reason to keep it. >>> >>>> Seems it'd be a lot more straightforward to just add an assertion to the >>>> x86-64 spinlock implementation verifying that the spinlock isn't already free? >> >> FWIW, I quickly hacked that up, and it indeed quickly fails with 0393f542d72^ >> and passes with 0393f542d72. +1. Thanks! >>> Other context from this discussion: >>> I dunno, is that the only extra check that the --disable-spinlocks >>> implementation is providing? >> >> I think it also provides the (valuable!) check that spinlocks were actually >> initialized. But that again seems like something we'd be better off adding >> more general infrastructure for - nobody runs --disable-spinlocks locally, we >> shouldn't need to run this on the buildfarm to find problems like this. Note that the "check" for double-release with the fallback implementation wasn't an explicit check either. It just incremented the underlying semaphore, which caused very weird failures later in completely unrelated code. An explicit assert would be much nicer. +1 for removing --disable-spinlocks, but let's add this assertion first. >>> I'm kind of allergic to putting Asserts into spinlocked code segments, >>> mostly on the grounds that it violates the straight-line-code precept. >>> I suppose it's not really that bad for tests that you don't expect >>> to fail, but still ... >> >> I don't think the spinlock implementation itself is really affected by that >> rule - after all, the --disable-spinlocks implementation actually consists out >> of several layers of external function calls (including syscalls in some >> cases!). Yeah I'm not worried about that at all. Also, the assert is made when you have already released the spinlock; you are already out of the critical section. -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > Yeah I'm not worried about that at all. Also, the assert is made when > you have already released the spinlock; you are already out of the > critical section. Not in the patch Andres posted. regards, tom lane
Hi, On 2024-07-29 13:25:22 -0400, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > Yeah I'm not worried about that at all. Also, the assert is made when > > you have already released the spinlock; you are already out of the > > critical section. > > Not in the patch Andres posted. Which seems fairly fundamental - once outside of the critical section, we can't actually assert that the lock isn't acquired, somebody else *validly* might have acquired it by then. However, I still don't think it's a problem to assert that the lock is held in in the unlock "routine". As mentioned before, the spinlock implementation itself has never followed the "just straight line code" rule that users of spinlocks are supposed to follow. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > However, I still don't think it's a problem to assert that the lock is held in > in the unlock "routine". As mentioned before, the spinlock implementation > itself has never followed the "just straight line code" rule that users of > spinlocks are supposed to follow. Yeah, that's fair. regards, tom lane
On 29/07/2024 20:48, Andres Freund wrote: > On 2024-07-29 13:25:22 -0400, Tom Lane wrote: >> Heikki Linnakangas <hlinnaka@iki.fi> writes: >>> Yeah I'm not worried about that at all. Also, the assert is made when >>> you have already released the spinlock; you are already out of the >>> critical section. >> >> Not in the patch Andres posted. > > Which seems fairly fundamental - once outside of the critical section, we > can't actually assert that the lock isn't acquired, somebody else *validly* > might have acquired it by then. You could do: bool was_free = S_LOCK_FREE(lock); S_UNLOCK(lock); Assert(!was_free); Depending on the underlying implementation, you could also use compare-and-exchange. That makes the assertion-enabled instructions a little different than without assertions though. > However, I still don't think it's a problem to assert that the lock is held in > in the unlock "routine". As mentioned before, the spinlock implementation > itself has never followed the "just straight line code" rule that users of > spinlocks are supposed to follow. Agreed. -- Heikki Linnakangas Neon (https://neon.tech)
Hi, On 2024-07-29 21:00:35 +0300, Heikki Linnakangas wrote: > On 29/07/2024 20:48, Andres Freund wrote: > > On 2024-07-29 13:25:22 -0400, Tom Lane wrote: > > > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > > > Yeah I'm not worried about that at all. Also, the assert is made when > > > > you have already released the spinlock; you are already out of the > > > > critical section. > > > > > > Not in the patch Andres posted. > > > > Which seems fairly fundamental - once outside of the critical section, we > > can't actually assert that the lock isn't acquired, somebody else *validly* > > might have acquired it by then. > > You could do: > > bool was_free = S_LOCK_FREE(lock); > > S_UNLOCK(lock); > Assert(!was_free); I don't really see the point - we're about to crash with an assertion failure, why would we want to do that outside of the critical section? If anything that will make it harder to debug the issue in a core dump, because other backends might "destroy evidence" due to being able to acquire the spinlock. > Depending on the underlying implementation, you could also use > compare-and-exchange. That'd scale a lot worse, at least on x86-64, as it requires the unlock to be an atomic op, whereas today it's a simple store (+ compiler barrier). I've experimented with replacing all spinlocks with lwlocks, and the fact that you need an atomic op for an rwlock release is one of the two major reasons they have a higher overhead (the remainder is boring stuff like the overhead of external function calls and ownership management). Greetings, Andres Freund
Hi, Partially replying here to an email on -committers [1]. On 2024-07-29 13:57:02 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > However, I still don't think it's a problem to assert that the lock is held in > > in the unlock "routine". As mentioned before, the spinlock implementation > > itself has never followed the "just straight line code" rule that users of > > spinlocks are supposed to follow. > > Yeah, that's fair. Cool. On 2024-07-29 13:56:05 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > b) Instead define the spinlock to have 1 as the unlocked state and 0 as the > > locked state. That makes it a bit harder to understand that initialization > > is missing, compared to a dedicated state, as the first use of the spinlock > > just blocks. > > This option works for me. > > To make 1) b) easier to understand it might be worth changing the slock_t > > typedef to be something like > > > typedef struct slock_t > > { > > char is_free; > > } slock_t; > > +1 Cool. I've attached a prototype. I just realized there's a nice little advantage to the "inverted" representation - it detects missing initialization even in optimized builds. > How much of this would we change across platforms, and how much > would be x86-only? I think there are enough people developing on > ARM (e.g. Mac) now to make it worth covering that, but maybe we > don't care so much about anything else. Not sure. Right now I've only hacked up x86-64 (not even touching i386), but it shouldn't be hard to change at least some additional platforms. My current prototype requires S_UNLOCK, S_LOCK_FREE, S_INIT_LOCK to be implemented for x86-64 instead of using the "generic" implementation. That'd be mildly annoying duplication if we did so for a few more platforms. It'd be more palatable to just change all platforms if we made more of them use __sync_lock_test_and_set (or some other intrinsic(s))... Greetings, Andres Freund [1] https://postgr.es/m/2812376.1722275765%40sss.pgh.pa.us