Thread: Detect double-release of spinlock

Detect double-release of spinlock

From

Andres Freund

Date:

29 July 2024, 16:51:54

Hi,

On 2024-07-29 09:40:26 -0700, Andres Freund wrote:
> On 2024-07-29 12:33:13 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2024-07-29 11:31:56 -0400, Tom Lane wrote:
> > >> There was some recent discussion about getting rid of
> > >> --disable-spinlocks on the grounds that nobody would use
> > >> hardware that lacked native spinlocks.  But now I wonder
> > >> if there is a testing/debugging reason to keep it.
> > 
> > > Seems it'd be a lot more straightforward to just add an assertion to the
> > > x86-64 spinlock implementation verifying that the spinlock isn't already free?
> 
> FWIW, I quickly hacked that up, and it indeed quickly fails with 0393f542d72^
> and passes with 0393f542d72.

Thought it'd be valuable to post a patch to go along with this, to
-hackers. The thread started at [1]

Other context from this discussion:
> > I dunno, is that the only extra check that the --disable-spinlocks
> > implementation is providing?
> 
> I think it also provides the (valuable!) check that spinlocks were actually
> initialized. But that again seems like something we'd be better off adding
> more general infrastructure for - nobody runs --disable-spinlocks locally, we
> shouldn't need to run this on the buildfarm to find problems like this.
> 
> 
> > I'm kind of allergic to putting Asserts into spinlocked code segments,
> > mostly on the grounds that it violates the straight-line-code precept.
> > I suppose it's not really that bad for tests that you don't expect
> > to fail, but still ...
> 
> I don't think the spinlock implementation itself is really affected by that
> rule - after all, the --disable-spinlocks implementation actually consists out
> of several layers of external function calls (including syscalls in some
> cases!).

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/E1sYSF2-001lEB-D1%40gemulon.postgresql.org

Attachment

v1-0001-Detect-unlocking-an-unlocked-spinlock.patch

Re: Detect double-release of spinlock

From

Heikki Linnakangas

Date:

29 July 2024, 17:07:56

On 29/07/2024 19:51, Andres Freund wrote:
> On 2024-07-29 09:40:26 -0700, Andres Freund wrote:
>> On 2024-07-29 12:33:13 -0400, Tom Lane wrote:
>>> Andres Freund <andres@anarazel.de> writes:
>>>> On 2024-07-29 11:31:56 -0400, Tom Lane wrote:
>>>>> There was some recent discussion about getting rid of
>>>>> --disable-spinlocks on the grounds that nobody would use
>>>>> hardware that lacked native spinlocks.  But now I wonder
>>>>> if there is a testing/debugging reason to keep it.
>>>
>>>> Seems it'd be a lot more straightforward to just add an assertion to the
>>>> x86-64 spinlock implementation verifying that the spinlock isn't already free?
>>
>> FWIW, I quickly hacked that up, and it indeed quickly fails with 0393f542d72^
>> and passes with 0393f542d72.

+1. Thanks!

>>> Other context from this discussion:
>>> I dunno, is that the only extra check that the --disable-spinlocks
>>> implementation is providing?
>>
>> I think it also provides the (valuable!) check that spinlocks were actually
>> initialized. But that again seems like something we'd be better off adding
>> more general infrastructure for - nobody runs --disable-spinlocks locally, we
>> shouldn't need to run this on the buildfarm to find problems like this.

Note that the "check" for double-release with the fallback 
implementation wasn't an explicit check either. It just incremented the 
underlying semaphore, which caused very weird failures later in 
completely unrelated code. An explicit assert would be much nicer.

+1 for removing --disable-spinlocks, but let's add this assertion first.

>>> I'm kind of allergic to putting Asserts into spinlocked code segments,
>>> mostly on the grounds that it violates the straight-line-code precept.
>>> I suppose it's not really that bad for tests that you don't expect
>>> to fail, but still ...
>>
>> I don't think the spinlock implementation itself is really affected by that
>> rule - after all, the --disable-spinlocks implementation actually consists out
>> of several layers of external function calls (including syscalls in some
>> cases!).

Yeah I'm not worried about that at all. Also, the assert is made when 
you have already released the spinlock; you are already out of the 
critical section.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Detect double-release of spinlock

From

Tom Lane

Date:

29 July 2024, 17:25:22

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Yeah I'm not worried about that at all. Also, the assert is made when 
> you have already released the spinlock; you are already out of the 
> critical section.

Not in the patch Andres posted.

            regards, tom lane

Re: Detect double-release of spinlock

From

Andres Freund

Date:

29 July 2024, 17:48:53

Hi,

On 2024-07-29 13:25:22 -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > Yeah I'm not worried about that at all. Also, the assert is made when 
> > you have already released the spinlock; you are already out of the 
> > critical section.
> 
> Not in the patch Andres posted.

Which seems fairly fundamental - once outside of the critical section, we
can't actually assert that the lock isn't acquired, somebody else *validly*
might have acquired it by then.

However, I still don't think it's a problem to assert that the lock is held in
in the unlock "routine". As mentioned before, the spinlock implementation
itself has never followed the "just straight line code" rule that users of
spinlocks are supposed to follow.

Greetings,

Andres Freund

Re: Detect double-release of spinlock

From

Tom Lane

Date:

29 July 2024, 17:57:02

Andres Freund <andres@anarazel.de> writes:
> However, I still don't think it's a problem to assert that the lock is held in
> in the unlock "routine". As mentioned before, the spinlock implementation
> itself has never followed the "just straight line code" rule that users of
> spinlocks are supposed to follow.

Yeah, that's fair.

            regards, tom lane

Re: Detect double-release of spinlock

From

Heikki Linnakangas

Date:

29 July 2024, 18:00:35

On 29/07/2024 20:48, Andres Freund wrote:
> On 2024-07-29 13:25:22 -0400, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> Yeah I'm not worried about that at all. Also, the assert is made when
>>> you have already released the spinlock; you are already out of the
>>> critical section.
>>
>> Not in the patch Andres posted.
> 
> Which seems fairly fundamental - once outside of the critical section, we
> can't actually assert that the lock isn't acquired, somebody else *validly*
> might have acquired it by then.

You could do:

bool was_free = S_LOCK_FREE(lock);

S_UNLOCK(lock);
Assert(!was_free);

Depending on the underlying implementation, you could also use 
compare-and-exchange. That makes the assertion-enabled instructions a 
little different than without assertions though.

> However, I still don't think it's a problem to assert that the lock is held in
> in the unlock "routine". As mentioned before, the spinlock implementation
> itself has never followed the "just straight line code" rule that users of
> spinlocks are supposed to follow.

Agreed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Detect double-release of spinlock

From

Andres Freund

Date:

29 July 2024, 18:12:19

Hi,

On 2024-07-29 21:00:35 +0300, Heikki Linnakangas wrote:
> On 29/07/2024 20:48, Andres Freund wrote:
> > On 2024-07-29 13:25:22 -0400, Tom Lane wrote:
> > > Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > > > Yeah I'm not worried about that at all. Also, the assert is made when
> > > > you have already released the spinlock; you are already out of the
> > > > critical section.
> > > 
> > > Not in the patch Andres posted.
> > 
> > Which seems fairly fundamental - once outside of the critical section, we
> > can't actually assert that the lock isn't acquired, somebody else *validly*
> > might have acquired it by then.
> 
> You could do:
> 
> bool was_free = S_LOCK_FREE(lock);
> 
> S_UNLOCK(lock);
> Assert(!was_free);

I don't really see the point - we're about to crash with an assertion failure,
why would we want to do that outside of the critical section? If anything that
will make it harder to debug the issue in a core dump, because other backends
might "destroy evidence" due to being able to acquire the spinlock.

> Depending on the underlying implementation, you could also use
> compare-and-exchange.

That'd scale a lot worse, at least on x86-64, as it requires the unlock to be
an atomic op, whereas today it's a simple store (+ compiler barrier).

I've experimented with replacing all spinlocks with lwlocks, and the fact that
you need an atomic op for an rwlock release is one of the two major reasons
they have a higher overhead (the remainder is boring stuff like the overhead
of external function calls and ownership management).

Greetings,

Andres Freund

Re: Detect double-release of spinlock

From

Andres Freund

Date:

29 July 2024, 18:29:52

Hi,

Partially replying here to an email on -committers [1].

On 2024-07-29 13:57:02 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > However, I still don't think it's a problem to assert that the lock is held in
> > in the unlock "routine". As mentioned before, the spinlock implementation
> > itself has never followed the "just straight line code" rule that users of
> > spinlocks are supposed to follow.
>
> Yeah, that's fair.

Cool.

On 2024-07-29 13:56:05 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > b) Instead define the spinlock to have 1 as the unlocked state and 0 as the
> >    locked state. That makes it a bit harder to understand that initialization
> >    is missing, compared to a dedicated state, as the first use of the spinlock
> >    just blocks.
>
> This option works for me.

> > To make 1) b) easier to understand it might be worth changing the slock_t
> > typedef to be something like
>
> > typedef struct slock_t
> > {
> >         char is_free;
> > } slock_t;
>
> +1

Cool. I've attached a prototype.

I just realized there's a nice little advantage to the "inverted"
representation - it detects missing initialization even in optimized builds.

> How much of this would we change across platforms, and how much
> would be x86-only?  I think there are enough people developing on
> ARM (e.g. Mac) now to make it worth covering that, but maybe we
> don't care so much about anything else.

Not sure. Right now I've only hacked up x86-64 (not even touching i386), but
it shouldn't be hard to change at least some additional platforms.

My current prototype requires S_UNLOCK, S_LOCK_FREE, S_INIT_LOCK to be
implemented for x86-64 instead of using the "generic" implementation. That'd
be mildly annoying duplication if we did so for a few more platforms.

It'd be more palatable to just change all platforms if we made more of them
use __sync_lock_test_and_set (or some other intrinsic(s))...

Greetings,

Andres Freund

[1] https://postgr.es/m/2812376.1722275765%40sss.pgh.pa.us