Re: LWLock deadlock and gdb advice - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: LWLock deadlock and gdb advice
Date
Msg-id 55BB50D3.9000702@iki.fi
Whole thread Raw
In response to Re: LWLock deadlock and gdb advice  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 07/30/2015 09:14 PM, Andres Freund wrote:
> On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
>> In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
>> until it has updated the variable. And LWLockWaitForVar() holds the spinlock
>> when it checks that the lock is held and that the variable's value matches.
>> So it cannot happen on 9.4.
>
> The first paragraph talked about "the same value", but that was just
> referring to it not yet having been cleared i think...
>
>> To reiterate, with 9.5, it's possible that a backend is sleeping in
>> LWLockWaitForVar(oldvar=123), even though the lock is currently held by
>> another backend with value 124. That seems wrong, or surprising at the very
>> least.
>
> With my patch that can't really happen that way though? The value is
> re-checked after queuing. If it has changed by then we're done. And if
> it hasn't yet changed we're guaranteed to be woken up once it's being
> changed?

Ok, let me try explaining it again. With your patch, 
LWLockAcquireWithVar looks like this (simplified, assuming the lock is 
free):

1. Set LW_VAL_EXCLUSIVE
3. (Acquire spinlock) Set *valptr = val (Release spinlock)

LWLockWaitForVar looks like this:

1. Check if lock is free
2. (Acquire Spinlock) Read *valptr, compare with oldval (Release spinlock)
3. LWLockQueueSelf
4. Re-check that *valptr is still equal to oldval
5. sleep.

This is the race condition:

Backend A            Backend B
---------            ---------
LWLockAcquireWithVar(123)  Set LW_VAL_EXCLUSIVE  Set *valptr = 123            LWLockWaitForVar(123)              See
thatthe lock is held              Read *valptr, it matches              LWLockQueueSelf
 
LWLockRelease()
LWLockAcquireWithVar(124)  Set LW_VAL_EXCLUSIVE              wakes up              See that the lock is still/again
held             Read *valptr, it's still 123              LWLockQueueSelf              Re-check that *valptr is still
123.             go to sleep.  Set *valptr = 124
 


Now, Backend B's LWLockWaitForVar(123) call is sleeping, even though the 
lock was free'd and reacquired with different value, 124. It won't wake 
up until A updates the value or releases the lock again.

This was not possible in 9.4. It was possible in 9.4 too when the value 
was same in both LWLockAcquireWithVar() calls, and I think that's 
acceptable, but not when the values differ.

> I generaly don't mind adding some sort of flag clearing or such, but I'd
> rather not have it in the retry loop in the general LWLockAttemptLock
> path - I found that very small amounts of instructions in there have a
> measurable impact.

I doubt clearing a bit would matter there, although you might have a 
better instinct on that...

- Heikki




pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Transactions involving multiple postgres foreign servers
Next
From: Jeremy Harris
Date:
Subject: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"