Re: LWLock deadlock and gdb advice - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: LWLock deadlock and gdb advice
Date
Msg-id 55B90C0C.6020104@iki.fi
Whole thread Raw
In response to Re: LWLock deadlock and gdb advice  (Andres Freund <andres@anarazel.de>)
Responses Re: LWLock deadlock and gdb advice  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 07/29/2015 04:10 PM, Andres Freund wrote:
> On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
>> On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
>>> Ah, ok, that should work, as long as you also re-check the variable's value
>>> after queueing. Want to write the patch, or should I?
>>
>> I'll try. Shouldn't be too hard.
>
> What do you think about something roughly like the attached?

Hmm. Imagine this:

Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting 
on it. The lock holder releases the lock, and wakes up A. But before A 
wakes up and sees that the lock is free, another backend acquires the 
lock again. It runs LWLockAcquireWithVar to the point just before 
setting the variable's value. Now A wakes up, sees that the lock is 
still (or again) held, and that the variable's value still matches the 
old one, and goes back to sleep. The new lock holder won't wake it up 
until it updates the value again, or to releases the lock.

I think that's OK given the current usage of this in WALInsertLocks, but 
it's scary. At the very least you need comments to explain that race 
condition. You didn't like the new LW_FLAG_VAR_SET flag and the API 
changes I proposed? That eliminates the race condition.

Either way, there is a race condition that if the new lock holder sets 
the variable to the *same* value as before, the old waiter won't 
necessarily wake up even though the lock was free or had a different 
value in between. But that's easier to explain and understand than the 
fact that the value set by LWLockAcquireWithVar() might not be seen by a 
waiter, until you release the lock or update it again.

Another idea is to have LWLockAcquire check the wait queue after setting 
the variable, and wake up anyone who's already queued up. You could just 
call LWLockUpdateVar() from LWLockAcquireCommon to set the variable. I 
would prefer the approach from my previous patch more, though. That 
would avoid having to acquire the spinlock in LWLockAcquire altogether, 
and I actually like the API change from that independently from any 
correctness issues.

The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK 
in principle. You'll want to change LWLockConflictsWithVar() so that the 
spinlock is held only over the "value = *valptr" line, though; the other 
stuff just modifies local variables and don't need to be protected by 
the spinlock. Also, when you enter LWLockWaitForVar(), you're checking 
if the lock is held twice in a row; first at the top of the function, 
and again inside LWLockConflictsWithVar. You could just remove the 
"quick test" at the top.

- Heikki




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Reduce ProcArrayLock contention
Next
From: Robert Haas
Date:
Subject: Re: upgrade failure from 9.5 to head