Re: spinlock contention - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: spinlock contention
Date
Msg-id 98ACBA06-8F4D-43BE-A653-9907CD836EC4@phlo.org
In response to Re: spinlock contention  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Jul 7, 2011, at 03:35, Robert Haas wrote:
> Some poking around suggests that the problem isn't that
> spinlocks are routinely contended - it seems that we nearly always get
> the spinlock right off the bat.  I'm wondering if the problem may be
> not so much that we have continuous spinlock contention, but rather
that every once in a while a process gets time-sliced out while it
> holds a spinlock.  If it's an important spinlock (like the one
> protecting SInvalReadLock), the system will quickly evolve into a
> state where every single processor is doing nothing but trying to get
> that spinlock.  Even after the hapless lock-holder gets to run again
> and lets go of the lock, you have a whole pile of other backends who
are sitting there firing off lock xchgb in a tight loop, and they can
> only get it one at a time, so you have ferocious cache line contention
> until the backlog clears.

Pondering this some more, I came up with another idea, pretty much
orthogonal to the shared counter partitioning I posted a patch for.

If indeed the problem isn't spin lock contention, but rather losing
control of the CPU while holding a spin lock, then why not try to get
rid of the spin lock entirely? On Linux, that's easy - this is exactly
what futexes are about. But it occurred to me that kernel support isn't
actually needed to do that - futexes can effectively be emulated in
user-space using just a semaphore, and without doing a syscall except
when there's contention.
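
To make that concrete, here's roughly the state such a scheme needs.
This is just a sketch: I'm using C11 atomics for brevity (the real
thing would use our own primitives), the one-argument semaphore calls
are simplified stand-ins for the PGSemaphore API, and queue_push() /
queue_pop() stand in for the lock-free queue discussed further down.

    /*
     * Illustrative only, not actual PostgreSQL code. The lock word packs
     * a shared-reader count and an exclusive bit into one 32-bit value,
     * so a single CAS can update it. Every backend owns one semaphore.
     */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define LOCK_EXCLUSIVE ((uint32_t) 1 << 31)   /* exclusive-holder bit */

    typedef struct LockState
    {
        _Atomic uint32_t value;    /* 0 = free; low bits = reader count */
    } LockState;

    typedef struct PGSemaphoreData PGSemaphoreData;
    extern void PGSemaphoreLock(PGSemaphoreData *sem);    /* "decrement" */
    extern void PGSemaphoreUnlock(PGSemaphoreData *sem);  /* "increment" */

    typedef struct WaitQueueEntry
    {
        struct WaitQueueEntry *next;
        PGSemaphoreData *sem;      /* this backend's own semaphore */
        bool        wantExclusive;
    } WaitQueueEntry;

    /* assumed lock-free queue primitives; queue_pop() returns NULL if empty */
    extern void queue_push(WaitQueueEntry *e);
    extern WaitQueueEntry *queue_pop(void);

    static inline bool
    lock_looks_free(uint32_t v, bool exclusive)
    {
        /* exclusive needs an idle lock; shared just needs no exclusive bit */
        return exclusive ? (v == 0) : ((v & LOCK_EXCLUSIVE) == 0);
    }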

The algorithm is quite straightforward, if one assumes a lock-free
implementation of a queue (more on that below). A rough C sketch of
each function follows its listing:
 LockAcquire:
   (1) CAS the lock state to increment the reader count or set
       the exclusive bit in a loop while the lock looks free.
       If successful, we're done; otherwise we continue with (2).
   (2) Add ourself to the wait queue.
       [ Since we've added ourself to the queue, we *must* now
         decrement the semaphore no matter what, to keep the
         increment/decrement calls balanced. We're careful to
         maintain that invariant below. ]
   (3) Fence (i.e. issue a full memory barrier).
   (4) Re-check if the lock still looks taken. If it does,
       we decrement the semaphore (PGSemaphoreLock), and
       (upon wake-up) restart at (1).
       Otherwise, continue with (5).
   (5) Remove the first waiter from the queue and increment
       her semaphore. Rinse-and-repeat until we have either
       removed ourself from the queue or the queue is empty.
   (6) Decrement our semaphore.
       [ (6) is necessary in the general case, see the remark
         below (2). But we can of course detect the case where
         we'd increment our own semaphore in (5) only to
         decrement it again in (6), and skip both operations. ]
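
Spelled out in C against the definitions above (a sketch of steps
(1) through (6), not a drop-in implementation):

    static void
    LockAcquire(LockState *lock, WaitQueueEntry *me, bool exclusive)
    {
        me->wantExclusive = exclusive;  /* recorded for a smarter release
                                         * policy; unused in this sketch */
        for (;;)
        {
            /* (1) CAS-loop while the lock looks free */
            uint32_t    cur = atomic_load(&lock->value);

            while (lock_looks_free(cur, exclusive))
            {
                uint32_t    next = exclusive ? (cur | LOCK_EXCLUSIVE)
                                             : (cur + 1);

                if (atomic_compare_exchange_weak(&lock->value, &cur, next))
                    return;     /* got the lock */
                /* CAS failure reloaded cur; re-check "looks free" */
            }

            /* (2) add ourself to the wait queue */
            queue_push(me);

            /* (3) full fence: our queue entry must be visible before the
             * re-check below, pairing with the fence in LockRelease */
            atomic_thread_fence(memory_order_seq_cst);

            /* (4) if the lock still looks taken, block, then retry */
            if (!lock_looks_free(atomic_load(&lock->value), exclusive))
            {
                PGSemaphoreLock(me->sem);   /* decrement; sleeps at zero */
                continue;                   /* restart at (1) */
            }

            /* (5) the lock was freed between (1) and (4): wake queued
             * waiters until we have removed ourself or the queue is empty */
            bool        removedSelf = false;
            WaitQueueEntry *w;

            while ((w = queue_pop()) != NULL)
            {
                if (w == me)
                {
                    removedSelf = true;     /* skip the self-increment ... */
                    break;
                }
                PGSemaphoreUnlock(w->sem);
            }

            /* (6) ... and the matching self-decrement; otherwise decrement
             * to balance the increment some other backend did for us */
            if (!removedSelf)
                PGSemaphoreLock(me->sem);
            /* loop back to (1) and retry */
        }
    }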
 
 LockRelease:
   (1) Set the lock state to 0, i.e. release the lock.
   (2) Fence (i.e. issue a full memory barrier).
   (3) If the lock still looks free, remove the first waiter from
       the queue and increment her semaphore. Rinse-and-repeat
       while the lock looks free and the queue is non-empty.
       [ From a correctness POV, we only have to wake up one
         waiter here, and only if there isn't one who has been
         woken up but hasn't yet retried to take the lock. In
         reality, the decision whether and how many waiters to
         wake up would depend on their desired lock level, and
         some variant of what we currently call releaseOK. ]
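
And the release path, with the same caveats; for simplicity this
version wakes every waiter it can instead of modelling lock levels
or releaseOK:

    static void
    LockRelease(LockState *lock)
    {
        /* (1) release: back to "free" (a shared release would instead
         * decrement the reader count; the listing above sketches the
         * exclusive case) */
        atomic_store(&lock->value, 0);

        /* (2) full fence: the store above must be visible before we look
         * at the queue, pairing with the fence in LockAcquire (3) */
        atomic_thread_fence(memory_order_seq_cst);

        /* (3) wake waiters while the lock looks free and the queue is
         * non-empty; each woken backend retries the CAS loop */
        WaitQueueEntry *w;

        while (atomic_load(&lock->value) == 0 && (w = queue_pop()) != NULL)
            PGSemaphoreUnlock(w->sem);
    }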
 

The fencing steps (LockAcquire:3) and (LockRelease:2) guarantee
that if we block in LockAcquire() a lock holder will see our
queue entry and thus will wake us up eventually.

Because we use a semaphore and not, say, a simple signal, we don't
have to worry about the precise ordering of block and unblock
operations - we just need to ensure they're balanced.

Now to the enqueue() and dequeue() primitives that the algorithm
above depends on. There are multiple published algorithms
for lock-free queues. Some googling turned up
 "An optimistic approach to lock-free FIFO queues",  E.L. Mozes, N. Shavit, DOI 10.1007/s00446-007-0050-0

and  "CAS-Based Lock-Free Algorithm for Shared Deques", M.M. Michael

The second one looks promising, since it only requires a single
CAS to enqueue and dequeue entries in the common case. Thus, it
should be just as efficient as our current spin-lock-based queue
in the uncontended case (and much better in the contended case,
of course).

[ Our case is, in fact, simpler than the generic setting that these
algorithms support. We only ever enqueue our *own* proc array entry
(so allocation and re-use of queue entries isn't much of an issue),
and always *process* the queue after enqueuing something - either
directly in LockAcquire or later in LockRelease. We thus don't really
need to support concurrent removal of queue entries, but might get
away with simply skipping queue processing if we detect that somebody
else is in the process of doing so. I think I have an idea for a
simpler lock-less queue that fits our needs, but I haven't yet
ironed out all the details (As it turns out, removing the very last
entry from the queue is the hard case). ]
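
To give a flavor of what I mean (this is just an illustration I'm
making up here, not the algorithm from either paper): an intrusive
multi-producer queue whose enqueue is a single atomic exchange. The
dequeue side, which I'm omitting, is exactly where that hard case
lives.

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct QNode
    {
        struct QNode *_Atomic next;
    } QNode;

    typedef struct WaitQueue
    {
        QNode  *_Atomic tail;      /* most recently enqueued node  */
        QNode  *head;              /* dequeue side (not shown)     */
        QNode   stub;              /* dummy node, never handed out */
    } WaitQueue;

    static void
    wait_queue_init(WaitQueue *q)
    {
        atomic_store(&q->stub.next, (QNode *) NULL);
        atomic_store(&q->tail, &q->stub);
        q->head = &q->stub;
    }

    static void
    wait_queue_push(WaitQueue *q, QNode *n)
    {
        atomic_store(&n->next, (QNode *) NULL);

        /* one atomic op: publish n as the new tail ... */
        QNode *prev = atomic_exchange(&q->tail, n);

        /* ... then link the predecessor to it. A dequeuer that catches
         * the queue between these two steps sees head->next == NULL
         * although tail != head - the very "last entry" window that
         * makes the dequeue side hard. */
        atomic_store(&prev->next, n);
    }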

Thoughts, comments and criticism are all very welcome!

best regards,
Florian Pflug
