Re: s_lock() seems too aggressive for machines with many sockets - Mailing list pgsql-hackers

From Andres Freund
Subject Re: s_lock() seems too aggressive for machines with many sockets
Date
Msg-id 20150610153444.GF10551@awork2.anarazel.de
In response to Re: s_lock() seems too aggressive for machines with many sockets  (Jan Wieck <jan@wi3ck.info>)
Responses Re: s_lock() seems too aggressive for machines with many sockets  (Jan Wieck <jan@wi3ck.info>)
List pgsql-hackers
On 2015-06-10 11:12:46 -0400, Jan Wieck wrote:
> The test case is that 200 threads are running in a tight loop like this:
> 
> for (...)
> {
>     s_lock();
>     // do something with a global variable
>     s_unlock();
> }
> 
> That is the most contended case I can think of, yet the short and
> predictable code while holding the lock is the intended use case for a
> spinlock.

But lots of our code isn't that predictable. Check e.g. the buffer
spinlock case:

static void
UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
{
    PrivateRefCountEntry *ref;

    /* not moving as we're likely deleting it soon anyway */
    ref = GetPrivateRefCountEntry(buf->buf_id + 1, false);
    Assert(ref != NULL);

    if (fixOwner)
        ResourceOwnerForgetBuffer(CurrentResourceOwner,
                                  BufferDescriptorGetBuffer(buf));

    Assert(ref->refcount > 0);
    ref->refcount--;
    if (ref->refcount == 0)
    {
        /* I'd better not still hold any locks on the buffer */
        Assert(!LWLockHeldByMe(buf->content_lock));
        Assert(!LWLockHeldByMe(buf->io_in_progress_lock));

        LockBufHdr(buf);

        /* Decrement the shared reference count */
        Assert(buf->refcount > 0);
        buf->refcount--;

        /* Support LockBufferForCleanup() */
        if ((buf->flags & BM_PIN_COUNT_WAITER) &&
            buf->refcount == 1)
        {
            /* we just released the last pin other than the waiter's */
            int         wait_backend_pid = buf->wait_backend_pid;

            buf->flags &= ~BM_PIN_COUNT_WAITER;
            UnlockBufHdr(buf);
            ProcSendSignal(wait_backend_pid);
        }
        else
            UnlockBufHdr(buf);

        ForgetPrivateRefCountEntry(ref);
    }
}


If you check the section where the spinlock is held, there's nontrivial
code executed. Under contention you'll have the problem that if a backend
tries to acquire the spinlock while another backend holds the lock,
it'll "steal" the cacheline on which the lock and the rest of the buffer
descriptor reside. Which means that operations like buf->refcount--, the
if (), and the &= will now have to wait until the cacheline is
transferred back (as the cacheline will directly be marked as modified
by the attempted locker). All the while other backends will also try to
acquire the pin, causing the same kind of trouble.

If this'd instead be written as

ret = pg_atomic_fetch_sub_u32(&buf->state, 1);

if (ret & BM_PIN_COUNT_WAITER)
{
    pg_atomic_fetch_sub_u32(&buf->state, BM_PIN_COUNT_WAITER);
    /* XXX: deal with race that another backend has set
     * BM_PIN_COUNT_WAITER */
}

the whole thing looks entirely different. While there's obviously still
cacheline bouncing, every operation is guaranteed to make progress (on
x86 at least, but even elsewhere we'll never block another locker).

Now unlock is the simpler case, but...


- Andres



