Re: Spinlocks, yet again: analysis and proposed patches - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Spinlocks, yet again: analysis and proposed patches
Date
Msg-id 26803.1126646303@sss.pgh.pa.us
In response to Re: Spinlocks, yet again: analysis and proposed patches  (Marko Kreen <marko@l-t.ee>)
Responses Re: Spinlocks, yet again: analysis and proposed patches
Re: Spinlocks, yet again: analysis and proposed patches
List pgsql-hackers
Marko Kreen <marko@l-t.ee> writes:
> Hmm.  I guess this could be separated into 2 cases:

> 1. Light load - both lock owner and lock requester won't get
>    scheduled while busy (owner in critical section, waiter
>    spinning.)
> 2. Big load - either or both of them gets scheduled while busy.
>    (waiter is scheduled by the OS or voluntarily, e.g. by calling select())

Don't forget that the coding rules for our spinlocks say that you
mustn't hold any such lock for more than a couple dozen instructions,
and certainly any kernel call while holding the lock is Right Out.
There is *no* case where the holder of a spinlock is going to
voluntarily give up the CPU.  The design intention was that the
odds of losing the CPU while holding a spinlock would be negligibly
small, simply because we don't hold it very long.

> About fast yielding, comment on sys_sched_yield() says:
>  * sys_sched_yield - yield the current processor to other threads.
>  *
>  * this function yields the current CPU by moving the calling thread
>  * to the expired array. If there are no other threads running on this
>  * CPU then this function will return.

Mph.  So that's pretty much exactly what I suspected...


I just had a thought: it seems that the reason we are seeing a
significant issue here is that on SMP machines, the cost of trading
exclusively-owned cache lines back and forth between processors is
so high that the TAS instructions (specifically the xchgb, in the x86
cases) represent a significant fraction of backend execution time all
by themselves.  (We know this is the case due to oprofile results,
see discussions from last April.)  What that means is that there's a
fair chance of a process losing its timeslice immediately after the
xchgb.  Which is precisely the scenario we do not want, if the process
successfully acquired the spinlock by means of the xchgb.
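
(For reference, the x86 TAS in s_lock.h boils down to roughly the sketch
below; details simplified, but the essential point is that xchg with a
memory operand is implicitly locked, which is what forces the exclusive
cache-line transfer in the first place.)

typedef unsigned char slock_t;

/* rough sketch of the existing x86 TAS, not verbatim from s_lock.h */
static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;

	/* atomically swap 1 into the lock byte; the old value comes back
	 * in _res, so 0 means we acquired the lock */
	__asm__ __volatile__(
		"	xchgb	%0,%1	\n"
		: "+q"(_res), "+m"(*lock)
		:
		: "memory", "cc");
	return (int) _res;
}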

We could ameliorate this if there were a way to acquire ownership of the
cache line without necessarily winning the spinlock.  I'm imagining
that we insert a "dummy" locked instruction just ahead of the xchgb,
which touches the spinlock in such a way as to not change its state.
(xchgb won't do for this, but maybe one of the other lockable
instructions will.)  We do the xchgb just after this one.  The idea is
that if we don't own the cache line, the first instruction causes it to
be faulted into the processor's cache, and if our timeslice expires
while that is happening, we lose the processor without having acquired
the spinlock.  This assumes that once we've got the cache line, the
xchgb that actually does the work can get executed with not much
extra time spent and only low probability of someone else stealing the
cache line back first.
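
Concretely, I'm imagining something along these lines (an untested
sketch, not a patch; tas_with_prefetch is just a name for illustration):

typedef unsigned char slock_t;	/* as in s_lock.h */

static __inline__ int
tas_with_prefetch(volatile slock_t *lock)
{
	register slock_t _res = 1;

	__asm__ __volatile__(
		/* dummy locked op: adds 0, so the lock value is unchanged,
		 * but the locked read-modify-write forces the cache line
		 * into this CPU's cache in exclusive state */
		"	lock; addb	$0,%1	\n"
		/* the real test-and-set; old value comes back in _res */
		"	xchgb	%0,%1	\n"
		: "+q"(_res), "+m"(*lock)
		:
		: "memory", "cc");
	return (int) _res;		/* 0 => acquired */
}

Any locked read-modify-write that leaves the byte unchanged should do;
addb $0 is just the first that comes to mind.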

The fact that cmpb isn't helping proves that getting the cache line in a
read-only fashion does *not* do enough to protect the xchgb in this way.
But maybe another locking instruction would.  Comments?
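(For concreteness: the cmpb test amounts to the classic
test-and-test-and-set, i.e. an unlocked read of the lock byte before
attempting the exchange, roughly as below using the tas() sketched
above.  The plain load only brings the line in in shared state, which is
evidently not enough.)

static __inline__ int
tas_with_pretest(volatile slock_t *lock)
{
	/* unlocked read-only check first: if the lock looks taken,
	 * don't even attempt the locked exchange */
	if (*lock != 0)
		return 1;
	/* lock looks free; fall through to the real locked exchange */
	return tas(lock);
}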
        regards, tom lane

