On 13.05.2013 17:21, Merlin Moncure wrote:
> On Mon, May 13, 2013 at 7:50 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> The attached patch is still work-in-progress. There needs to be a configure
>> test and fallback to spinlock if a CAS instruction is not available. I used
>> the gcc __sync_val_compare_and_swap() builtin directly, that needs to be
>> abstracted. Also, in the case that the wait queue needs to be manipulated,
>> the code spins on the CAS instruction, but there is no delay mechanism like
>> there is on a regular spinlock; that needs to be added in somehow.
>
> These are really interesting results. Why is the CAS method so much
> faster then TAS?
If my theory is right, the steep fall is caused by backends being
sometimes context switched while holding the spinlock. The CAS method
alleviates that, because the lock is never "held" by anyone. With a
spinlock, if a process gets context switched while holding the lock,
everyone else will have to wait until the process gets context switched
back, and releases the spinlock. With the CAS method, if a process gets
switched in the CAS-loop, it doesn't hurt others.
With the patch, the CAS-loop still spins, just like a spinlock, when the
wait queue has to be manipulated. So when that happens, it will behave
more or less like the spinlock implementation. I'm not too concerned
about that optimizing that further; sleeping and signaling sleeping
backends are heavy-weight operations anyway.
> Did you see any contention?
With the current spinlock implementation? Yeah, per the profile I
posted, a lot of CPU time was spent spinning, which is a sign of
contention. I didn't see contention with the CAS implemention.
Attached is a new version of the patch. I cleaned it up a lot, and added
a configure test, and fallback to the good old spinlock implementation
if compare-and-swap is not available. Also, if the CAS loops needs to
spin, like a spinlock, it now sleeps after a while and updates the
"spins_per_delay" like regular spinlock.
- Heikki