On 2015-06-10 13:19:14 -0400, Robert Haas wrote:
> On Wed, Jun 10, 2015 at 11:58 AM, Andres Freund <andres@anarazel.de> wrote:
> > I think we should just gank spinlocks asap. The hard part is removing
> > them from lwlock.c's slow path and the buffer headers imo. After that we
> > should imo be fine replacing them with lwlocks.
>
> Mmmph. I'm not convinced there's any point in replacing
> lightly-contended spinlocks with slower, more-complex lwlocks. But
> maybe it's best to stay away from that argument and focus on getting
> rid of the spinlocks that are hosing us right now.
In the uncontended case lwlocks are just as fast as spinlocks now, with
the exception of the local tracking array. They're faster if there's
differences with read/write lockers.
> I'm not altogether convinced that a simple replacement of spinlocks
> with atomics is going to solve this problem. If it does, great. But
> that doesn't eliminate spinning; it just moves it from the spinlock
> loop to a CAS loop.
Well, not necessarily. If you can write your algorithm in a way that
xadd etc are used, instead of a lock cmpxchg, you're actually never
spinning on x86 as it's guaranteed to succeed. I think that's possible
for most of the places that currently lock buffer headers.
(I had a version of the lwlock changes that used xadd for shared lock
acquisition - but the code needed to back out in error cases made things
more complicated, and the benefit on a four socket machine wasn't that
large)
> This is clearly better: when the CAS works,
> you're done, as opposed to having to then manipulate the cache line
> further and release the spinlock, during any of which the cache line
> may get stolen from you.
It's not just that, it's also that there's no chance of sleeping while
holding a lock. Which doesn't happen that infrequently in contended, cpu
bound workloads.
> However, I'm worried that it will still be possible to create the same
> kinds of performance collapses that we see with spinlocks with the CAS
> loops with which we place them - or even worse problems, because
> clearly the spin_delay stuff *works*, and we're unlikely to
> incorporate similarly sophisticated logic into every CAS loop.
I doubt it's really neccessary, but It shouldn't be too hard to
generalize the sleeping logic so it could be used with little code in
the individual callsites.
Greetings,
Andres Freund