
From Andres Freund
Subject Re: s_lock() seems too aggressive for machines with many sockets
Date 2015-06-10
Msg-id 20150610140521.GC5067@alap3.anarazel.de
In response to Re: s_lock() seems too aggressive for machines with many sockets  (Jan Wieck <jan@wi3ck.info>)
Responses Re: s_lock() seems too aggressive for machines with many sockets
List pgsql-hackers
Hi,

On 2015-06-10 09:54:00 -0400, Jan Wieck wrote:
> model name      : Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz

> numactl --hardware shows the distance to the attached memory as 10, the
> distance to every other node as 21. I interpret that as the machine having
> one NUMA bus with all cpu packages attached to that, rather than individual
> connections from cpu to cpu or something different.

Generally that doesn't say very much; IIRC the distances are defined by
the BIOS.
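
(The kernel just reports the firmware's ACPI SLIT matrix; if you want to
look at the raw numbers, something like the sketch below works on Linux.
The 64-node upper bound and the output format are illustrative, not
anything from your box.)

    #include <stdio.h>

    int
    main(void)
    {
        /* dump each node's SLIT row, e.g. "10 21 21 21 ..." */
        for (int node = 0; node < 64; node++)
        {
            char  path[64];
            char  row[256];
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/distance", node);
            f = fopen(path, "r");
            if (f == NULL)
                break;          /* ran out of (contiguous) nodes */
            if (fgets(row, sizeof(row), f) != NULL)
                printf("node%d: %s", node, row);
            fclose(f);
        }
        return 0;
    }

A row like "10 21 21 ..." is simply what the firmware chose to report;
it doesn't necessarily reflect the actual interconnect topology.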

> What led me into that spinlock area was the fact that a wall clock based
> systemtap FlameGraph showed a large portion of the time spent in
> BufferPin() and BufferUnpin().

I've seen that as a bottleneck in the past as well. My plan to fix it
is to "simply" make buffer pinning lockless for the majority of cases.
I don't have access to hardware to test that at higher node counts at
the moment, though.

My guess is that the pins are on the btree root pages. But it'd be good
to confirm that.
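
To give an idea of what I mean by lockless pinning - this is just a
sketch, not Postgres code; the struct, field, and flag names are
invented - the pin count lives in a single atomic word and is bumped
with compare-and-swap instead of taking the buffer header spinlock:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* invented stand-in for a buffer header; not the real BufferDesc */
    typedef struct BufferSketch
    {
        atomic_uint_fast32_t state;  /* pin count, plus a lock flag bit */
    } BufferSketch;

    #define SKETCH_HDR_LOCKED ((uint_fast32_t) 1 << 31)

    /*
     * Pin the buffer without taking its header spinlock: CAS the
     * incremented pin count in, retrying on concurrent changes.  Only
     * if somebody holds the (rarely needed) header lock do we tell
     * the caller to fall back to the locked slow path.
     */
    static bool
    pin_buffer_lockless(BufferSketch *buf)
    {
        uint_fast32_t old = atomic_load(&buf->state);

        while ((old & SKETCH_HDR_LOCKED) == 0)
        {
            /* on failure the CAS reloads 'old', so just retry */
            if (atomic_compare_exchange_weak(&buf->state, &old, old + 1))
                return true;
        }
        return false;            /* header locked: use the slow path */
    }

Under contention every backend can make progress without any of them
holding a lock; the CAS only retries, it never blocks.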

> >Maybe we need to adjust the amount of spinning, but to me such drastic
> >differences are a hint that we should tackle the actual contention
> >point. Often a spinlock for something regularly heavily contended can be
> >worse than a queued lock.
> 
> I have the impression that the code assumes that there is little penalty for
> accessing the shared byte in a tight loop from any number of cores in
> parallel. That apparently is true for some architectures and core counts,
> but no longer holds for these machines with many sockets.

It's just generally a tradeoff. It's beneficial to spin longer if
there's only a mild amount of contention. If the likelihood of getting
the spinlock soon is high (i.e. existing, but low, contention), it'll
nearly always be beneficial to spin. If the likelihood is low, it'll
mostly be beneficial to sleep.  The latter is especially true if the
machine is sufficiently overcommitted that a process is likely to be
scheduled out while holding a spinlock. That danger of sleeping while
holding a spinlock, without a targeted wakeup afterwards, is why
spinlocks in userspace aren't a really good idea.
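
In code the tradeoff looks roughly like the following - a sketch of the
general spin-then-sleep shape, not the actual s_lock() implementation;
the spin bound and sleep duration are made up:

    #include <stdatomic.h>
    #include <unistd.h>

    #define SPINS_BEFORE_DELAY 1000     /* made-up bound */

    static atomic_flag lck = ATOMIC_FLAG_INIT;

    static void
    spin_then_sleep_acquire(void)
    {
        int spins = 0;

        /* test-and-set loop: spin while contention looks short-lived */
        while (atomic_flag_test_and_set_explicit(&lck,
                                                 memory_order_acquire))
        {
            if (++spins >= SPINS_BEFORE_DELAY)
            {
                /*
                 * Give up the CPU.  There's no targeted wakeup here -
                 * whoever wakes up first just retries - which is
                 * exactly the userspace-spinlock weakness described
                 * above.
                 */
                usleep(1000);
                spins = 0;
            }
        }
    }

    static void
    spin_then_sleep_release(void)
    {
        atomic_flag_clear_explicit(&lck, memory_order_release);
    }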

My bet is that if you measured using different numbers of iterations
for different spinlocks, you'd find some where a higher number of
iterations is rather beneficial as well.
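
Measuring that is mostly a matter of having the acquire loop report how
often it iterated before getting the lock; a hypothetical histogram
helper, with invented bucketing:

    #include <stdio.h>

    #define NBUCKETS 16

    /* one histogram per spinlock of interest */
    static unsigned long spin_histogram[NBUCKETS];

    /* record one acquisition that needed 'spins' loop iterations */
    static void
    record_spins(unsigned spins)
    {
        int bucket = 0;

        while (spins > 1 && bucket < NBUCKETS - 1)
        {
            spins >>= 1;        /* log2 buckets: 0-1, 2-3, 4-7, ... */
            bucket++;
        }
        spin_histogram[bucket]++;
    }

    static void
    dump_histogram(void)
    {
        for (int b = 0; b < NBUCKETS; b++)
            if (spin_histogram[b] != 0)
                printf("~2^%d spins: %lu acquisitions\n",
                       b, spin_histogram[b]);
    }

Spinlocks whose acquisitions cluster just above the current spin bound
would be the candidates for a higher iteration count.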

Greetings,

Andres Freund