On 2015-06-10 09:18:56 -0400, Jan Wieck wrote:
> On a machine with 8 sockets, 64 cores, Hyperthreaded 128 threads total, a
> pgbench -S peaks with 50-60 clients around 85,000 TPS. The throughput then
> takes a very sharp dive and reaches around 20,000 TPS at 120 clients. It
> never recovers from there.
85k? Phew, that's pretty bad. What exact type of CPU is this? Which
pgbench scale? Did you use -M prepared?

Could you share a call graph perf profile?
> The attached patch demonstrates that less aggressive spinning and
> (much) more often delaying improves the performance "on this type of
> machine". The 8 socket machine in question scales to over 350,000 TPS.
Even that seems quite low. I've gotten over 500k TPS on a four-socket
x86 machine, and about 700k on an 8-socket x86 machine.

Maybe we need to adjust the amount of spinning, but to me such drastic
differences are a hint that we should tackle the actual contention
point. A spinlock protecting something that is regularly heavily
contended can often be worse than a queued lock.
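
To make the distinction concrete, here is a rough sketch in C11 atomics of
the two approaches being contrasted: a test-and-set spinlock that spins a
bounded number of times and then sleeps with growing delays, and a ticket
lock that hands the lock to waiters in arrival order. All names and tuning
constants are hypothetical and chosen for illustration only. This is not
the code in s_lock.c and not the attached patch, and a ticket lock is just
one simple stand-in for "a queued lock", not necessarily the design that
would fit here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#endif
#include <stdatomic.h>
#include <time.h>

/* Assumed tuning knobs, not PostgreSQL's values. */
#define SPINS_BEFORE_DELAY 100
#define MIN_DELAY_NS       1000L
#define MAX_DELAY_NS       1000000L

/* Variant 1: test-and-set spinlock with bounded spinning and backoff. */
typedef struct { atomic_flag held; } tas_lock;

static void tas_lock_init(tas_lock *l)
{
    atomic_flag_clear(&l->held);
}

static void tas_lock_acquire(tas_lock *l)
{
    int  spins = 0;
    long delay_ns = MIN_DELAY_NS;

    /* Every failed attempt is an atomic write to the shared cache line. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
    {
        if (++spins >= SPINS_BEFORE_DELAY)
        {
            /* Stop hammering the line: sleep, then double the delay. */
            struct timespec ts = {0, delay_ns};
            nanosleep(&ts, NULL);
            if (delay_ns < MAX_DELAY_NS)
                delay_ns *= 2;
            spins = 0;
        }
    }
}

static void tas_lock_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Variant 2: ticket lock, waiters are served in FIFO order. */
typedef struct
{
    atomic_uint next_ticket;   /* next ticket to hand out */
    atomic_uint now_serving;   /* ticket currently allowed in */
} ticket_lock;

static void ticket_lock_init(ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Returns the ticket number taken, purely for illustration. */
static unsigned ticket_lock_acquire(ticket_lock *l)
{
    /* One atomic write per waiter to take a place in line ... */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);

    /* ... then read-only waiting until our number comes up. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ;
    return my;
}

static void ticket_lock_release(ticket_lock *l)
{
    atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
}
```

In the first variant, tuning the spin count and delay only changes how hard
the waiters fight over one cache line. In the second, each waiter writes
once and then only reads, and nobody can be starved. Neither variant
removes the contention itself, which is the point above about finding the
actual contention point first.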
Greetings,
Andres Freund