s_lock() seems too aggressive for machines with many sockets - Mailing list pgsql-hackers

Hi,

I think I may have found one of the problems PostgreSQL has on machines
with many NUMA nodes. I am not yet sure what exactly happens on the NUMA
bus, but there seems to be a tipping point at which spinlock contention
wreaks havoc and the performance of the database collapses.

On a machine with 8 sockets and 64 cores (128 threads total with
Hyper-Threading), a pgbench -S run peaks at around 85,000 TPS with 50-60
clients. The throughput then takes a very sharp dive and drops to around
20,000 TPS at 120 clients. It never recovers from there.

The attached patch demonstrates that spinning less aggressively and
delaying (much) more often improves performance "on this type of
machine". With the patch, the 8-socket machine in question scales to
over 350,000 TPS.
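
To make the idea concrete, here is a minimal sketch of a spin-then-delay
lock acquisition loop. It is not the attached patch and not the actual
s_lock() code; the names demo_lock, spin_policy, spins_per_delay,
min_delay_us and max_delay_us are all hypothetical and exist only for
illustration. A smaller spins_per_delay means the waiter gives up
busy-waiting sooner and sleeps more often, which is the direction the
patch pushes in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical tuning knobs; the values used are illustrative only. */
typedef struct {
    int  spins_per_delay;  /* failed attempts before the waiter sleeps */
    long min_delay_us;     /* first sleep, in microseconds */
    long max_delay_us;     /* cap for the growing sleep */
} spin_policy;

typedef struct {
    atomic_bool held;      /* true while some thread owns the lock */
} demo_lock;

static void demo_lock_init(demo_lock *lock)
{
    atomic_init(&lock->held, false);
}

/* One test-and-set attempt; returns true if the lock was acquired. */
static bool demo_try_lock(demo_lock *lock)
{
    return !atomic_exchange_explicit(&lock->held, true,
                                     memory_order_acquire);
}

static void demo_lock_release(demo_lock *lock)
{
    atomic_store_explicit(&lock->held, false, memory_order_release);
}

static void sleep_us(long us)
{
    struct timespec ts = { us / 1000000L, (us % 1000000L) * 1000L };
    nanosleep(&ts, NULL);
}

/*
 * Acquire the lock, spinning at most spins_per_delay times between
 * sleeps. Each sleep doubles the next delay up to max_delay_us.
 * Returns how many times the caller had to sleep.
 */
static int demo_lock_acquire(demo_lock *lock, const spin_policy *policy)
{
    assert(policy->spins_per_delay > 0);
    assert(policy->min_delay_us > 0);

    int  spins = 0;
    int  delays = 0;
    long delay_us = policy->min_delay_us;

    while (!demo_try_lock(lock)) {
        if (++spins >= policy->spins_per_delay) {
            sleep_us(delay_us);          /* back off instead of spinning */
            delays++;
            spins = 0;
            delay_us *= 2;
            if (delay_us > policy->max_delay_us)
                delay_us = policy->max_delay_us;
        }
    }
    return delays;
}
```

With this sketch, a policy such as {10, 1000, 1000000} would sleep after
only ten failed attempts, while {1000, 1000, 1000000} would keep
hammering the lock word for much longer before backing off. Which of the
two behaves better presumably depends on the machine and client count,
as described above.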

The patch is meant to demonstrate this effect only. It has a negative
performance impact on smaller machines and at client counts below the
number of cores, so the real solution will probably look quite
different. But I thought it would be good to share this and start the
discussion about reevaluating the spinlock code before PGCon.


Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info
