Could you get a profile with call graphs? We need to know what leads to all
those osq_lock calls.
perf record --call-graph dwarf -a sleep 1
or such should do the trick, if run while the workload is running.
I'm doing something wrong because I can't find the slow part in the perf data; I'll get back to you on this one.
I think it's unwise to compare builds of such different vintage. The compiler
options and compiler version can have substantial effects.
I recommend also using -P1. Particularly when using unix sockets, the
specifics of how client threads and server threads are scheduled plays a huge
role.
Fair suggestions, those graphs come out of pgbench-tools where I profile all the latency, fast results for me are ruler flat. It's taken me several generations of water cooling experiments to reach that point, but even that only buys me 10 seconds before I can overload a CPU to higher latency with tougher workloads. Here's a few seconds of slightly updated examples, now with matching PGDG sourced 14+15 on the 5950X and with sched_autogroup_enabled=0 too:
$ pgbench -S -T 10 -c 32 -j 32 -M prepared -p 5434 -P 1 pgbench
pgbench (14.8 (Ubuntu 14.8-1.pgdg23.04+1))
progress: 1.0 s, 1032929.3 tps, lat 0.031 ms stddev 0.004
progress: 2.0 s, 1051239.0 tps, lat 0.030 ms stddev 0.001
progress: 3.0 s, 1047528.9 tps, lat 0.030 ms stddev 0.008...
$ pgbench -S -T 10 -c 32 -j 32 -M prepared -p 5432 -P 1 pgbench
pgbench (15.3 (Ubuntu 15.3-1.pgdg23.04+1))
progress: 1.0 s, 171816.4 tps, lat 0.184 ms stddev 0.029, 0 failed
progress: 2.0 s, 173501.0 tps, lat 0.184 ms stddev 0.024, 0 failed...
On the slow runs it will even do this, watch my 5950X accomplish 0 TPS for a second!
progress: 38.0 s, 177376.9 tps, lat 0.180 ms stddev 0.039, 0 failed
progress: 39.0 s, 35861.5 tps, lat 0.181 ms stddev 0.032, 0 failed
progress: 40.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 41.0 s, 222.1 tps, lat 304.500 ms stddev 741.413, 0 failed
progress: 42.0 s, 101199.6 tps, lat 0.530 ms stddev 18.862, 0 failed
progress: 43.0 s, 98286.9 tps, lat 0.328 ms stddev 8.156, 0 failed
Gonna have to measure seconds/transaction if this gets any worse.