Robert Haas <robertmhaas@gmail.com> wrote:
> Stepping beyond the immediate issue of whether we want an unlocked
> test in there or not (and I agree that based on these numbers we
> don't), there's a clear and puzzling difference between those sets
> of numbers. The Opteron test is showing 32 clients getting about
> 23.9 times the throughput of a single client, which is not exactly
> linear but is at least respectable, whereas the PPC64 test is
> showing 32 clients getting just 14.5 times the throughput of a
> single client, which is pretty significantly less good. Moreover,
> cranking it up to 64 clients is squeezing a significant amount of
> additional work out on Opteron, but not on PPC64. The
> HP-UX/Itanium numbers in my OP give a ratio of 17.3x - a little
> better than your PPC64 numbers, but certainly not great.

I wouldn't make too much of that without comparing to a STREAM test
(properly configured -- the default array size is likely not to be
large enough for these machines). On a recently delivered 32-core
machine with 256 GB RAM, I saw numbers like this for just RAM
access:

Threads        Copy       Scale         Add       Triad
      1   3332.3721   3374.8146   4500.1954   4309.7392
      2   5822.8107   6158.4621   8563.3236   7756.9050
      4  12474.9646  12282.3401  16960.7216  15399.2406
      8  22353.6013  23502.4389  31911.5206  29574.8124
     16  35760.8782  40946.6710  49108.4386  49264.6576
     32  47567.3882  44935.4608  52983.9355  52278.1373
     64  48428.9939  47851.7320  54141.8830  54560.0520
    128  49354.4303  49586.6092  55663.2606  57489.5798
    256  45081.3601  44303.1032  49561.3815  50073.3530
    512  42271.9688  41689.8609  47362.4190  46388.9720

Graph attached for those who are visually inclined and have support
for the display of JPEG files.
Note that this is a machine which is *not* configured to be
blazingly fast for a single connection, but to scale up well for a
large number of concurrent processes:
http://www.redbooks.ibm.com/redpapers/pdfs/redp4650.pdf
Unless your benchmarks are falling off a lot faster than the STREAM
test on that hardware, I wouldn't worry.
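
To put the STREAM figures on the same footing as the "N times the
throughput of a single client" ratios quoted above, here is a rough
sketch that divides each rate in the table by its single-thread
value. The names STREAM, KERNELS and speedup are just illustrative;
the numbers are copied from the table in this message and nothing
else is measured.

```python
# STREAM rates copied from the table in this message, keyed by thread count.
# Column order matches the table: Copy, Scale, Add, Triad.
STREAM = {
    1: (3332.3721, 3374.8146, 4500.1954, 4309.7392),
    2: (5822.8107, 6158.4621, 8563.3236, 7756.9050),
    4: (12474.9646, 12282.3401, 16960.7216, 15399.2406),
    8: (22353.6013, 23502.4389, 31911.5206, 29574.8124),
    16: (35760.8782, 40946.6710, 49108.4386, 49264.6576),
    32: (47567.3882, 44935.4608, 52983.9355, 52278.1373),
    64: (48428.9939, 47851.7320, 54141.8830, 54560.0520),
    128: (49354.4303, 49586.6092, 55663.2606, 57489.5798),
    256: (45081.3601, 44303.1032, 49561.3815, 50073.3530),
    512: (42271.9688, 41689.8609, 47362.4190, 46388.9720),
}
KERNELS = ("Copy", "Scale", "Add", "Triad")


def speedup(kernel, threads):
    """Return the rate at `threads` divided by the single-thread rate."""
    col = KERNELS.index(kernel)
    return STREAM[threads][col] / STREAM[1][col]


if __name__ == "__main__":
    # Print one row of scaling factors per thread count.
    print("Threads  " + "  ".join(f"{k:>6}" for k in KERNELS))
    for threads in sorted(STREAM):
        row = "  ".join(f"{speedup(k, threads):6.2f}" for k in KERNELS)
        print(f"{threads:7d}  {row}")
```

By that arithmetic, 32 threads on this machine get roughly 12x to
14x the single-thread memory bandwidth, depending on the kernel,
which is in the same neighborhood as the 14.5x PPC64 figure above.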
-Kevin