Re: ok you all win what is best opteron (I dont want a hosed system - Mailing list pgsql-performance

From William Yu
Subject Re: ok you all win what is best opteron (I dont want a hosed system
Date
Msg-id d66601$30l6$1@news.hub.org
In response to Re: ok you all win what is best opteron (I dont want a hosed system again)  ("Joel Fradkin" <jfradkin@wazagua.com>)
Responses Re: ok you all win what is best opteron (I dont want a hosed system
List pgsql-performance
A 4-way SMP Opteron system is actually pretty damn cheap if you go with
2x dual-core instead of 4x single-core. I just ordered a 2x265 (4x1.8GHz)
system, and the price was about $1300 more than a 2x244 (2x1.8GHz).

Now you might ask: is a 2xDC comparable to a 4x1? Here are some
benchmarks I've found that compare DC versus single-core at the same
clock rates and the same number of cores.

SpecIntRate Windows:
4x846 = 56.7
2x270 = 62.6

SpecFPRate Windows:
4x846 = 52.5
2x270 = 55.3

SpecWeb99SSL:
4x846 = 3399
2x270 = 4100 (2 870s were used)

Specjbb2000 IBM JVM:
4x848 = 146385
4x275 = 157432
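To put those gaps in percentage terms, here is a small Python sketch that
just divides the quoted scores. The dictionary and function names are my
own; the only data used are the numbers listed above.

```python
# Benchmark scores quoted above: (single-core SMP score, dual-core score).
scores = {
    "SpecIntRate (Windows)": (56.7, 62.6),
    "SpecFPRate (Windows)": (52.5, 55.3),
    "SpecWeb99SSL": (3399, 4100),
    "SPECjbb2000 (IBM JVM)": (146385, 157432),
}

def dc_advantage(single: float, dual: float) -> float:
    """Percentage by which the dual-core result exceeds the single-core one."""
    return (dual / single - 1.0) * 100.0

for name, (single, dual) in scores.items():
    print(f"{name:24s} {dc_advantage(single, dual):5.1f}%")
```

The spread runs from roughly 5% (SpecFPRate) to roughly 20% (SpecWeb99SSL).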

What it looks like is that a DC system is about one clock step faster
than a corresponding single-core SMP system. E.g. if you have a 2xDC @
1.8GHz, you need a 4x1 @ 2.0GHz to match its speed. (In some benchmarks,
the difference is two clock steps.)
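One way to sanity-check the "clock step" rule of thumb is to convert a
percentage advantage into speed grades. This sketch assumes Opteron speed
grades are 200MHz apart and that throughput scales linearly with clock,
which is an optimistic simplification; STEP_GHZ and both helper functions
are hypothetical names for illustration only.

```python
# Assumption: Opteron speed grades step in 200 MHz increments (1.8, 2.0, 2.2 GHz ...).
STEP_GHZ = 0.2

def clock_step_gain(base_ghz: float, steps: int = 1) -> float:
    """Percent gain from moving `steps` speed grades up,
    assuming performance scales linearly with clock (optimistic)."""
    return steps * STEP_GHZ / base_ghz * 100.0

def equivalent_steps(base_ghz: float, advantage_pct: float) -> float:
    """Convert a percentage advantage into the number of speed grades it is worth."""
    return advantage_pct / 100.0 * base_ghz / STEP_GHZ

# The 846 and 270 are both 2.0 GHz parts; advantages come from the scores above.
for name, pct in [("SpecIntRate", 62.6 / 56.7 * 100 - 100),
                  ("SpecWeb99SSL", 4100 / 3399 * 100 - 100)]:
    print(f"{name}: {pct:.1f}% ~ {equivalent_steps(2.0, pct):.2f} speed grades")
```

Under these assumptions SpecIntRate works out to about one step and
SpecWeb99SSL to about two, which lines up with the estimate above.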

On the surface, it looks pretty amazing that a 4x1 Opteron system with
twice the memory bandwidth is slower than a corresponding 2xDC. (DC
Opterons use the same socket as plain-jane Opterons, so each socket has
the same dual-channel DDR memory setup.) It turns out the latency in a
2xDC setup is so much lower, and most apps benefit more from lower
latency than from higher bandwidth. Look at the diagram of the following
Tyan 4-processor MB:

ftp://ftp.tyan.com/datasheets/d_s4882_100.pdf

Take particular note of the lack of diagonal lines connecting the CPUs.
This means that if a process running on CPU0 needs memory attached to
CPU3, it must ask either CPU1 or CPU2 to forward the request for it.
Without NUMA support, we're looking at 25% of memory accesses running @
50ns, 50% @ 110ns and 25% @ 170ns. (Rough numbers; I'd have to do a lot
of googling to find the exact latencies, but I'm just too lazy now.)
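The 25/50/25 split falls straight out of the board layout. Here is a
minimal Python sketch assuming a square topology (links 0-1, 0-2, 1-3,
2-3, no diagonals), memory spread evenly across the four CPUs and no
NUMA-aware placement; LINKS, LATENCY_NS and the helper names are
hypothetical, and the latencies are the rough figures above.

```python
from collections import deque

# Hypothetical model of the 4-socket board: CPUs linked in a square, no diagonals.
LINKS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
LATENCY_NS = {0: 50, 1: 110, 2: 170}  # local, one hop, two hops (rough figures)

def hop_counts(links, start):
    """Breadth-first search: number of hops from `start` to every CPU."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def latency_mix(links, start):
    """Fraction of accesses at each latency, assuming memory is spread
    evenly over all CPUs and the OS does no NUMA-aware placement."""
    dist = hop_counts(links, start)
    mix = {}
    for hops in dist.values():
        ns = LATENCY_NS[hops]
        mix[ns] = mix.get(ns, 0) + 1 / len(dist)
    return mix

mix = latency_mix(LINKS, 0)
avg = sum(ns * frac for ns, frac in mix.items())
print(mix, f"average {avg:.1f} ns")
```

Because the square is symmetric, starting from any CPU gives the same mix.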

Now consider a 2xDC system. The 2 cores inside a single package are
connected by an immensely fast internal SRQ (System Request Queue)
connection. As long as there's no bandwidth limitation, both cores have
full-speed access to memory, while core-to-core snooping of each other's
cache takes roughly 10ns. So memory access speeds look like this: 50% @
50ns, 50% @ 110ns.
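Weighting each latency by its share of accesses gives a single average
to compare. A minimal Python sketch, using only the rough mixes quoted
above; average_latency is a hypothetical helper name.

```python
def average_latency(mix):
    """Weighted mean latency (ns) for a list of (fraction_of_accesses, latency_ns) pairs."""
    total = sum(frac for frac, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    return sum(frac * ns for frac, ns in mix)

four_single = [(0.25, 50), (0.50, 110), (0.25, 170)]  # 4x1 Opteron, no NUMA placement
two_dual = [(0.50, 50), (0.50, 110)]                   # 2x dual-core Opteron

slow, fast = average_latency(four_single), average_latency(two_dual)
print(f"4x1: {slow:.0f} ns, 2xDC: {fast:.0f} ns, {100 * (slow - fast) / slow:.0f}% lower")
```

On these rough numbers the 2xDC averages 80ns against 110ns for the 4x1,
roughly a quarter less latency per access.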

If the memory locations you need to access happen to be in the L1/L2
cache, the difference becomes even more pronounced. The memory access
patterns then look like this for 4x1: 25% @ 5ns, 50% @ 65ns, 25% @
125ns, versus 2xDC: 25% @ 5ns, 25% @ 15ns, 50% @ 65ns.
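To check that the cached case really widens the gap, this sketch
computes exact weighted averages with Python's fractions module for both
the uncached and cached mixes quoted above and compares the 4x1/2xDC
ratios. The variable and function names are hypothetical; the
percentages and latencies are the rough figures from this post.

```python
from fractions import Fraction as F

def mean_ns(mix):
    """Exact weighted mean latency for (fraction, latency_ns) pairs."""
    if sum(frac for frac, _ in mix) != 1:
        raise ValueError("fractions must sum to 1")
    return sum(frac * ns for frac, ns in mix)

# (fraction of accesses, latency in ns), rough figures from the post.
uncached_4x1 = [(F(1, 4), 50), (F(1, 2), 110), (F(1, 4), 170)]
uncached_2xdc = [(F(1, 2), 50), (F(1, 2), 110)]
cached_4x1 = [(F(1, 4), 5), (F(1, 2), 65), (F(1, 4), 125)]
cached_2xdc = [(F(1, 4), 5), (F(1, 4), 15), (F(1, 2), 65)]

uncached_ratio = mean_ns(uncached_4x1) / mean_ns(uncached_2xdc)
cached_ratio = mean_ns(cached_4x1) / mean_ns(cached_2xdc)
print(f"cached averages: 4x1 {float(mean_ns(cached_4x1))} ns, "
      f"2xDC {float(mean_ns(cached_2xdc))} ns")
print(f"4x1/2xDC ratio: uncached {float(uncached_ratio):.2f}, cached {float(cached_ratio):.2f}")
```

With cache hits in the mix, the 4x1 goes from about 1.4x to about 1.7x
the average latency of the 2xDC.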



Joel Fradkin wrote:
> Thank you much for the info.
> I will take a look. I think the prices I have been seeing may exclude us
> getting another 4 proc box this soon. My boss asked me to get something in
> the 15K range (I spent 30 on the Dell).
> The HP seemed to run around 30 but it had a lot more drives than the Dell
> (spec'd it with 14 10k drives).
