Thread: Re: Diminishing bandwidth performance with multiple quad core X5355s

On May 5, 9:44 am, CharlesBlackstone <charlesblacksto...@hotmail.com>
wrote:

> I think a lot of people are aware that an Opteron system has less
> bandwidth restrictions with a lot of processors, but that woodcrests
> don't have as good a memory controller and fall behind opterons after
> 4 cores or so. I'm asking how severe this is. Heavy number crunching of
> huge data sets in RAM is a bandwidth intensive operation. So, I'm
> asking how badly woodcrests are impacted above 4 cores, for example, 8
> cores vs 4 cores, on bandwidth performance. I didn't think this was
> that vague, is there anything else I can tell you that will make the
> question less difficult to answer?


Your question is difficult to answer because you'd first need to know
(at least approximately) the ratio of FLOPs to memory accesses in your
program, and the pattern of those accesses. It all boils down to that.
If your program can keep the CPU busy for long stretches without
touching the memory bus, it will definitely benefit from more
CPUs/cores. If, on the other hand, it misses the cache very often and
has to load from or store to main RAM, you will get contention on the
memory bus and your performance per CPU will degrade.
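
To make that trade-off concrete, here is a rough back-of-the-envelope
sketch (a roofline-style bound, not a measurement). The function names
and every number in it are made up for illustration; plug in figures
for your own hardware and program.

```python
def attainable_gflops(peak_gflops_per_core, mem_bw_gbs, flops_per_byte, n_cores):
    """Upper bound on throughput: limited either by compute or by the shared memory bus."""
    compute_bound = peak_gflops_per_core * n_cores   # scales with core count
    memory_bound = mem_bw_gbs * flops_per_byte       # shared bus, does not scale
    return min(compute_bound, memory_bound)


def per_core_efficiency(peak_gflops_per_core, mem_bw_gbs, flops_per_byte, n_cores):
    """Fraction of peak each core can reach under the bound above."""
    bound = attainable_gflops(peak_gflops_per_core, mem_bw_gbs, flops_per_byte, n_cores)
    return bound / (peak_gflops_per_core * n_cores)


if __name__ == "__main__":
    # Hypothetical machine: 10 GFLOPS per core, 20 GB/s of shared memory bandwidth.
    for intensity in (0.5, 8.0):          # FLOPs per byte moved to/from RAM
        for cores in (4, 8):
            eff = per_core_efficiency(10.0, 20.0, intensity, cores)
            print(f"intensity={intensity} cores={cores} per-core efficiency={eff:.3f}")
```

With a low FLOPs-per-byte ratio, going from 4 to 8 cores halves the
per-core efficiency in this model; with a high ratio, nothing is lost.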

You ask "how badly" your app will degrade. The way to model and predict
that is with the hardware performance counters (OProfile under Linux,
cputrack on Solaris, etc.), which give you an idea of the rate of
instructions versus everything else (loads/stores to RAM, retired
FLOPs, cache misses, TLB misses, etc.). But of course the best way is
to measure your program on the real thing.

I wanted to post this even if it's a bit late on the thread because
right now I have exactly this kind of problem.
We're trying to figure out if a dual-Quadcore (Xeon) will be better
(cost/benefit wise) than a 4-way Opteron dualcore, for *our* program.

SPEC CPU2006 can give you some pretty good insights on this: go to
the advanced query option and list all available results, but filter
by "number of total cores" equal to 8. Go straight to the int_rate and
fp_rate figures, and you'll be able to see how 4-way dual-core Opterons
compare to dual quad-core Xeons. At least on the SPEC CPU2006 suite,
that is; its programs have quite large working sets, although they may
not be as RAM-bottlenecked as your particular program.

As you say, Opterons do definitely have a much better memory system.
But then a 4-way mobo is WAY more expensive than a dual-socket one...

And btw, if you want to benchmark just memory bandwidth/latency
performance, STREAM (http://www.cs.virginia.edu/stream/)
is the way to go.
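
If you just want a quick sanity check before building STREAM, something
along these lines gives a crude copy-bandwidth number. This is only a
sketch in the spirit of STREAM's Copy kernel, not STREAM itself: the
function name and buffer size are my own choices, and the buffer has to
be much larger than your caches for the figure to mean anything.

```python
import time


def copy_bandwidth_gbs(n_bytes=64 * 1024 * 1024, trials=5):
    """Best-of-N bandwidth (GB/s) for copying one large buffer into another."""
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        dst[:] = src                      # bulk copy, one read and one write per byte
        best = min(best, time.perf_counter() - t0)
    best = max(best, 1e-12)               # guard against a zero reading from a coarse timer
    return 2 * n_bytes / best / 1e9       # count bytes read plus bytes written


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
```

To see the contention effect, run one copy of this per core at the same
time and compare the per-process figure with the single-process one.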

Cheers,

JL


Re: Diminishing bandwidth performance with multiple quad core X5355s

From: Arjen van der Meijden

On 14-5-2007 0:00 jlmarin wrote:
> I wanted to post this even if it's a bit late on the thread because
> right now I have exactly this kind of problem.
> We're trying to figure out if a dual-Quadcore (Xeon) will be better
> (cost/benefit wise) than a 4-way Opteron dualcore, for *our* program.

We've benchmarked the Sun Fire x4600 (with the older socket 939 CPUs)
and compared it to a much cheaper dual quad-core Xeon X5355.

As you can see at the end of this page:
http://tweakers.net/reviews/674/8

The 4-way dual-core Opteron performs worse (in our benchmark) than the
2-way quad-core Xeon. Our benchmark does not consume a lot of memory,
but I don't know which of the two benefits most from that. Obviously it
may well be that the Socket F Opterons with support for DDR2 memory
perform better, but we haven't seen much proof of that.
Given the cost of a 4-way dual core opteron vs a 2-way quad core xeon,
I'd go for the latter for now. The savings can be used to build a system
with heavier I/O and/or more memory, which normally yield bigger gains
in database land.
For example a Dell 2900 with 2x X5355 + 16GB of memory costs about 7000
euros less than a Dell 6950 with 4x 8220 + 16GB. You can buy an
additional MD1000 with 15x 15k rpm disks for that... And I doubt you'll
find any real-world database benchmark that will favour the
opteron-system if you look at the price/performance-picture.

Of course this picture might very well change as soon as the new
'Barcelona' quad core opterons are finally available.

> As you say, Opterons do definitely have a much better memory system.
> But then a 4-way mobo is WAY more expensive that a dual-socket one...

And it might be limited by NUMA and the relatively simple broadcast
architecture for cache coherency.

Best regards,

Arjen van der Meijden