Thread: Large (8M) cache vs. dual-core CPUs

Large (8M) cache vs. dual-core CPUs

From
Bill Moran
Date:
I've been given the task of making some hardware recommendations for
the next round of server purchases.  The machines to be purchased
will be running FreeBSD & PostgreSQL.

Where I'm stuck is in deciding whether we want to go with dual-core
pentiums with 2M cache, or with HT pentiums with 8M cache.

Both of these are expensive bits of hardware, and I'm trying to
gather as much evidence as possible before making a recommendation.
The FreeBSD community seems pretty divided over which is likely to
be better, and I have been unable to discover a method for estimating
how much of the 2M cache on our existing systems is being used.

Does anyone in the PostgreSQL community have any experience with
large caches or dual-core pentiums that could make any recommendations?
Our current Dell 2850 systems are CPU bound - i.e. they have enough
RAM, and fast enough disks that the CPUs seem to be the limiting
factor.  As a result, this decision on what kind of CPUs to get in
the next round of servers is pretty important.

Any advice is much appreciated.

--
Bill Moran
Collaborative Fusion Inc.


Re: Large (8M) cache vs. dual-core CPUs

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
> I've been given the task of making some hardware recommendations for
> the next round of server purchases.  The machines to be purchased
> will be running FreeBSD & PostgreSQL.
>
> Where I'm stuck is in deciding whether we want to go with dual-core
> pentiums with 2M cache, or with HT pentiums with 8M cache.

Given a choice between those two processors, I'd choose the AMD 64 x 2
CPU.  It's a significantly better processor than either of the Intel
choices.  And if you get the HT processor, you might as well turn off HT
on a PostgreSQL machine.  I've yet to see it make PostgreSQL run faster,
but I've certainly seen HT make it run slower.

If you can't run AMD in your shop due to bigotry (let's call a spade a
spade) then I'd recommend the real dual core CPU with 2M cache.  Most of
what makes a database slow is memory and disk bandwidth.  Few datasets
are gonna fit in that 8M cache, and when they do, they'll get flushed
right out by the next request anyway.

> Does anyone in the PostgreSQL community have any experience with
> large caches or dual-core pentiums that could make any recommendations?
> Our current Dell 2850 systems are CPU bound - i.e. they have enough
> RAM, and fast enough disks that the CPUs seem to be the limiting
> factor.  As a result, this decision on what kind of CPUs to get in
> the next round of servers is pretty important.

If the CPUs are running at 100% then you're likely not memory I/O bound,
but processing speed bound.  The dual core will definitely be the better
option in that case.  I take it you work at a "Dell Only" place, hence
no AMD for you...

Sad, cause the AMD is, on a price / performance scale, twice the
processor for the same money as the Intel.
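
To put the cache-size argument above in numbers, here is a rough sketch. It assumes PostgreSQL's default 8 kB block size, which the thread does not state, and it ignores everything else competing for cache space. The function name is made up for illustration.

```python
# Rough illustration: how many database pages could fit in a CPU cache at all.
# Assumption (not from the thread): PostgreSQL's default block size of 8 kB.
PAGE_BYTES = 8 * 1024


def pages_that_fit(cache_megabytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Return the number of whole pages that fit in a cache of the given size."""
    return (cache_megabytes * 1024 * 1024) // page_bytes


if __name__ == "__main__":
    for cache_mb in (2, 8):
        print(f"{cache_mb}M cache: at most {pages_that_fit(cache_mb)} pages")
```

Even the 8M part holds only about a thousand such pages, which is small next to a working set measured in gigabytes.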

Re: Large (8M) cache vs. dual-core CPUs

From
Gavin Hamill
Date:
On Tue, 25 Apr 2006 14:14:35 -0400
Bill Moran <wmoran@collaborativefusion.com> wrote:

> Does anyone in the PostgreSQL community have any experience with
> large caches or dual-core pentiums that could make any
> recommendations?

Heh :) You're in the position I was in about a year ago - we "naturally"
replaced our old Dell 2650 with £14k of Dell 6850 Quad Xeon with 8M
cache, and TBH the performance is woeful :/

Having gone through Postgres consultancy, been through IBM 8-way POWER4
hardware, discovered a bit of a shortcoming in PG on N-way hardware
(where N is large) [1] , I have been able to try out a dual-dual-core
Opteron machine, and it flies.

In fact, it flies so well that we ordered one that day. So, in short
£3k's worth of dual-opteron beat the living daylights out of our Xeon
monster. I can't praise the Opteron enough, and I've always been a firm
Intel pedant - the HyperTransport stuff must really be doing wonders. (I
typically see 500ms searches on it instead of 1000-2000ms on the Xeon.)

As it stands, I've had to borrow this Opteron so much (and send live
searches across the net to the remote box) because otherwise we simply
don't have enough CPU power to run the website (!)

Cheers,
Gavin.

[1] Simon Riggs + Tom Lane are currently involved in optimisation work
for this - it turns out our extremely read-heavy load pattern reveals
some buffer locking issues in PG.
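
The figures in this message can be combined into a rough price/performance comparison. The sketch below uses only the numbers quoted above (£14k vs £3k, 500ms vs 1000-2000ms) and treats search time as the only measure of performance, which is a simplification. The helper names are hypothetical.

```python
# Figures quoted in the message above; nothing here is measured independently.
XEON_COST_GBP = 14_000
OPTERON_COST_GBP = 3_000
OPTERON_SEARCH_MS = 500
XEON_SEARCH_MS_RANGE = (1000, 2000)


def speedup_range(fast_ms, slow_ms_range):
    """Speedup of the fast box over the slow one, as a (low, high) pair."""
    low, high = slow_ms_range
    return low / fast_ms, high / fast_ms


def cost_ratio(expensive, cheap):
    """How many times more the expensive box cost."""
    return expensive / cheap


if __name__ == "__main__":
    lo, hi = speedup_range(OPTERON_SEARCH_MS, XEON_SEARCH_MS_RANGE)
    ratio = cost_ratio(XEON_COST_GBP, OPTERON_COST_GBP)
    print(f"search speedup: {lo:.0f}x to {hi:.0f}x")
    print(f"cost ratio: {ratio:.1f}x")
    print(f"speedup per pound spent: {lo * ratio:.1f}x to {hi * ratio:.1f}x")
```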

Re: Large (8M) cache vs. dual-core CPUs

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
> I've been given the task of making some hardware recommendations for
> the next round of server purchases.  The machines to be purchased
> will be running FreeBSD & PostgreSQL.
>
> Where I'm stuck is in deciding whether we want to go with dual-core
> pentiums with 2M cache, or with HT pentiums with 8M cache.

BTW: For an interesting article on why the dual core Opterons are so
much better than their Intel cousins, read this article:

http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=1

Enlightening read.

Re: Large (8M) cache vs. dual-core CPUs

From
mark@mark.mielke.cc
Date:
On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> Sad, cause the AMD is, on a price / performance scale, twice the
> processor for the same money as the Intel.

Maybe a year or two ago. Prices are all coming down. Intel more
than AMD.

AMD still seems better - but not X2, and it depends on the workload.

X2 sounds like bigotry against Intel... :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: Large (8M) cache vs. dual-core CPUs

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 13:38, mark@mark.mielke.cc wrote:
> On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> > Sad, cause the AMD is, on a price / performance scale, twice the
> > processor for the same money as the Intel.
>
> Maybe a year or two ago. Prices are all coming down. Intel more
> than AMD.
>
> AMD still seems better - but not X2, and it depends on the workload.
>
> X2 sounds like bigotry against Intel... :-)

Actually, that was from an article from this last month that compared
the dual core intel to the amd.  for every dollar spent on the intel,
you got about half the performance of the amd.  Not bigotry.  fact.

But don't believe me or the other people who've seen the difference.  Go
buy the Intel box.  No skin off my back.



Re: Large (8M) cache vs. dual-core CPUs

From
"Joshua D. Drake"
Date:
Bill Moran wrote:
> I've been given the task of making some hardware recommendations for
> the next round of server purchases.  The machines to be purchased
> will be running FreeBSD & PostgreSQL.
>
> Where I'm stuck is in deciding whether we want to go with dual-core
> pentiums with 2M cache, or with HT pentiums with 8M cache.

Dual Core Opterons :)

Joshua D. Drake

>
> Both of these are expensive bits of hardware, and I'm trying to
> gather as much evidence as possible before making a recommendation.
> The FreeBSD community seems pretty divided over which is likely to
> be better, and I have been unable to discover a method for estimating
> how much of the 2M cache on our existing systems is being used.
>
> Does anyone in the PostgreSQL community have any experience with
> large caches or dual-core pentiums that could make any recommendations?
> Our current Dell 2850 systems are CPU bound - i.e. they have enough
> RAM, and fast enough disks that the CPUs seem to be the limiting
> factor.  As a result, this decision on what kind of CPUs to get in
> the next round of servers is pretty important.
>
> Any advice is much appreciated.
>


--

            === The PostgreSQL Company: Command Prompt, Inc. ===
      Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
      Providing the most comprehensive  PostgreSQL solutions since 1997
                     http://www.commandprompt.com/



Re: Large (8M) cache vs. dual-core CPUs

From
"Joshua D. Drake"
Date:
> But don't believe me or the other people who've seen the difference.  Go
> buy the Intel box.  No skin off my back.

To be more detailed... AMD Opteron has some specific technical
advantages to their design over Intel when it comes to performing for a
database. Specifically no front side bus :)

Also it is widely known and documented (just review the archives) that
AMD performs better than the equivalent Intel CPU, dollar for dollar.

Lastly it is also known that Dell, frankly, sucks for PostgreSQL. Again,
check the archives.

Joshua D. Drake




Re: Large (8M) cache vs. dual-core CPUs

From
David Boreham
Date:

> Actually, that was from an article from this last month that compared
> the dual core intel to the amd.  for every dollar spent on the intel,
> you got about half the performance of the amd.  Not bigotry.  fact.
>
> But don't believe me or the other people who've seen the difference.  Go
> buy the Intel box.  No skin off my back.

I've been doing plenty of performance evaluation on a parallel application
we're developing here : on Dual Core Opterons, P4, P4D. I can say that
the Opterons open up a can of wupass on the Intel processors. Almost 2x
the performance on our application vs. what the SpecCPU numbers would
suggest.


Re: Large (8M) cache vs. dual-core CPUs

From
"Joshua D. Drake"
Date:
David Boreham wrote:
>
>> Actually, that was from an article from this last month that compared
>> the dual core intel to the amd.  for every dollar spent on the intel,
>> you got about half the performance of the amd.  Not bigotry.  fact.
>>
>> But don't believe me or the other people who've seen the difference.  Go
>> buy the Intel box.  No skin off my back.
>>
> I've been doing plenty of performance evaluation on a parallel application
> we're developing here : on Dual Core Opterons, P4, P4D. I can say that
> the Opterons open up a can of wupass on the Intel processors. Almost 2x
> the performance on our application vs. what the SpecCPU numbers would
> suggest.

Because Stone Cold Said So!

>
>




Re: Large (8M) cache vs. dual-core CPUs

From
Bruce Momjian
Date:
Joshua D. Drake wrote:
> David Boreham wrote:
> >
> >> Actually, that was from an article from this last month that compared
> >> the dual core intel to the amd.  for every dollar spent on the intel,
> >> you got about half the performance of the amd.  Not bigotry.  fact.
> >>
> >> But don't believe me or the other people who've seen the difference.  Go
> >> buy the Intel box.  No skin off my back.
> >>
> > I've been doing plenty of performance evaluation on a parallel application
> > we're developing here : on Dual Core Opterons, P4, P4D. I can say that
> > the Opterons open up a can of wupass on the Intel processors. Almost 2x
> > the performance on our application vs. what the SpecCPU numbers would
> > suggest.
>
> Because Stone Cold Said So!

I'll believe someone who uses 'wupass' in a sentence any day!

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Large (8M) cache vs. dual-core CPUs

From
Ron Peacetree
Date:
As others have noted, the current price/performance "sweet spot" for DB servers is 2S 2C AMD CPUs.  These CPUs are also
the highest performing x86 compatible solution for pg.

If you must go Intel for some reason, then wait until the new NGMA CPUs (Conroe, Merom, Woodcrest) come out and see
how they bench on DB workloads.  Preliminary benches on these chips look good, but I would not recommend making a
purchase decision based on just preliminary benches of unreleased products.

If you must buy soon, then the decision is clear cut from anything except possibly a political/religious standpoint.
The NetBurst based Pentium and Xeon solutions are simply not worth the money spent or the PITA they will put you
through compared to the AMD dual cores.  The new Intel NGMA CPUs may be different, but all the pertinent evidence is not
yet available.

My personal favorite pg platform at this time is one based on a 2 socket, dual core ready mainboard with 16 DIMM slots
combined with dual core AMD Kx's.

Less money than the "comparable" Intel solution and _far_ more performance.

...and even if you do buy Intel, =DON'T= buy Dell unless you like causing trouble for yourself.
Bad experiences with Dell in general and their poor PERC RAID controllers in specific are all over this and other DB
forums.

Ron


-----Original Message-----
>From: Bill Moran <wmoran@collaborativefusion.com>
>Sent: Apr 25, 2006 2:14 PM
>To: pgsql-performance@postgresql.org
>Subject: [PERFORM] Large (8M) cache vs. dual-core CPUs
>
>
>I've been given the task of making some hardware recommendations for
>the next round of server purchases.  The machines to be purchased
>will be running FreeBSD & PostgreSQL.
>
>Where I'm stuck is in deciding whether we want to go with dual-core
>pentiums with 2M cache, or with HT pentiums with 8M cache.
>
>Both of these are expensive bits of hardware, and I'm trying to
>gather as much evidence as possible before making a recommendation.
>The FreeBSD community seems pretty divided over which is likely to
>be better, and I have been unable to discover a method for estimating
>how much of the 2M cache on our existing systems is being used.
>
>Does anyone in the PostgreSQL community have any experience with
>large caches or dual-core pentiums that could make any recommendations?
>Our current Dell 2850 systems are CPU bound - i.e. they have enough
>RAM, and fast enough disks that the CPUs seem to be the limiting
>factor.  As a result, this decision on what kind of CPUs to get in
>the next round of servers is pretty important.
>
>Any advice is much appreciated.
>

Re: Large (8M) cache vs. dual-core CPUs

From
David Boreham
Date:
>My personal favorite pg platform at this time is one based on a 2 socket, dual core ready mainboard with 16 DIMM slots
>combined with dual core AMD Kx's.
>
>
Right. We've been buying Tyan bare-bones boxes like this.
It's better to go with bare-bones than building boxes from bare metal
because the cooling issues are addressed correctly.

Note that if you need a large number of machines, then Intel
Core Duo may give the best overall price/performance because
they're cheaper to run and cool.




Re: Large (8M) cache vs. dual-core CPUs

From
Ron Peacetree
Date:
I've had intermittent "freeze and reboot" and, worse, just plain freeze problems with the Core Duos I've been testing.
I have not been able to narrow it down so I do not know if it is a platform issue or a CPU issue.  It appears to be HW,
not SW, related since I have experienced the problem both under M$ and Linux 2.6 based OSes.  I have not tested the Core
Duos under *BSD.

Also, being only 32b, Core Duos have limited utility for a present day DB server.

Power and space critical applications where 64b is not required may be a reasonable place for them... if the
present reliability problems I'm seeing go away.

Ron


-----Original Message-----
>From: David Boreham <david_list@boreham.org>
>Sent: Apr 25, 2006 5:15 PM
>To: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
>
>
>>My personal favorite pg platform at this time is one based on a 2 socket, dual core ready mainboard with 16 DIMM
>>slots combined with dual core AMD Kx's.
>>
>>
>Right. We've been buying Tyan bare-bones boxes like this.
>It's better to go with bare-bones than building boxes from bare metal
>because the cooling issues are addressed correctly.
>
>Note that if you need a large number of machines, then Intel
>Core Duo may give the best overall price/performance because
>they're cheaper to run and cool.
>

Re: Large (8M) cache vs. dual-core CPUs

From
"Joshua D. Drake"
Date:
Ron Peacetree wrote:
> As others have noted, the current price/performance "sweet spot" for DB servers is 2S 2C AMD CPUs.  These CPUs are
> also the highest performing x86 compatible solution for pg.
>
> If you must go Intel for some reason, then wait until the new NGMA CPUs (Conroe, Merom, Woodcrest) come out and see
> how they bench on DB workloads.  Preliminary benches on these chips look good, but I would not recommend making a
> purchase decision based on just preliminary benches of unreleased products.
>
> If you must buy soon, then the decision is clear cut from anything except possibly a political/religious standpoint.
> The NetBurst based Pentium and Xeon solutions are simply not worth the money spent or the PITA they will put you
> through compared to the AMD dual cores.  The new Intel NGMA CPUs may be different, but all the pertinent evidence is
> not yet available.
>
> My personal favorite pg platform at this time is one based on a 2 socket, dual core ready mainboard with 16 DIMM
> slots combined with dual core AMD Kx's.
>
> Less money than the "comparable" Intel solution and _far_ more performance.
>
> ...and even if you do buy Intel, =DON'T= buy Dell unless you like causing trouble for yourself.
> Bad experiences with Dell in general and their poor PERC RAID controllers in specific are all over this and other DB
> forums.
>
> Ron
>

To add to this... the HP DL 385 is a pretty nice dual core capable
opteron box. Just don't buy the extra ram from HP (they like to charge
entirely too much).

Joshua D. Drake



Re: Large (8M) cache vs. dual-core CPUs

From
"Jim C. Nasby"
Date:
On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
> > I've been given the task of making some hardware recommendations for
> > the next round of server purchases.  The machines to be purchased
> > will be running FreeBSD & PostgreSQL.
> >
> > Where I'm stuck is in deciding whether we want to go with dual-core
> > pentiums with 2M cache, or with HT pentiums with 8M cache.
>
> Given a choice between those two processors, I'd choose the AMD 64 x 2
> CPU.  It's a significantly better processor than either of the Intel
> choices.  And if you get the HT processor, you might as well turn off HT
> on a PostgreSQL machine.  I've yet to see it make postgresql run faster,
> but I've certainly seen HT make it run slower.

Actually, believe it or not, a coworker just saw HT double the
performance of pgbench on his desktop machine. Granted, not really a
representative test case, but it still blew my mind. This was with a
database that fit in his 1G of memory, and running windows XP. Both
cases were newly minted pgbench databases with a scale of 40. Testing
was 40 connections and 100 transactions. With HT he saw 47.6 TPS,
without it was 21.1.

I actually had IT put w2k3 server on an HT box specifically so I
could do more testing.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
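
For scale, the pgbench figures above can be turned into run lengths. This is a minimal sketch that assumes the "100 transactions" were per connection, which is how pgbench's per-client transaction count normally works but is not spelled out in the message. The helper names are hypothetical.

```python
# Numbers reported in the message above (pgbench, scale 40).
CLIENTS = 40
TRANSACTIONS_PER_CLIENT = 100  # assumption: 100 per connection, not 100 in total
TPS_WITH_HT = 47.6
TPS_WITHOUT_HT = 21.1


def total_transactions(clients: int, per_client: int) -> int:
    """Total transactions executed across all connections."""
    return clients * per_client


def run_seconds(total: int, tps: float) -> float:
    """Approximate wall-clock length of the run at the given throughput."""
    return total / tps


if __name__ == "__main__":
    total = total_transactions(CLIENTS, TRANSACTIONS_PER_CLIENT)
    print(f"total transactions: {total}")
    print(f"with HT:    about {run_seconds(total, TPS_WITH_HT):.0f} s")
    print(f"without HT: about {run_seconds(total, TPS_WITHOUT_HT):.0f} s")
    print(f"speedup:    {TPS_WITH_HT / TPS_WITHOUT_HT:.2f}x")
```

Under that assumption each run lasts only a few minutes, so the result is worth repeating several times before reading much into it.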

Re: Large (8M) cache vs. dual-core CPUs

From
mark@mark.mielke.cc
Date:
On Tue, Apr 25, 2006 at 01:42:31PM -0500, Scott Marlowe wrote:
> On Tue, 2006-04-25 at 13:38, mark@mark.mielke.cc wrote:
> > On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> > > Sad, cause the AMD is, on a price / performance scale, twice the
> > > processor for the same money as the Intel.
> > Maybe a year or two ago. Prices are all coming down. Intel more
> > than AMD.
> > AMD still seems better - but not X2, and it depends on the workload.
> > X2 sounds like bigotry against Intel... :-)
> Actually, that was from an article from this last month that compared
> the dual core intel to the amd.  for every dollar spent on the intel,
> you got about half the performance of the amd.  Not bigotry.  fact.
> But don't believe me or the other people who've seen the difference.  Go
> buy the Intel box.  No skin off my back.

AMD Opteron vs Intel Xeon is different than AMD X2 vs Pentium D.

For AMD X2 vs Pentium D - I have both - in similar price range, and
similar speed. I choose to use the AMD X2 as my server, and Pentium D
as my Windows desktop. They're both quite fast.

I made the choice I describe based on a lot of research. I was going
to go both Intel, until I noticed that the Intel prices were dropping
fast. 30% price cut in 2 months. AMD didn't drop at all during the
same time.

There are plenty of reasons to choose one over the other. Generally
the AMD comes out on top. It is *not* 2X though. Anybody who claims
this is being highly selective about which benchmarks they consider.

One article is nothing.

There is a lot of hype these days. AMD is winning the elite market,
which means that they are able to continue to sell high. Intel, losing
this market, is cutting its prices to compete. And they do compete.
Quite well.

Cheers,
mark


Re: Large (8M) cache vs. dual-core CPUs

From
mark@mark.mielke.cc
Date:
On Tue, Apr 25, 2006 at 08:54:40PM -0400, mark@mark.mielke.cc wrote:
> I made the choice I describe based on a lot of research. I was going
> to go both Intel, until I noticed that the Intel prices were dropping
> fast. 30% price cut in 2 months. AMD didn't drop at all during the
> same time.

Errr.. big mistake. That was going to be - I was going to go both AMD.

> There are plenty of reasons to choose one over the other. Generally
> the AMD comes out on top. It is *not* 2X though. Anybody who claims
> this is being highly selective about which benchmarks they consider.

I have an Intel Pentium D 920, and an AMD X2 3800+. These are very
close in performance. The retail price difference is:

    Intel Pentium D 920     is selling for $310 CDN
    AMD X2 3800+            is selling for $347 CDN

Another benefit of Pentium D over AMD X2, at least until AMD chooses
to switch, is that Pentium D supports DDR2, whereas AMD only supports
DDR. There are a lot of technical pros and cons to each - with claims
from AMD that DDR2 can be slower than DDR - but one claim that isn't
often made, but that helped me make my choice:

    1) DDR2 supports higher transfer speeds. I'm using DDR2 5400 on
       the Intel. I think I'm at 3200 or so on the AMD X2.

    2) DDR2 is cheaper. I purchased 1 Gbyte DDR2 5400 for $147 CDN.
       1 Gbyte of DDR 3200 starts at around the same price, and
       stretches into $200 - $300 CDN.

Now, granted, the Intel 920 requires more electricity to run. Running
24/7 for a year might make the difference in cost.

It doesn't address point 1) though. I like my DDR2 5400.

So, unfortunately, I won't be able to do a good test for you to prove
that my Windows Pentium D box is not only cheaper to buy, but faster,
because the specs aren't exactly equivalent. In the meantime, I'm
quite enjoying my 3d games while doing other things at the same time.
I imagine my desktop load approaches that of a CPU-bound database
load. 3d games require significant I/O and CPU.

Anybody who claims that Intel is 2X more expensive for the same
performance, isn't considering all factors. No question at all - the
Opteron is good, and the Xeon isn't - but the original poster didn't
ask about Opteron or Xeon, did he? For the desktop lines - X2 is not
double Pentium D. Maybe 10%. Maybe not at all. Especially now that
Intel is dropping its prices due to overstock.

Cheers,
mark
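
The remark above that a year of 24/7 running "might make the difference in cost" can be checked with a small calculation. The two prices come from the message; the electricity rate is an assumption picked purely for illustration, and the thread gives no figure for how much more power the Pentium D draws. The function names are hypothetical.

```python
# Retail prices quoted in the message above (Canadian dollars).
PENTIUM_D_920_CAD = 310
X2_3800_CAD = 347

HOURS_PER_YEAR = 24 * 365  # running 24/7


def yearly_power_cost(extra_watts: float, cad_per_kwh: float) -> float:
    """Cost in CAD of drawing `extra_watts` more, around the clock, for a year."""
    return extra_watts / 1000 * HOURS_PER_YEAR * cad_per_kwh


def breakeven_watts(price_gap_cad: float, cad_per_kwh: float) -> float:
    """Extra draw at which one year of power cost equals the purchase-price gap."""
    return price_gap_cad * 1000 / (HOURS_PER_YEAR * cad_per_kwh)


if __name__ == "__main__":
    gap = X2_3800_CAD - PENTIUM_D_920_CAD
    # Assumption, not from the thread: electricity at 0.10 CAD per kWh.
    rate = 0.10
    print(f"price gap: {gap} CAD")
    print(f"break-even extra draw: {breakeven_watts(gap, rate):.0f} W")
```

If the real difference in power draw is above the break-even figure for your electricity rate, the cheaper chip costs more within the first year.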


Re: Large (8M) cache vs. dual-core CPUs

From
Leigh Dyer
Date:
mark@mark.mielke.cc wrote:
> Another benefit of Pentium D over AMD X2, at least until AMD chooses
> to switch, is that Pentium D supports DDR2, whereas AMD only supports
> DDR. There are a lot of technical pros and cons to each - with claims
> from AMD that DDR2 can be slower than DDR - but one claim that isn't
> often made, but that helped me make my choice:
>
They're switching quite soon though -- within the next month now it
seems, after moving up their earlier plans to launch in June:

http://www.dailytech.com/article.aspx?newsid=1854

This Anandtech article shows the kind of performance increase we can
expect with DDR2 on AMD's new socket:

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2741

The short version is that it's an improvement, but not an enormous one,
and you need to spend quite a bit of cash on 800MHz (PC6400) DDR2 sticks
to see the most benefit. Some brief local (Australian) price comparisons
show 1GB PC-3200 DDR sticks starting at just over AU$100, with 1GB
PC2-4200 DDR2 sticks around the same price, though Anandtech's tests
showed PC2-4200 DDR2 benching generally slower than PC-3200 DDR,
probably due to the increased latency in DDR2.

Comparing reasonable quality matched pairs of 1GB sticks, PC-3200 DDR
still seems generally cheaper than PC2-5300 DDR2, though not by a lot,
and I'm sure the DDR2 will start dropping even further as AMD systems
start using it in the next month or so.

One thing's for sure though -- Intel's Pentium D prices are remarkably
low, and at the lower end of the price range AMD has nothing that's even
remotely competitive in terms of price/performance. The Pentium D 805,
for instance, with its dual 2.67GHz cores, costs just AU$180. The X2
3800+ is a far better chip, but it's also two-and-a-half times the price.

None of this really matters much in the server space though, where
Opteron's real advantage over Xeon is not its greater raw CPU power, or
its better dual-core implementation (though both would be hard to
dispute), but the improved system bandwidth provided by HyperTransport.
Even with Intel's next-gen CPUs, which look set to address the first two
points quite well, they still won't have an interconnect technology that
can really compete with AMD's.

Thanks
Leigh
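
The module names used in this part of the thread (PC-3200, PC2-4200, PC2-5300, PC2-6400) encode peak bandwidth in MB/s: the data rate in millions of transfers per second times the 8-byte width of a standard 64-bit module. The sketch below shows that arithmetic; it gives theoretical peaks only and says nothing about latency. The function and table names are made up for illustration.

```python
BUS_BYTES = 8  # a standard DIMM transfers 64 bits (8 bytes) at a time


def peak_mb_per_s(mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in MB/s for a given data rate in MT/s."""
    return mega_transfers_per_s * BUS_BYTES


# Data rates in MT/s for the module types mentioned in the thread.
MODULES = {
    "PC-3200  (DDR-400)": 400,
    "PC2-4200 (DDR2-533)": 1600 / 3,
    "PC2-5300 (DDR2-667)": 2000 / 3,
    "PC2-6400 (DDR2-800)": 800,
}

if __name__ == "__main__":
    for name, rate in MODULES.items():
        print(f"{name}: about {peak_mb_per_s(rate):.0f} MB/s peak")
```

The "DDR2 5400" mentioned earlier in the thread is the same DDR2-667 memory as PC2-5300; the peak works out to roughly 5333 MB/s and vendors round it either way.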


Re: Large (8M) cache vs. dual-core CPUs

From
Ron Peacetree
Date:
>Another benefit of Pentium D over AMD X2, at least until AMD chooses
>to switch, is that Pentium D supports DDR2, whereas AMD only supports
>DDR. There are a lot of technical pros and cons to each - with claims
>from AMD that DDR2 can be slower than DDR - but one claim that isn't
>often made, but that helped me make my choice:
>
>    1) DDR2 supports higher transfer speeds. I'm using DDR2 5400 on
>       the Intel. I think I'm at 3200 or so on the AMD X2.
>
>    2) DDR2 is cheaper. I purchased 1 Gbyte DDR2 5400 for $147 CDN.
>       1 Gbyte of DDR 3200 starts at around the same price, and
>       stretches into $200 - $300 CDN.
>
There's a logical fallacy here that needs to be noted.

THROUGHPUT is better with DDR2 if and only if there is enough data to be fetched in a serial fashion from memory.

LATENCY however is dependent on the base clock rate of the RAM involved.
So PC3200, 200MHz x2, is going to actually perform better than PC2-5400, 166MHz x4, for almost any memory access
pattern except those that are highly sequential.

In fact, even PC2-6400, 200MHz x4, has a disadvantage compared to 200MHz x2 memory.
The minimum latency of the two types of memory in clock cycles is always going to be higher for the memory type that
multiplies its base clock rate by the most.

For the mostly random memory access patterns that comprise many DB applications, the base latency of the RAM involved
is going to matter more than the peak throughput AKA the bandwidth of that RAM.

The big message here is that despite engineering tricks and marketing claims, the base clock rate of the RAM you use
matters.

A minor point to be noted in addition here is that most DB servers under load are limited by their physical IO
subsystem, their HDs, and not the speed of their RAM.

All of the above comments about the relative performance of different RAM types become insignificant when performance
is gated by the HD subsystem.
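
The latency point can be made concrete by converting CAS latency from clock cycles to nanoseconds. In the sketch below the CAS values (CL2 for DDR-400, CL4 for DDR2-667, CL5 for DDR2-800) are assumptions chosen as plausible for modules of the time, not figures from the thread, and actual parts vary. The function name is hypothetical.

```python
def cas_latency_ns(cas_cycles: int, io_clock_mhz: float) -> float:
    """CAS latency in nanoseconds: cycles divided by the I/O bus clock."""
    return cas_cycles / io_clock_mhz * 1000


# (label, assumed CAS cycles, I/O bus clock in MHz).
# The I/O clock is half the data rate: 200 MHz for DDR-400, 400 MHz for DDR2-800.
EXAMPLES = [
    ("DDR-400  (PC3200),   CL2", 2, 200),
    ("DDR2-667 (PC2-5300), CL4", 4, 2000 / 6),
    ("DDR2-800 (PC2-6400), CL5", 5, 400),
]

if __name__ == "__main__":
    for label, cas, clock in EXAMPLES:
        print(f"{label}: {cas_latency_ns(cas, clock):.1f} ns")
```

With these assumed timings the DDR module answers first despite its lower peak bandwidth, which is the effect described above; tighter DDR2 timings would narrow or close the gap.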


Re: Large (8M) cache vs. dual-core CPUs

From
mark@mark.mielke.cc
Date:
On Tue, Apr 25, 2006 at 11:07:17PM -0400, Ron Peacetree wrote:
> THROUGHPUT is better with DDR2 if and only if there is enough data
> to be fetched in a serial fashion from memory.
> LATENCY however is dependent on the base clock rate of the RAM
> involved.  So PC3200, 200MHz x2, is going to actually perform better
> than PC2-5400, 166MHz x4, for almost any memory access pattern
> except those that are highly sequential.

I had forgotten about this. Still, it's not quite as simple as you say.

DDR2 has increased latency, however, it has a greater upper limit,
and when run at the same clock speed (200 MHz for 200 MHz), it is
not going to perform worse. Add in double the pre-fetching capability,
and what you get is that most benchmarks show DDR2 5400 as being
slightly faster than DDR 3200.

AMD is switching to DDR2, and I believe that, even after making such a
big deal about latency, and why they wouldn't switch to DDR2, they are
now saying that their on-chip memory controller will be able to access
DDR2 memory (when they support it soon) faster than Intel can, not
having an on-chip memory controller.

You said that DB accesses are random. I'm not so sure. In PostgreSQL,
are not the individual pages often scanned sequentially, especially
because all records are variable length? You don't think PostgreSQL
will regularly read 32 bytes (8 bytes x 4) at a time, in sequence?
Whether for table pages, or index pages - I'm not seeing why the
accesses wouldn't be sequential. You believe PostgreSQL will access
the table pages and index pages randomly on a per-byte basis? What
is the minimum PostgreSQL record size again? Isn't it 32 bytes or
over? :-)

I wish my systems were running the same OS, and I'd run a test for
you. Alas, I don't think comparing Windows to Linux would be valuable.

> A minor point to be noted in addition here is that most DB servers
> under load are limited by their physical IO subsystem, their HDs,
> and not the speed of their RAM.

It seems like a pretty major point to me. :-)

It's why Opteron with RAID kicks ass over HyperTransport.

> All of the above comments about the relative performance of
> different RAM types become insignificant when performance is gated
> by the HD subsystem.

Yes.

Luckily - we don't all have Terabyte databases... :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: Large (8M) cache vs. dual-core CPUs

From
Ron Peacetree
Date:
I'm posting this to the entire performance list in the hopes that it will be generally useful.
=r

-----Original Message-----
>From: mark@mark.mielke.cc
>Sent: Apr 26, 2006 3:25 AM
>To: Ron Peacetree <rjpeace@earthlink.net>
>Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
>
>Hi Ron:
>
>As a result of your post on the matter, I've been redoing some of my
>online research on this subject, to see whether I do have one or more
>things wrong.
>
I'm always in favor of independent investigation to find the truth. :-)


>You say:
>
>> THROUGHPUT is better with DDR2 if and only if there is enough data
>> to be fetched in a serial fashion from memory.
>...
>> So PC3200, 200MHz x2, is going to actually perform better than
>> PC2-5400, 166MHz x4, for almost any memory access pattern except
>> those that are highly sequential.
>...
>> For the mostly random memory access patterns that comprise many DB
>> applications, the base latency of the RAM involved is going to
>> matter more than the peak throughput AKA the bandwidth of that RAM.
>
>I'm trying to understand right now - why does DDR2 require data to be
>fetched in a serial fashion, in order for it to maximize bandwidth?
>
SDR transfers data on either the rising or falling edge of its clock cycle.

DDR transfers data on both the rising and falling edge of the base clock signal, provided there is a contiguous chunk of 2+
datums to be transferred.

DDR2 basically has a second clock that cycles at 2x the rate of the base clock and thus we get 4 data transfers per
base clock cycle, provided there is a contiguous chunk of 4+ datums to be transferred.

Note also what happens when transferring the first datum after a lull period.
For purposes of example, let's pretend that we are talking about a base clock rate of 200MHz= 5ns.

The SDR still transfers data every 5ns no matter what.
The DDR transfers the 1st datum in 10ns and then assuming there are at least 2 sequential datums to be transferred will
transfer the 2nd and subsequent sequential pieces of data every 2.5ns.
The DDR2 transfers the 1st datum in 20ns and then assuming there are at least 4 sequential datums to be transferred
will transfer the 2nd and subsequent sequential pieces of data every 1.25ns.

Thus we can see that randomly accessing RAM degrades performance significantly for DDR and DDR2.   We can also see that
the conditions for optimal RAM performance become more restrictive as we go from SDR to DDR to DDR2.
The reason DDR2 with a low base clock rate excelled at tasks like streaming multimedia and stank at things like small
transaction OLTP DB applications is now apparent.

Factors like CPU prefetching and victim buffers can muddy this picture a bit.
Also, if the CPU's off die IO is slower than the RAM it is talking to, how fast that RAM is becomes unimportant.

The reasons AMD has held off from supporting DDR2 until now are:
1.  DDR is EOL.  JEDEC is not ratifying any DDR faster than 200x2 while DDR2 standards as fast as 333x4 are likely to
be ratified (note that Intel pretty much avoided DDR, leaving it to AMD, while DDR2 is Intel's main RAM technology.
Guess who has more pull with JEDEC?)

2.  DDR and DDR2 RAM with equal base clock rates are finally available, removing the biggest performance difference
between DDR and DDR2.

3.  Due to the larger demand for DDR2, more of it is produced.  That in turn has resulted in larger supplies of DDR2
than DDR.  Which in turn, especially when combined with the factors above, has resulted in lower prices for DDR2 than
for DDR of the same or faster base clock rate by now.

Hope this is helpful,
Ron

Re: Large (8M) cache vs. dual-core CPUs

From
David Boreham
Date:
>
>The reasons AMD has held off from supporting DDR2 until now are:
>1.  DDR is EOL.  JEDEC is not ratifying any DDR faster than 200x2 while DDR2 standards as fast as 333x4 are likely to
>be ratified (note that Intel pretty much avoided DDR, leaving it to AMD, while DDR2 is Intel's main RAM technology.
>Guess who has more pull with JEDEC?)
>
>
>
DDR2 is to RDRAM as C# is to Java

;)



Re: Large (8M) cache vs. dual-core CPUs

From
William Yu
Date:
mark@mark.mielke.cc wrote:
>
> I have an Intel Pentium D 920, and an AMD X2 3800+. These are very
> close in performance. The retail price difference is:
>
>     Intel Pentium D 920     is selling for $310 CDN
>     AMD X2 3800+            is selling for $347 CDN
>
> Anybody who claims that Intel is 2X more expensive for the same
> performance, isn't considering all factors. No question at all - the
> Opteron is good, and the Xeon isn't - but the original poster didn't
> ask about Opteron or Xeon, did he? For the desktop lines - X2 is not
> double Pentium D. Maybe 10%. Maybe not at all. Especially now that
> Intel is dropping its prices due to overstock.

There's part of the equation you are missing here. This is a PostgreSQL
mailing list which means we're usually talking about performance of just
this specific server app. While in general there may not be that much of
a % difference between the 2 chips, there's a huge gap in Postgres. For
whatever reason, Postgres likes Opterons. Way more than Intel
P4-architecture chips. (And it appears way more than IBM Power4 chips
and a host of other chips also.)

Here's one of the many discussions we had about this issue last year:

http://qaix.com/postgresql-database-development/337-670-re-opteron-vs-xeon-was-what-to-do-with-6-disks-read.shtml

The exact reasons why Opteron runs PostgreSQL so much better than P4s,
we're not 100% sure of. We have guesses -- lower memory latency, lack of
shared FSB, better 64-bit, 64-bit IOMMU, context-switch storms on P4,
better dualcore implementation and so on. Perhaps it's a combination of
all the above factors but somehow, the general experience people have
had is that equivalently priced Opteron servers run PostgreSQL 2X
faster than P4 servers as the baseline and the gap increases as you add
more sockets and more cores.

Re: Large (8M) cache vs. dual-core CPUs

From
David Boreham
Date:
 >While in general there may not be that much of a % difference between
 >the 2 chips, there's a huge gap in Postgres. For whatever reason,
 >Postgres likes Opterons. Way more than Intel P4-architecture chips.

It isn't only Postgres. I work on a number of other server applications
that also run much faster on Opterons than the published benchmark
figures would suggest they should. They're all compiled with gcc4,
so possibly there's a compiler issue. I don't run Windows on any
of our Opteron boxes so I can't easily compare using the MS compiler.





Re: Large (8M) cache vs. dual-core CPUs

From
Ron Peacetree
Date:
Mea Culpa.  There is a mistake in my example for SDR vs DDR vs DDR2.
This is what I get for posting before my morning coffee.

The base latency for all of the memory types is that of the base clock rate; 200MHz= 5ns in my given examples.

I double factored, making DDR and DDR2 worse than they actually are.

Again, my apologies.
Ron

-----Original Message-----
>From: Ron Peacetree <rjpeace@earthlink.net>
>Sent: Apr 26, 2006 8:40 AM
>To: mark@mark.mielke.cc, pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
>
>I'm posting this to the entire performance list in the hopes that it will be generally useful.
>=r
<snip>
>
>Note also what happens when transferring the first datum after a lull period.
>For purposes of example, let's pretend that we are talking about a base clock rate of 200MHz= 5ns.
>
>The SDR still transfers data every 5ns no matter what.
>The DDR transfers the 1st datum in 10ns and then assuming there are at least 2 sequential datums to be transferred
>will transfer the 2nd and subsequent sequential pieces of data every 2.5ns.
>The DDR2 transfers the 1st datum in 20ns and then assuming there are at least 4 sequential datums to be transferred
>will transfer the 2nd and subsequent sequential pieces of data every 1.25ns.
>
=5= ns to first transfer in all 3 cases.  Bad Ron.   No Biscuit!

>
>Thus we can see that randomly accessing RAM degrades performance significantly for DDR and DDR2.   We can also see
>that the conditions for optimal RAM performance become more restrictive as we go from SDR to DDR to DDR2.
>The reason DDR2 with a low base clock rate excelled at tasks like streaming multimedia and stank at things like small
>transaction OLTP DB applications is now apparent.
>
>Factors like CPU prefetching and victim buffers can muddy this picture a bit.
>Also, if the CPU's off die IO is slower than the RAM it is talking to, how fast that RAM is becomes unimportant.
>
These statements, and everything else I posted, are accurate.
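
The corrected example can be written as a small timing model. This is a rough sketch, not a description of real DRAM protocol timing: it assumes the 200MHz = 5ns base clock used in the example, 5ns to the first transfer for every memory type, and a burst long enough for the faster per-datum interval to apply. The names are made up for illustration.

```python
BASE_CLOCK_NS = 5.0  # 200MHz base clock, as in the example

# Data transfers per base clock cycle for each memory type
TRANSFERS_PER_CLOCK = {"SDR": 1, "DDR": 2, "DDR2": 4}


def burst_time_ns(kind, n_datums):
    """Time to move n sequential datums under the simplified model.

    First datum arrives after one base clock (5ns for all three types);
    each later datum arrives after base clock / transfers-per-clock.
    Assumes the burst is sequential and long enough to use every transfer slot.
    """
    if n_datums < 1:
        raise ValueError("n_datums must be at least 1")
    interval = BASE_CLOCK_NS / TRANSFERS_PER_CLOCK[kind]
    return BASE_CLOCK_NS + (n_datums - 1) * interval


for kind in ("SDR", "DDR", "DDR2"):
    single = burst_time_ns(kind, 1)
    burst = burst_time_ns(kind, 8)
    print(f"{kind:5s} 1 datum: {single:5.2f}ns   8 sequential datums: {burst:5.2f}ns")
```

Under these assumptions a single isolated datum costs the same on all three types, and the gap only opens up as the sequential run gets longer.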

Re: Large (8M) cache vs. dual-core CPUs

From
PFC
Date:
    Have a look at this Wikipedia page which outlines some differences
between the AMD and Intel versions of 64-bit :

    http://en.wikipedia.org/wiki/EM64T

> It isn't only Postgres. I work on a number of other server applications
> that also run much faster on Opterons than the published benchmark
> figures would suggest they should. They're all compiled with gcc4,
> so possibly there's a compiler issue. I don't run Windows on any
> of our Opteron boxes so I can't easily compare using the MS compiler.


Re: Large (8M) cache vs. dual-core CPUs

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 18:55, Jim C. Nasby wrote:
> On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> > On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
> > > I've been given the task of making some hardware recommendations for
> > > the next round of server purchases.  The machines to be purchased
> > > will be running FreeBSD & PostgreSQL.
> > >
> > > Where I'm stuck is in deciding whether we want to go with dual-core
> > > pentiums with 2M cache, or with HT pentiums with 8M cache.
> >
> > Given a choice between those two processors, I'd choose the AMD 64 x 2
> > CPU.  It's a significantly better processor than either of the Intel
> > choices.  And if you get the HT processor, you might as well turn off HT
> > on a PostgreSQL machine.  I've yet to see it make postgresql run faster,
> > but I've certainly seen HT make it run slower.
>
> Actually, believe it or not, a coworker just saw HT double the
> performance of pgbench on his desktop machine. Granted, not really a
> representative test case, but it still blew my mind. This was with a
> database that fit in his 1G of memory, and running windows XP. Both
> cases were newly minted pgbench databases with a scale of 40. Testing
> was 40 connections and 100 transactions. With HT he saw 47.6 TPS,
> without it was 21.1.
>
> I actually had IT build put w2k3 server on a HT box specifically so I
> could do more testing.

Just to clarify, this is PostgreSQL on Windows, right?

I wonder if the latest Linux kernel can do that well...  I'm guessing
that the kernel scheduler in Windows has had a lot more work put into
making it good at scheduling on an HT architecture than the Linux kernel has.

Re: Large (8M) cache vs. dual-core CPUs

From
William Yu
Date:
David Boreham wrote:
> It isn't only Postgres. I work on a number of other server applications
> that also run much faster on Opterons than the published benchmark
> figures would suggest they should. They're all compiled with gcc4,
> so possibly there's a compiler issue. I don't run Windows on any
> of our Opteron boxes so I can't easily compare using the MS compiler.


Maybe it's just a fact that the majority of x86 64-bit development for
open source software happens on Opteron/A64 machines. 64-bit AMD
machines were selling a good year before 64-bit Intel machines were
available. And even after Intel EM64T machines were available, anybody in their
right mind would have picked AMD machines over Intel due to
cost/heat/performance. So you end up with 64-bit OSS being
developed/optimized for Opterons and the 10% running Intel EM64T handle
compatibility issues.

Would be interesting to see a survey of what machines OSS developers use
to write/test/optimize their code.

Re: Large (8M) cache vs. dual-core CPUs

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 20:17, mark@mark.mielke.cc wrote:
> On Tue, Apr 25, 2006 at 08:54:40PM -0400, mark@mark.mielke.cc wrote:
> > I made the choice I describe based on a lot of research. I was going
> > to go both Intel, until I noticed that the Intel prices were dropping
> > fast. 30% price cut in 2 months. AMD didn't drop at all during the
> > same time.
>
> Errr.. big mistake. That was going to be - I was going to go both AMD.
>
> > There are plenty of reasons to choose one over the other. Generally
> > the AMD comes out on top. It is *not* 2X though. Anybody who claims
> > this is being highly selective about which benchmarks they consider.
>
> I have an Intel Pentium D 920, and an AMD X2 3800+. These are very
> close in performance. The retail price difference is:
>
>     Intel Pentium D 920     is selling for $310 CDN
>     AMD X2 3800+            is selling for $347 CDN

Let me be clear.  The performance difference between those boxes running
the latest first person shooter is not what I was alluding to in my
first post.  While the price of the Intel's may have dropped, there's a
huge difference (often 2x or more) in performance when running
PostgreSQL on otherwise similar chips from Intel and AMD.

Note that my workstation at work, my workstation at home, and my laptop
are all intel based machines.  They work fine for that.  But if I needed
to build a big fast oracle or postgresql server, I'd almost certainly go
with the AMD, especially so if I needed >2 cores, where the performance
difference becomes greater and greater.

You'd likely find that for PostgreSQL, the slowest dual core AMDs out
would still beat the fastest Intel Dual cores, because of the issue we've
seen on the list with context switching storms.

If you haven't actually run a heavy benchmark of postgresql on the two
architectures, please don't make your decision based on other
benchmarks.  Since you've got both a D920 and an X2 3800, that'd be a
great place to start.  Mock up some benchmark with a couple dozen
threads hitting the server at once and see if the Intel can keep up.  It
should do OK, but not great.  If you can get your hands on a dual
dual-core setup for either, you should really start to see the advantage
going to AMD, and by the time you get to a quad dual core setup, it
won't even be a contest.

Re: Large (8M) cache vs. dual-core CPUs

From
"Jim C. Nasby"
Date:
On Wed, Apr 26, 2006 at 10:27:18AM -0500, Scott Marlowe wrote:
> If you haven't actually run a heavy benchmark of postgresql on the two
> architectures, please don't make your decision based on other
> benchmarks.  Since you've got both a D920 and an X2 3800, that'd be a
> great place to start.  Mock up some benchmark with a couple dozen
> threads hitting the server at once and see if the Intel can keep up.  It

Or better yet, use dbt* or even pgbench so others can reproduce...
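
For anyone wanting to repeat the pgbench numbers quoted earlier in the thread, the invocation could look roughly like the sketch below. It only builds and prints the command lines; the database name "bench" is a placeholder, and reading "100 transactions" as pgbench's -t (transactions per client) is an assumption about how that test was run.

```python
import shlex


def pgbench_commands(dbname, scale=40, clients=40, transactions=100):
    """Build pgbench command lines for an init run and a test run.

    -i initializes the tables, -s sets the scale factor,
    -c sets the number of client connections,
    -t sets the number of transactions per client.
    """
    init = ["pgbench", "-i", "-s", str(scale), dbname]
    run = ["pgbench", "-c", str(clients), "-t", str(transactions), dbname]
    return init, run


# "bench" is a hypothetical database name
init_cmd, run_cmd = pgbench_commands("bench")
print(shlex.join(init_cmd))
print(shlex.join(run_cmd))
# To execute for real: subprocess.run(init_cmd, check=True), then run_cmd
```

Posting the exact command lines along with the TPS figures is what lets others reproduce the comparison on their own hardware.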
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Large (8M) cache vs. dual-core CPUs

From
"Jim C. Nasby"
Date:
On Wed, Apr 26, 2006 at 10:17:58AM -0500, Scott Marlowe wrote:
> On Tue, 2006-04-25 at 18:55, Jim C. Nasby wrote:
> > On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
> > > On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
> > > > I've been given the task of making some hardware recommendations for
> > > > the next round of server purchases.  The machines to be purchased
> > > > will be running FreeBSD & PostgreSQL.
> > > >
> > > > Where I'm stuck is in deciding whether we want to go with dual-core
> > > > pentiums with 2M cache, or with HT pentiums with 8M cache.
> > >
> > > Given a choice between those two processors, I'd choose the AMD 64 x 2
> > > CPU.  It's a significantly better processor than either of the Intel
> > > choices.  And if you get the HT processor, you might as well turn off HT
> > > on a PostgreSQL machine.  I've yet to see it make postgresql run faster,
> > > but I've certainly seen HT make it run slower.
> >
> > Actually, believe it or not, a coworker just saw HT double the
> > performance of pgbench on his desktop machine. Granted, not really a
> > representative test case, but it still blew my mind. This was with a
> > database that fit in his 1G of memory, and running windows XP. Both
> > cases were newly minted pgbench databases with a scale of 40. Testing
> > was 40 connections and 100 transactions. With HT he saw 47.6 TPS,
> > without it was 21.1.
> >
> > I actually had IT build put w2k3 server on a HT box specifically so I
> > could do more testing.
>
> Just to clarify, this is PostgreSQL on Windows, right?
>
> I wonder if the latest Linux kernel can do that well...  I'm guessing
> that the kernel scheduler in Windows has had a lot more work put into
> making it good at scheduling on an HT architecture than the Linux kernel has.

Yes, this is on Windows XP. Larry might also have a HT box with some
other OS on it we can check with (though I suspect that maybe that's
been beaten to death...)
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Large (8M) cache vs. dual-core CPUs

From
"Jim C. Nasby"
Date:
On Tue, Apr 25, 2006 at 11:07:17PM -0400, Ron Peacetree wrote:
> A minor point to be noted in addition here is that most DB servers under load are limited by their physical IO
> subsystem, their HDs, and not the speed of their RAM.

I think if that were the only consideration we wouldn't be seeing such a
dramatic difference between AMD and Intel though. Even in a disk-bound
server, caching is going to have a tremendous impact, and that's
essentially entirely bound by memory bandwidth and latency.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Large (8M) cache vs. dual-core CPUs

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Wed, Apr 26, 2006 at 10:27:18AM -0500, Scott Marlowe wrote:
> > If you haven't actually run a heavy benchmark of postgresql on the two
> > architectures, please don't make your decision based on other
> > benchmarks.  Since you've got both a D920 and an X2 3800, that'd be a
> > great place to start.  Mock up some benchmark with a couple dozen
> > threads hitting the server at once and see if the Intel can keep up.  It
>
> Or better yet, use dbt* or even pgbench so others can reproduce...

For why Opterons are superior to Intel for PostgreSQL, see:

    http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=2

Section "MESI-MESI-MOESI Banana-fana...".  Specifically, this part about
the Intel implementation:

    The processor with the Invalid data in its cache (CPU 0, let's say)
    might then wish to modify that chunk of data, but it could not do so
    while the only valid copy of the data is in the cache of the other
    processor (CPU 1). Instead, CPU 0 would have to wait until CPU 1 wrote
    the modified data back to main memory before proceeding -- and that takes
    time, bus bandwidth, and memory bandwidth. This is the great drawback of
    MESI.

AMD transfers the dirty cache line directly from cpu to cpu.  I can
imagine that helping our test-and-set shared memory usage quite a bit.
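
As a toy illustration of the difference described in that article, one can count what happens when a single dirty cache line (say, one holding a contended lock) bounces back and forth between two CPUs. This is a deliberately crude sketch following only the quoted description: it assumes every handoff finds the line modified in the other CPU's cache, and it counts events rather than estimating any real latency. The function name is made up.

```python
def pingpong_traffic(handoffs, protocol):
    """Count events as one dirty cache line alternates between two CPUs.

    Returns (memory_writebacks, cache_to_cache_transfers).
    MESI, as described in the quoted article: the owner writes the line back
    to main memory before the other CPU may modify it.
    MOESI: the dirty line is handed over directly, cache to cache.
    """
    if protocol not in ("MESI", "MOESI"):
        raise ValueError("protocol must be 'MESI' or 'MOESI'")
    memory_writebacks = 0
    cache_to_cache = 0
    for _ in range(handoffs):
        if protocol == "MESI":
            memory_writebacks += 1
        else:
            cache_to_cache += 1
    return memory_writebacks, cache_to_cache


for proto in ("MESI", "MOESI"):
    writebacks, direct = pingpong_traffic(1000, proto)
    print(f"{proto:5s} memory write-backs: {writebacks:4d}   cache-to-cache: {direct:4d}")
```

Every write-back in the MESI column costs bus and memory bandwidth, which is the overhead a heavily contended test-and-set lock would keep paying.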

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Large (8M) cache vs. dual-core CPUs

From
"Jim C. Nasby"
Date:
On Wed, Apr 26, 2006 at 02:48:53AM -0400, mark@mark.mielke.cc wrote:
> You said that DB accesses are random. I'm not so sure. In PostgreSQL,
> are not the individual pages often scanned sequentially, especially
> because all records are variable length? You don't think PostgreSQL
> will regularly read 32 bytes (8 bytes x 4) at a time, in sequence?
> Whether for table pages, or index pages - I'm not seeing why the
> accesses wouldn't be sequential. You believe PostgreSQL will access
> the table pages and index pages randomly on a per-byte basis? What
> is the minimum PostgreSQL record size again? Isn't it 32 bytes or
> over? :-)

Data within a page can absolutely be accessed randomly; it would be
horribly inefficient to slog through 8K of data every time you needed to
find a single row.

The header size of tuples is ~23 bytes, depending on your version of
PostgreSQL, and data fields have to start on the proper alignment
(generally 4 bytes). So essentially the smallest row you can get is 28
bytes.
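
The 28-byte figure follows from simple alignment arithmetic. A minimal sketch, assuming a 23-byte header, 4-byte alignment, and a single 4-byte column as the smallest payload (the last is an assumption made here to reach the stated total); exact numbers depend on the PostgreSQL version and platform.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


HEADER_BYTES = 23         # approximate tuple header size from the post
ALIGNMENT = 4             # "generally 4 bytes"
SMALLEST_FIELD_BYTES = 4  # assumption: one 4-byte column

data_start = align_up(HEADER_BYTES, ALIGNMENT)
min_row_bytes = data_start + SMALLEST_FIELD_BYTES

print(f"data starts at byte {data_start}, smallest row is {min_row_bytes} bytes")
```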

I know that tuple headers are dealt with as a C structure, but I don't
know if that means accessing any of the header costs the same as
accessing the whole thing. I don't know if PostgreSQL can access fields
within tuples without having to scan through at least the first part of
preceding fields, though I suspect that it can access fixed-width
fields that sit before any varlena fields directly (without scanning
through the other fields).

If we ever got to the point of divorcing the in-memory tuple layout from
the table layout it'd be interesting to experiment with having all
varlena length info stored immediately after all fixed-width fields;
that could potentially make accessing varlenas randomly faster. Note
that null fields are indicated as such in the null bitmap, so I'm pretty
sure that their in-tuple position doesn't matter much. Of course if you
want the definitive answer, Use The Source.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Large (8M) cache vs. dual-core CPUs

From
"Jim C. Nasby"
Date:
On Wed, Apr 26, 2006 at 06:16:46PM -0400, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Wed, Apr 26, 2006 at 10:27:18AM -0500, Scott Marlowe wrote:
> > > If you haven't actually run a heavy benchmark of postgresql on the two
> > > architectures, please don't make your decision based on other
> > > benchmarks.  Since you've got both a D920 and an X2 3800, that'd be a
> > > great place to start.  Mock up some benchmark with a couple dozen
> > > threads hitting the server at once and see if the Intel can keep up.  It
> >
> > Or better yet, use dbt* or even pgbench so others can reproduce...
>
> For why Opterons are superior to Intel for PostgreSQL, see:
>
>     http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=2
>
> Section "MESI-MESI-MOESI Banana-fana...".  Specifically, this part about
> the Intel implementation:
>
>     The processor with the Invalid data in its cache (CPU 0, let's say)
>     might then wish to modify that chunk of data, but it could not do so
>     while the only valid copy of the data is in the cache of the other
>     processor (CPU 1). Instead, CPU 0 would have to wait until CPU 1 wrote
>     the modified data back to main memory before proceeding -- and that takes
>     time, bus bandwidth, and memory bandwidth. This is the great drawback of
>     MESI.
>
> AMD transfers the dirty cache line directly from cpu to cpu.  I can
> imagine that helping our test-and-set shared memory usage quite a bit.

Wasn't the whole point of test-and-set that it's the recommended way to
do lightweight spinlocks according to AMD/Intel? You'd think they'd have
a way to make that performant on multiple CPUs (though if it's relying
on possibly modifying an underlying data page I can't really think of
how to do that without snaking through the cache...)
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Large (8M) cache vs. dual-core CPUs

From
mark@mark.mielke.cc
Date:
On Wed, Apr 26, 2006 at 05:37:31PM -0500, Jim C. Nasby wrote:
> On Wed, Apr 26, 2006 at 06:16:46PM -0400, Bruce Momjian wrote:
> > AMD transfers the dirty cache line directly from cpu to cpu.  I can
> > imagine that helping our test-and-set shared memory usage quite a bit.
> Wasn't the whole point of test-and-set that it's the recommended way to
> do lightweight spinlocks according to AMD/Intel? You'd think they'd have
> a way to make that performant on multiple CPUs (though if it's relying
> on possibly modifying an underlying data page I can't really think of
> how to do that without snaking through the cache...)

It's expensive no matter what. One method might be less expensive than
another. :-)

AMD definitely seems to have things right for lowest absolute latency.
2X still sounds like an extreme case - but until I've actually tried a
very large, or thread intensive PostgreSQL db on both, I probably
shouldn't doubt the work of others too much. :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: Large (8M) cache vs. dual-core CPUs

From
Vivek Khera
Date:
On Apr 25, 2006, at 2:14 PM, Bill Moran wrote:

> Where I'm stuck is in deciding whether we want to go with dual-core
> pentiums with 2M cache, or with HT pentiums with 8M cache.

In order of preference:

Opterons (dual core or single core)
Xeon with HT *disabled* at the BIOS level (dual or single core)


Notice Xeon with HT is not on my list :-)


Re: Large (8M) cache vs. dual-core CPUs

From
Vivek Khera
Date:
On Apr 25, 2006, at 5:09 PM, Ron Peacetree wrote:

> ...and even if you do buy Intel, =DON"T= buy Dell unless you like
> causing trouble for yourself.
> Bad experiences with Dell in general and their poor PERC RAID
> controllers in specific are all over this and other DB forums.

I don't think that their current controllers suck like their older
ones did.  That's what you'll read about in the archives -- the old
stuff.  Eg, the 1850's embedded RAID controller really flies, but it
only works with the internal disks.  I can't comment on the external
array controller for the 1850, but I cannot imagine it being any slower.

And personally, I've not experienced any major problems aside from
two bad PE1550's 4 years ago.  And I have currently about 15 Dell
servers running 24x7x365 doing various tasks, including postgres.

However, my *big* databases always go on dual opteron boxes.  my
current favorite is the SunFire X4100 with an external RAID.


Re: Large (8M) cache vs. dual-core CPUs

From
Sven Geisler
Date:
Hi all,

Vivek Khera wrote:
 > On Apr 25, 2006, at 2:14 PM, Bill Moran wrote:
 >> Where I'm stuck is in deciding whether we want to go with dual-core
 >> pentiums with 2M cache, or with HT pentiums with 8M cache.
 >
 > In order of preference:
 >
 > Opterons (dual core or single core)
 > Xeon with HT *disabled* at the BIOS level (dual or single core)
 >
 >
 > Notice Xeon with HT is not on my list :-)
 >

I support Vivek's order of preference. I have been going through a
nightmare of performance issues with different x86 hardware.
At the end of the day I can say the Opterons are faster because of their
memory bandwidth. I also had to disable HT on all our customers' servers
which were still using Xeons with HT.

There is a paper from HP which describes the advantage of the memory
architecture of the Opterons. This is the best explanation to me why
Opteron 875 is faster than a Xeon MP 3 GHz, which I compared last year.

I remember a thread in the postgresql devel list around HT in 2004,
where you can find the reason why you should disable HT.
This thread refers to Intel Developer Manual Volume 4 (Architecture
Optimisation) where there is some advice regarding spin-wait loop.
This is related to the code of src/include/storage/s_lock.h.

Cheers Sven.

======
 From Intel Developer Manual Volume 4

Synchronization for Short Periods

The frequency and duration that a thread needs to synchronize with
other threads depends on application characteristics. When a
synchronization loop needs very fast response, applications may use a
spin-wait loop.

A spin-wait loop is typically used when one thread needs to wait a short
amount of time for another thread to reach a point of synchronization. A
spin-wait loop consists of a loop that compares a synchronization
variable with some pre-defined value [see Example 7-1(a)].

On a modern microprocessor with a superscalar speculative execution
engine, a loop like this results in the issue of multiple simultaneous read
requests from the spinning thread. These requests usually execute
out-of-order with each read request being allocated a buffer resource.
On detection of a write by a worker thread to a load that is in progress,
the processor must guarantee no violations of memory order occur. The
necessity of maintaining the order of outstanding memory operations
inevitably costs the processor a severe penalty that impacts all threads.

This penalty occurs on the Pentium Pro processor, the Pentium II
processor and the Pentium III processor. However, the penalty on these
processors is small compared with penalties suffered on the Pentium 4
and Intel Xeon processors. There the performance penalty for exiting
the loop is about 25 times more severe.

On a processor supporting Hyper-Threading Technology, spin-wait
loops can consume a significant portion of the execution bandwidth of
the processor. One logical processor executing a spin-wait loop can
severely impact the performance of the other logical processor.

====
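
The manual's Example 7-1(a) is not reproduced in the excerpt above. The shape of a spin-wait loop can still be sketched; the following is not Intel's example and not the s_lock.h code, just a minimal Python illustration of one thread repeatedly comparing a synchronization variable with a pre-defined value. Python has no access to the x86 PAUSE instruction that Intel recommends for such loops, so time.sleep(0) stands in as a yield; that substitution, and all names here, are assumptions of this sketch.

```python
import threading
import time

READY = 1        # pre-defined value the spinner waits for
sync_var = [0]   # shared synchronization variable


def worker():
    """Do a short piece of work, then signal the waiting thread."""
    time.sleep(0.01)
    sync_var[0] = READY


def spin_wait():
    """Spin until sync_var equals READY; return how many times we looped."""
    spins = 0
    while sync_var[0] != READY:
        spins += 1
        time.sleep(0)  # yield to other threads; stand-in for PAUSE
    return spins


t = threading.Thread(target=worker)
t.start()
spins = spin_wait()
t.join()
print(f"worker signalled after {spins} spins")
```

On real hardware the cost discussed in the excerpt comes from the tight loop itself: without a pause or yield, the spinning logical processor competes for the execution resources that the other logical processor needs to finish its work.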