Thread: hyperthreaded cpu still an issue in 8.4?

hyperthreaded cpu still an issue in 8.4?

From
Doug Hunley
Date:
Just wondering if the issue referenced in
http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
is still present in 8.4, or if some tunable (or other change) has made the
use of hyperthreading a non-issue. We're looking to upgrade our servers soon
for performance reasons and are trying to determine whether more CPUs (no
HT) or fewer CPUs (with HT) are the way to go. Thx

--
Douglas J Hunley
http://douglasjhunley.com
Twitter: @hunleyd

Re: hyperthreaded cpu still an issue in 8.4?

From
Grzegorz Jaśkiewicz
Date:
On Tue, Jul 21, 2009 at 1:42 PM, Doug Hunley<doug@hunley.homeip.net> wrote:
> Just wondering is the issue referenced in
> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
> is still present in 8.4 or if some tunable (or other) made the use of
> hyperthreading a non-issue. We're looking to upgrade our servers soon
> for performance reasons and am trying to determine if more cpus (no
> HT) or less cpus (with HT) are the way to go. Thx

I wouldn't recommend HT CPUs at all. I think your assumption that HT
== CPU is wrong in the first place.
Please read more about HT on Intel's website.



--
GJ

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Marlowe
Date:
On Tue, Jul 21, 2009 at 6:42 AM, Doug Hunley<doug@hunley.homeip.net> wrote:
> Just wondering is the issue referenced in
> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
> is still present in 8.4 or if some tunable (or other) made the use of
> hyperthreading a non-issue. We're looking to upgrade our servers soon
> for performance reasons and am trying to determine if more cpus (no
> HT) or less cpus (with HT) are the way to go. Thx

This isn't really an application tunable so much as a kernel level
tunable.  PostgreSQL seems to have scaled pretty well a couple years
ago in the tweakers.net benchmark of the Sun T1 CPU with 4 threads per
core.  However, at the time 4 AMD cores were spanking 8 Sun T1 cores
with 4 threads each.

Now, whether or not their benchmark applies to your application only
you can say.  Can you get machines on a 30 day trial program to
benchmark them and decide which to go with?  I'm guessing that dual
6-core Opterons with lots of memory are the current king of the hill for
reasonably priced pg servers that are running CPU bound loads.

If you're mostly IO bound then it really doesn't matter which CPU.

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Marlowe
Date:
2009/7/21 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> On Tue, Jul 21, 2009 at 1:42 PM, Doug Hunley<doug@hunley.homeip.net> wrote:
>> Just wondering is the issue referenced in
>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>> is still present in 8.4 or if some tunable (or other) made the use of
>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>> for performance reasons and am trying to determine if more cpus (no
>> HT) or less cpus (with HT) are the way to go. Thx
>
> I wouldn't recommend HT CPUs at all. I think your assumption, that HT
> == CPU is wrong in first place.

Not sure the OP said that...

Re: hyperthreaded cpu still an issue in 8.4?

From
Grzegorz Jaśkiewicz
Date:
On Tue, Jul 21, 2009 at 3:16 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> On Tue, Jul 21, 2009 at 6:42 AM, Doug Hunley<doug@hunley.homeip.net> wrote:
>> Just wondering is the issue referenced in
>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>> is still present in 8.4 or if some tunable (or other) made the use of
>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>> for performance reasons and am trying to determine if more cpus (no
>> HT) or less cpus (with HT) are the way to go. Thx
>
> This isn't really an application tunable so much as a kernel level
> tunable.  PostgreSQL seems to have scaled pretty well a couple years
> ago in the tweakers.net benchmark of the Sun T1 CPU with 4 threads per
> core.  However, at the time 4 AMD cores were spanking 8 Sun T1 cores
> with 4 threads each.
>
> Now, whether or not their benchmark applies to your application only
> you can say.  Can you get machines on a 30 day trial program to
> benchmark them and decide which to go with?  I'm guessing that dual
> 6core Opterons with lots of memory is the current king of the hill for
> reasonably priced pg servers that are running CPU bound loads.
>
> If you're mostly IO bound then it really doesn't matter which CPU.
Unless he is doing a lot of computations, on small sets of data.


Now I am confused, HT is not anywhere near what 'threads' are on sparcs afaik.



--
GJ

Re: hyperthreaded cpu still an issue in 8.4?

From
Mark Mielke
Date:
On 07/21/2009 10:36 AM, Grzegorz Jaśkiewicz wrote:
> On Tue, Jul 21, 2009 at 3:16 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
>> On Tue, Jul 21, 2009 at 6:42 AM, Doug Hunley<doug@hunley.homeip.net> wrote:
>>> Just wondering is the issue referenced in
>>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>>> is still present in 8.4 or if some tunable (or other) made the use of
>>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>>> for performance reasons and am trying to determine if more cpus (no
>>> HT) or less cpus (with HT) are the way to go. Thx
>>
>> This isn't really an application tunable so much as a kernel level
>> tunable.  PostgreSQL seems to have scaled pretty well a couple years
>> ago in the tweakers.net benchmark of the Sun T1 CPU with 4 threads per
>> core.  However, at the time 4 AMD cores were spanking 8 Sun T1 cores
>> with 4 threads each.
>
> Unless he is doing a lot of computations, on small sets of data.
>
> Now I am confused, HT is not anywhere near what 'threads' are on sparcs afaik.

Fun relatively off-topic chat... :-)

Intel "HT" provides the ability to execute two threads per CPU core at the same time.

Sun "CoolThreads" provide the same capability. They have just scaled it further. Instead of Intel's Xeon Series 5500 with dual-processor, quad-core, dual-thread configuration (= 16 active threads at a time), Sun T2+ has dual-processor, eight-core, eight-thread configuration (= 128 active threads at a time).

Just, each Sun "CoolThread" thread is far less capable than an Intel "HT" thread, so the comparison is really about the type of load.

But the real point is that a "thread" (whether a "CoolThread" or an "HT" thread) is not the same as a core, which is not the same as a processor. Doubling the threads usually brings significantly less benefit than doubling the cores. Doubling the cores probably brings less benefit than doubling the processors.

I think the Intel numbers say that Intel HT provides +15% performance on average.
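
The thread-count arithmetic above can be written out as a small sketch. The `Topology` class and the `estimated_core_equivalents` helper are made-up names for illustration, and treating SMT as a flat per-core bonus is only an assumption based on the "+15% on average" figure, not a measured model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Socket/core/thread layout of a server (illustrative names)."""
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def hardware_threads(self) -> int:
        return self.cores * self.threads_per_core


def estimated_core_equivalents(topo: Topology, smt_gain: float) -> float:
    """Crude estimate: each core counts as 1 + smt_gain when SMT is on.

    smt_gain is an assumption supplied by the caller (0.15 for the
    "+15% on average" figure); real gains vary a lot by workload.
    """
    if topo.threads_per_core == 1:
        return float(topo.cores)
    return topo.cores * (1.0 + smt_gain)


xeon_5500 = Topology("2x Xeon 5500", sockets=2, cores_per_socket=4, threads_per_core=2)
sun_t2_plus = Topology("2x Sun T2+", sockets=2, cores_per_socket=8, threads_per_core=8)

for topo in (xeon_5500, sun_t2_plus):
    print(f"{topo.name}: {topo.cores} cores, {topo.hardware_threads} hardware threads")

# 8 cores with a 15% SMT gain behave roughly like 9.2 cores, not 16.
print(round(estimated_core_equivalents(xeon_5500, smt_gain=0.15), 2))
```

So the OS sees 16 CPUs on the Xeon box, but under this assumption it would behave more like nine and a bit real cores.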

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Marlowe
Date:
2009/7/21 Mark Mielke <mark@mark.mielke.cc>:
> On 07/21/2009 10:36 AM, Grzegorz Jaśkiewicz wrote:
>
> On Tue, Jul 21, 2009 at 3:16 PM, Scott Marlowe<scott.marlowe@gmail.com>
> wrote:
>
>
> On Tue, Jul 21, 2009 at 6:42 AM, Doug Hunley<doug@hunley.homeip.net> wrote:
>
>
> Just wondering is the issue referenced in
> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
> is still present in 8.4 or if some tunable (or other) made the use of
> hyperthreading a non-issue. We're looking to upgrade our servers soon
> for performance reasons and am trying to determine if more cpus (no
> HT) or less cpus (with HT) are the way to go. Thx
>
>
> This isn't really an application tunable so much as a kernel level
> tunable.  PostgreSQL seems to have scaled pretty well a couple years
> ago in the tweakers.net benchmark of the Sun T1 CPU with 4 threads per
> core.  However, at the time 4 AMD cores were spanking 8 Sun T1 cores
> with 4 threads each.
>
>
> Unless he is doing a lot of computations, on small sets of data.
>
>
> Now I am confused, HT is not anywhere near what 'threads' are on sparcs
> afaik.
>
> Fun relatively off-topic chat... :-)
>
> Intel "HT" provides the ability to execute two threads per CPU core at the
> same time.
>
> Sun "CoolThreads" provide the same capability. They have just scaled it
> further. Instead of Intel's Xeon Series 5500 with dual-processor, quad-core,
> dual-thread configuration (= 16 active threads at a time), Sun T2+ has
> dual-processor, eight-core, eight-thread configuration (= 128 active threads
> at a time).
>
> Just, each Sun "CoolThread" thread is far less capable than an Intel "HT"
> thread, so the comparison is really about the type of load.
>
> But, the real point is that "thread" (whether "CoolThread" or "HT" thread),
> is not the same as core, which is not the same as processor. X 2 threads is
> usually significantly less benefit than X 2 cores. X 2 cores is probably
> less benefit than X 2 processors.

Actually, given the faster inter-connect speed and communication, I'd
think a single quad core CPU would be faster than the equivalent dual
dual core cpu.

> I think the Intel numbers says that Intel HT provides +15% performance on
> average.

It's very dependent on work load, that's for sure.  I've some things
that are 60 to 80% improved, others that go negative.  But 15 to 40%
is more typical.

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Carey
Date:


On 7/21/09 9:22 AM, "Scott Marlowe" <scott.marlowe@gmail.com> wrote:

>> But, the real point is that "thread" (whether "CoolThread" or "HT" thread),
>> is not the same as core, which is not the same as processor. X 2 threads is
>> usually significantly less benefit than X 2 cores. X 2 cores is probably
>> less benefit than X 2 processors.
>
> Actually, given the faster inter-connect speed and communication, I'd
> think a single quad core CPU would be faster than the equivalent dual
> dual core cpu.

It's very workload dependent and system dependent.  If the dual core dual cpu
setup has 2x the memory bandwidth of the single quad core (Nehalem,
Opteron), it also likely has higher memory latency and a dedicated
interconnect for memory and cache coherency.  And so some workloads will
favor the low latency and others will favor more bandwidth.

If it's like the older Xeons, where an extra CPU doesn't buy you more memory
bandwidth alone (but better chipsets do), then a single quad core is usually
faster than dual core dual cpu (if the same chipset).  Even more so if there
is a lot of lock contention, since that can all be handled on the same CPU
rather than communicating across the bus.

But back on topic for HT -- HT doesn't like spin-locks much unless they use
the right low level instruction sequence rather than actually spinning.
With the right instruction, the spin will allow the other thread to do
work... With the wrong one, it will tie up the pipeline.  I have no idea
what Postgres' spin-locks and tool chain compile down to.


Re: hyperthreaded cpu still an issue in 8.4?

From
Jean-David Beyer
Date:
Scott Carey wrote:
>
> But back on topic for HT -- HT doesn't like spin-locks much unless they
> use the right low level instruction sequence rather than actually
> spinning. With the right instruction, the spin will allow the other
> thread to do work... With the wrong one, it will tie up the pipeline.  I
> have no idea what Postgres' spin-locks and tool chain compile down to.
>
I have two hyperthreaded Xeon processors, so this machine thinks it has four
processors. I have not seen the effect of spin locks with postgres. But I
can tell that Firefox and Thunderbird use the wrong ones. When one of these
is having trouble accessing a site, the processor in question goes up to
100% and the other part of the hyperthreaded processor does nothing even
though I run four BOINC processes that would be glad to gobble up the
cycles. Of course, since it is common to both Firefox and Thunderbird,
perhaps it is a problem in the name server, bind. But wherever it is, it
bugs me.
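
For anyone wanting to check how many of the "processors" a Linux box reports are really hyperthread siblings, here is a rough sketch. It assumes the x86 Linux /proc/cpuinfo layout with "physical id", "siblings" and "cpu cores" fields; the function name and the sample text are made up for illustration, and the sample only mimics a machine with two single-core hyperthreaded Xeons.

```python
def summarize_cpuinfo(cpuinfo_text: str) -> dict:
    """Count logical CPUs, packages and physical cores from cpuinfo text.

    Assumes one blank-line-separated stanza per logical CPU, as on
    x86 Linux. SMT is reported when a package has more siblings
    (logical CPUs) than cores.
    """
    logical = 0
    packages = {}  # physical id -> (siblings, cpu cores)
    for stanza in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        package_id = fields.get("physical id", "0")
        siblings = int(fields.get("siblings", "1"))
        cores = int(fields.get("cpu cores", "1"))
        packages[package_id] = (siblings, cores)
    return {
        "logical_cpus": logical,
        "packages": len(packages),
        "physical_cores": sum(cores for _, cores in packages.values()),
        "smt_enabled": any(sib > cores for sib, cores in packages.values()),
    }


# Made-up sample: 2 packages, 1 core each, 2 hardware threads per core.
SAMPLE = "\n\n".join(
    f"processor\t: {n}\nphysical id\t: {n // 2}\nsiblings\t: 2\ncpu cores\t: 1"
    for n in range(4)
)

summary = summarize_cpuinfo(SAMPLE)
print(summary)
# On a real machine: summarize_cpuinfo(open("/proc/cpuinfo").read())
```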

--
   .~.  Jean-David Beyer          Registered Linux User 85642.
   /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
  /( )\ Shrewsbury, New Jersey    http://counter.li.org
  ^^-^^ 13:55:01 up 6 days, 3:52, 3 users, load average: 4.03, 4.25, 4.45

Re: hyperthreaded cpu still an issue in 8.4?

From
Greg Smith
Date:
On Tue, 21 Jul 2009, Doug Hunley wrote:

> Just wondering is the issue referenced in
> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
> is still present in 8.4 or if some tunable (or other) made the use of
> hyperthreading a non-issue. We're looking to upgrade our servers soon
> for performance reasons and am trying to determine if more cpus (no
> HT) or less cpus (with HT) are the way to go.

If you're talking about the hyperthreading in the latest Intel Nehalem
processors, I've been seeing great PostgreSQL performance from those.
The kind of weird behavior the old generation hyperthreading designs had
seems gone.  You can see at
http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907222158050.16713@westnet.com
that I've cleared 90K TPS on a 16 core system (2 quad-core hyperthreaded
processors) running a small test using lots of parallel SELECTs.  That
would not be possible if there were HT spinlock problems still around.
There have been both PostgreSQL scaling improvements and hardware
improvements since the 2005 messages you saw there that have combined to
clear up the issues there.  While true cores would still be better if
everything else were equal, it rarely is, and I wouldn't hesitate to jump
on Intel's bandwagon right now.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: hyperthreaded cpu still an issue in 8.4?

From
Greg Smith
Date:
On Mon, 27 Jul 2009, Dave Youatt wrote:

> Do you think it's due to the new internal interconnect, that bears a
> strong resemblance to AMD's hypertransport (AMD's buzzword for borrowing
> lots of interconnect technology from the DEC alpha (EV7?)), or Intel
> fixing a not-so-good initial implementation of "hyperthreading" (Intel's
> marketing buzzword) from a few years ago.

It certainly looks like it's Intel finally getting the interconnect right,
because I'm seeing huge improvements in raw memory speeds too.  That's the
one area I used to see better results from Opterons on sometimes, but
Intel pulled way ahead on this last upgrade.  The experiment I haven't
done yet is to turn off hyperthreading and see how much the performance
degrades.  This is hard because I'm several thousand miles from the
servers I'm running the tests on, which makes low level config changes
somewhat hairy.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: hyperthreaded cpu still an issue in 8.4?

From
Dave Youatt
Date:
Greg Smith wrote:
> On Tue, 21 Jul 2009, Doug Hunley wrote:
>
>> Just wondering is the issue referenced in
>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>> is still present in 8.4 or if some tunable (or other) made the use of
>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>> for performance reasons and am trying to determine if more cpus (no
>> HT) or less cpus (with HT) are the way to go.
>
> If you're talking about the hyperthreading in the latest Intel Nehalem
> processors, I've been seeing great PostgreSQL performance from those.
> The kind of weird behavior the old generation hyperthreading designs
> had seems gone.  You can see at
> http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907222158050.16713@westnet.com
> that I've cleared 90K TPS on a 16 core system (2 quad-core
> hyperthreaded processors) running a small test using lots of parallel
> SELECTs.  That would not be possible if there were HT spinlock
> problems still around. There have been both PostgreSQL scaling
> improvements and hardware improvements since the 2005 messages you saw
> there that have combined to clear up the issues there.  While true
> cores would still be better if everything else were equal, it rarely
> is, and I wouldn't hesitate to jump on Intel's bandwagon right now.

Greg, those are compelling numbers for the new Nehalem processors.
Great news for postgresql.  Do you think it's due to the new internal
interconnect, that bears a strong resemblance to AMD's hypertransport
(AMD's buzzword for borrowing lots of interconnect technology from the
DEC alpha (EV7?)), or Intel fixing a not-so-good initial implementation
of "hyperthreading" (Intel's marketing buzzword) from a few years ago?

Also, and this is getting maybe too far off topic, beyond the buzzwords,
what IS the new "hyperthreading" in Nehalems?  Opportunistic
superpipelined CPUs?  Superscalar?  What's shared by the cores
(bandwidth, cache(s))?  What's changed about the new hyperthreading
that makes it actually seem to work (or at least not cause other
problems)?  Smarter scheduling of instructions to take advantage of
stalls and hazards in another thread's instruction stream?  Fixed
instruction-level locking/interlocks, or avoiding locking whenever
possible?  Better cache coherency mechanisms (related to the
interconnects)?  Jedi mind tricks???

I'm guessing it's the better interconnect, but work interferes with
finding the time to research and benchmark.



Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Carey
Date:
On 7/27/09 11:05 AM, "Dave Youatt" <dave@meteorsolutions.com> wrote:

> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>
> Also, and this is getting maybe too far off topic, beyond the buzzwords,
> what IS  the new "hyperthreading" in Nehalems?  -- opportunistic
> superpipelined cpus?, superscalar?  What's shared by the cores
> (bandwidth, cache(s))?   What's changed about the new hyperthreading
> that makes it actually seem to work (or at least not causes other
> problems)?   smarter scheduling of instructions to take advantage of
> stalls, hazards another thread's instruction stream?  Fixed
> instruction-level locking/interlocks, or avoiding locking whenever
> possible?  better cache coherency mechanisms (related to the
> interconnects)?  Jedi mind tricks???
>

The Nehalems are an iteration off the "Core" processor line, which is a
4-way superscalar, out of order CPU.  Also, it has some very sophisticated
memory access reordering capability.
So, the HyperThreading here (Symmetric Multi-Threading, SMT, is the academic
name) will take advantage of that processor's inefficiencies -- a mix of
stalls due to waiting for memory, and unused execution 'width' resources.
So, if both threads are active and not stalled on memory access or other
execution bubbles, there are a lot of internal processor resources to share.
And if one of them is misbehaving and spinning, it won't dominate those
resources.

The old Pentium 4-based HyperThreading was also SMT, but those
processors were built to be high frequency and 'narrow' in terms of
superscalar execution (2-way superscalar, I believe).  So the main advantage
of HT there was that one thread could schedule work while another was
waiting on memory access.  If both were putting demands on the core
execution resources there was not much to gain unless one thread stalled on
memory access a lot, and if one of them was spinning it would eat up most of
the shared resources.

In both cases, the main execution resources get split up.  L1 cache,
instruction buffers and decoders, instruction reorder buffers, etc.  But in
this release, Intel increased several of these to beyond what is optimal for
one thread, to make the HT more efficient.

But the type of applications that will benefit the most from this HT is not
always the same as the older one, since the two CPU lines have different
weaknesses for SMT to mask or strengths to enhance.
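
The stall-filling part of this can be put into a toy model. This is a deliberately crude sketch under stated assumptions, not a description of how either CPU really schedules: it assumes each thread keeps the core busy for a fixed fraction of cycles, that threads stall independently, and that the core does useful work whenever at least one thread is ready. It ignores the sharing of execution width, caches and buffers described above. The function name is made up.

```python
def smt_speedup(busy_fraction: float, threads: int = 2) -> float:
    """Toy estimate of the throughput gain from SMT on one core.

    busy_fraction: share of cycles a single thread keeps the core busy
    (the rest are stalls, e.g. waiting on memory).
    Returns throughput with `threads` hardware threads relative to one.
    """
    if not 0.0 < busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be in (0, 1]")
    # Probability that every thread is stalled in a given cycle.
    all_stalled = (1.0 - busy_fraction) ** threads
    return (1.0 - all_stalled) / busy_fraction


for busy in (0.5, 0.85, 1.0):
    print(f"busy {busy:.0%}: x{smt_speedup(busy):.2f}")
```

In this model a thread that never stalls (busy fraction 1.0, which is what a naive spin loop looks like to the core) leaves nothing for its sibling, while a thread that is busy 85% of the time would give about the +15% mentioned earlier in the thread.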

> I'm guessing it's the better interconnect, but work interferes with
> finding the time to research and benchmark.

The new memory and interconnect architecture has a huge impact on
performance, but it is separate from the other big features (Turbo being the
other one not discussed here).  For scalability to many CPUs, however, it is
probably the most significant.

Note that these CPUs have some good power saving technology that helps
quite a bit when idle or using just one core or thread, but when all threads
are ramped up and all the memory banks are filled the systems draw a LOT of
power.

AMD still does quite well if you're on a power budget with their latest
CPUs.



Re: hyperthreaded cpu still an issue in 8.4?

From
Matthew Wakeling
Date:
On Mon, 27 Jul 2009, Dave Youatt wrote:
> Greg, those are compelling numbers for the new Nehalem processors.
> Great news for postgresql.  Do you think it's due to the new internal
> interconnect...

Unlikely. Different threads on the same CPU core share their resources, so
they don't need an explicit communication channel at all (I'm simplifying
massively here). A real interconnect is only needed between CPUs and
between different cores on a CPU, and of course to the outside world.

Scott's explanation of why SMT works better now is much more likely to be
the real reason.

Matthew

--
 Ozzy:   Life is full of disappointments.
 Millie: No it isn't - I can always fit more in.

Re: hyperthreaded cpu still an issue in 8.4?

From
Dave Youatt
Date:

Matthew Wakeling wrote:
> On Mon, 27 Jul 2009, Dave Youatt wrote:
>> Greg, those are compelling numbers for the new Nehalem processors.
>> Great news for postgresql. Do you think it's due to the new internal
>> interconnect...
>
> Unlikely. Different threads on the same CPU core share their resources, so
> they don't need an explicit communication channel at all (I'm simplifying
> massively here). A real interconnect is only needed between CPUs and
> between different cores on a CPU, and of course to the outside world.
>
> Scott's explanation of why SMT works better now is much more likely to be
> the real reason.

:-) there's also this interconnect thingie between sockets, cores and memory. Nehalem has a new one (for Intel): an integrated memory controller, that is.  And a new on-chip cache organization.

I'm still betting on the interconnect(s), particularly for bandwidth-intensive, data pumping server apps.  And it looks like the other new interconnect ("QuickPath") plays well w/the integrated memory controller for multi-socket systems.

Greg, in your spare time...  Also, curious how Nehalem compares w/AMD Phenom II, esp the newer ones w/multi-lane(?) HT (HyperTransport, that is).

And apologies to the list for straying off topic a bit.

Re: hyperthreaded cpu still an issue in 8.4?

From
Greg Smith
Date:
On Tue, 28 Jul 2009, Matthew Wakeling wrote:

> Unlikely. Different threads on the same CPU core share their resources, so
> they don't need an explicit communication channel at all (I'm simplifying
> massively here). A real interconnect is only needed between CPUs and between
> different cores on a CPU, and of course to the outside world.

The question was "why are the new CPUs benchmarking so much faster than
the old ones", and I believe that's mainly because the interconnects
both between CPUs and between CPUs and memory are dramatically faster.
The SMT improvements stack on top of that, but are in my opinion
secondary.  I base that on also seeing a dramatic improvement in memory
transfer speeds on the new platform, which alone might even be sufficient
to explain the performance boost.  I'll break the two factors apart later
to be sure though--all the regulars on this list know where I stand on
measuring performance compared with theorizing about it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: hyperthreaded cpu still an issue in 8.4?

From
Merlin Moncure
Date:
On Mon, Jul 27, 2009 at 2:05 PM, Dave Youatt<dave@meteorsolutions.com> wrote:
> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>
>>> Just wondering is the issue referenced in
>>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>>> is still present in 8.4 or if some tunable (or other) made the use of
>>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>>> for performance reasons and am trying to determine if more cpus (no
>>> HT) or less cpus (with HT) are the way to go.
>>
>> If you're talking about the hyperthreading in the latest Intel Nehalem
>> processors, I've been seeing great PostgreSQL performance from those.
>> The kind of weird behavior the old generation hyperthreading designs
>> had seems gone.  You can see at
>> http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907222158050.16713@westnet.com
>> that I've cleared 90K TPS on a 16 core system (2 quad-core
>> hyperthreaded processors) running a small test using lots of parallel
>> SELECTs.  That would not be possible if there were HT spinlock
>> problems still around. There have been both PostgreSQL scaling
>> improvements and hardware improvements since the 2005 messages you saw
>> there that have combined to clear up the issues there.  While true
>> cores would still be better if everything else were equal, it rarely
>> is, and I wouldn't hesitate to jump on Intel's bandwagon right now.
>
> Greg, those are compelling numbers for the new Nehalem processors.
> Great news for postgresql.  Do you think it's due to the new internal
> interconnect, that bears a strong resemblance to AMD's hypertransport
[snip]

as a point of reference, here are some numbers on a quad core system
(2xintel 5160) using the old pgbench, scaling factor 10:

pgbench -S -c 16 -t 10000
starting vacuum...end.
transaction type: SELECT only
scaling factor: 10
query mode: simple
number of clients: 16
number of transactions per client: 10000
number of transactions actually processed: 160000/160000
tps = 24088.807000 (including connections establishing)
tps = 24201.820189 (excluding connections establishing)

This actually shows my system (pre-Nehalem) is pretty close clock for
clock, although that's not completely fair... I'm using only 4 cores on
dual core procs.  Still, these are almost two years old now.

EDIT: I see now that Greg has only an 8 core system not counting
hyperthreading...so I'm getting absolutely spanked!  Go Intel!
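
For comparing runs like this one, the tps lines can be pulled out of the pgbench output with a few lines of script. This is a minimal sketch that assumes the output wording shown above; other pgbench versions may phrase these lines differently, and `parse_pgbench_tps` is just an illustrative name.

```python
import re

# Matches the two summary lines printed by the pgbench run quoted above.
TPS_LINE = re.compile(
    r"^tps = ([0-9.]+) \((including|excluding) connections establishing\)$",
    re.MULTILINE,
)


def parse_pgbench_tps(output: str) -> dict:
    """Return {"including": tps, "excluding": tps} from pgbench output text."""
    return {kind: float(value) for value, kind in TPS_LINE.findall(output)}


SAMPLE_OUTPUT = """\
transaction type: SELECT only
scaling factor: 10
query mode: simple
number of clients: 16
number of transactions per client: 10000
number of transactions actually processed: 160000/160000
tps = 24088.807000 (including connections establishing)
tps = 24201.820189 (excluding connections establishing)
"""

tps = parse_pgbench_tps(SAMPLE_OUTPUT)
print(tps["excluding"])
```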

Also, I'm absolutely dying to see some numbers on the high end
W5580...if anybody has one, please post!

merlin

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Marlowe
Date:
On Tue, Jul 28, 2009 at 2:58 PM, Merlin Moncure<mmoncure@gmail.com> wrote:
> On Mon, Jul 27, 2009 at 2:05 PM, Dave Youatt<dave@meteorsolutions.com> wrote:
>> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>>
>>>> Just wondering is the issue referenced in
>>>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>>>> is still present in 8.4 or if some tunable (or other) made the use of
>>>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>>>> for performance reasons and am trying to determine if more cpus (no
>>>> HT) or less cpus (with HT) are the way to go.
>>>
>>> If you're talking about the hyperthreading in the latest Intel Nehalem
>>> processors, I've been seeing great PostgreSQL performance from those.
>>> The kind of weird behavior the old generation hyperthreading designs
>>> had seems gone.  You can see at
>>> http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907222158050.16713@westnet.com
>>> that I've cleared 90K TPS on a 16 core system (2 quad-core
>>> hyperthreaded processors) running a small test using lots of parallel
>>> SELECTs.  That would not be possible if there were HT spinlock
>>> problems still around. There have been both PostgreSQL scaling
>>> improvements and hardware improvements since the 2005 messages you saw
>>> there that have combined to clear up the issues there.  While true
>>> cores would still be better if everything else were equal, it rarely
>>> is, and I wouldn't hesitate to jump on Intel's bandwagon right now.
>>
>> Greg, those are compelling numbers for the new Nehalem processors.
>> Great news for postgresql.  Do you think it's due to the new internal
>> interconnect, that bears a strong resemblance to AMD's hypertransport
> [snip]
>
> as a point of reference, here are some numbers on a quad core system
> (2xintel 5160) using the old pgbench, scaling factor 10:
>
> pgbench -S -c 16 -t 10000
> starting vacuum...end.
> transaction type: SELECT only
> scaling factor: 10
> query mode: simple
> number of clients: 16
> number of transactions per client: 10000
> number of transactions actually processed: 160000/160000
> tps = 24088.807000 (including connections establishing)
> tps = 24201.820189 (excluding connections establishing)
>
> This shows actually my system (pre-Nehalem) is pretty close clock for
> clock, albeit thats not completely fair..I'm using only 4 cores on
> dual core procs.  Still, these are almost two years old now.
>
> EDIT: I see now that Greg has only 8 core system not counting
> hyperthreading...so I'm getting absolutely spanked!  Go Intel!
>
> Also, I'm absolutely dying to see some numbers on the high end
> W5580...if anybody has one, please post!

Just FYI, I ran the same basic test but with -c 10 since -c shouldn't
really be greater than -s, and got this:

pgbench -S -c 10 -t 10000
starting vacuum...end.
transaction type: SELECT only
scaling factor: 10
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 32855.677494 (including connections establishing)
tps = 33344.826183 (excluding connections establishing)

With -s at 16 and -c at 16 I got this:

pgbench -S -c 16 -t 10000
starting vacuum...end.
transaction type: SELECT only
scaling factor: 16
number of clients: 16
number of transactions per client: 10000
number of transactions actually processed: 160000/160000
tps = 32822.559602 (including connections establishing)
tps = 33266.308652 (excluding connections establishing)

That's on dual Quad-Core AMD Opteron(tm) Processor 2352 CPUs (2.2GHz)
and 16 G ram.
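
To put the numbers posted so far on a common footing, here is a rough per-core normalization, plus a check of the "-c shouldn't really be greater than -s" rule of thumb. The `Run` class is a made-up helper; the figures are the "excluding connections establishing" values from the runs above, and dividing by core count is only a crude way to compare very different machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str
    physical_cores: int
    clients: int   # pgbench -c
    scale: int     # scaling factor reported by pgbench
    tps: float     # "excluding connections establishing"

    @property
    def tps_per_core(self) -> float:
        return self.tps / self.physical_cores

    @property
    def clients_exceed_scale(self) -> bool:
        # Rule of thumb from this thread: -c should not exceed -s.
        return self.clients > self.scale


runs = [
    Run("2x Intel 5160 (4 cores)", 4, clients=16, scale=10, tps=24201.820189),
    Run("2x Opteron 2352 (8 cores), -c 10", 8, clients=10, scale=10, tps=33344.826183),
    Run("2x Opteron 2352 (8 cores), -c 16", 8, clients=16, scale=16, tps=33266.308652),
]

for run in runs:
    note = "  (clients > scale)" if run.clients_exceed_scale else ""
    print(f"{run.label}: {run.tps_per_core:.0f} tps/core{note}")
```

On the same basis, Greg's roughly 90K TPS on 8 physical cores (16 hardware threads) would come to about 11K tps per core.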

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Carey
Date:
On 7/28/09 1:28 PM, "Greg Smith" <gsmith@gregsmith.com> wrote:

> On Tue, 28 Jul 2009, Matthew Wakeling wrote:
>
>> Unlikely. Different threads on the same CPU core share their resources, so
>> they don't need an explicit communication channel at all (I'm simplifying
>> massively here). A real interconnect is only needed between CPUs and between
>> different cores on a CPU, and of course to the outside world.
>
> The question was "why are the new CPUs benchmarking so much faster than
> the old ones", and I believe that's mainly because the interconnection
> both between CPUs and between CPUs and memory are dramatically faster.

I believe he was answering the question "What makes SMT work well with
Postgres for these CPUs when it had problems on old Xeons?" -- and that
doesn't have a lot to do with the interconnect or bandwidth.  It may also be
a more advanced compiler / OS toolchain.  Postgres 8.0 compiled on an older
system and OS might not work so well with the new HT.

As for the question as to what is so good about the Nehalems -- the on-die
memory controller and point-to-point interprocessor interconnect are the
biggest performance change.  Turbo and SMT are pretty good icing on the cake
though.





Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Carey
Date:
On 7/28/09 1:58 PM, "Merlin Moncure" <mmoncure@gmail.com> wrote:

> On Mon, Jul 27, 2009 at 2:05 PM, Dave Youatt<dave@meteorsolutions.com> wrote:
>> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>>
>>>> Just wondering is the issue referenced in
>>>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>>>> is still present in 8.4 or if some tunable (or other) made the use of
>>>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>>>> for performance reasons and am trying to determine if more cpus (no
>>>> HT) or less cpus (with HT) are the way to go.
>>>
>>> If you're talking about the hyperthreading in the latest Intel Nehalem
>>> processors, I've been seeing great PostgreSQL performance from those.
>>> The kind of weird behavior the old generation hyperthreading designs
>>> had seems gone.  You can see at
>>> http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907222158050.16713@westnet.com
>>> that I've cleared 90K TPS on a 16 core system (2 quad-core
>>> hyperthreaded processors) running a small test using lots of parallel
>>> SELECTs.  That would not be possible if there were HT spinlock
>>> problems still around. There have been both PostgreSQL scaling
>>> improvements and hardware improvements since the 2005 messages you saw
>>> there that have combined to clear up the issues there.  While true
>>> cores would still be better if everything else were equal, it rarely
>>> is, and I wouldn't hesitate to jump on Intel's bandwagon right now.
>>
>> Greg, those are compelling numbers for the new Nehalem processors.
>> Great news for postgresql.  Do you think it's due to the new internal
>> interconnect, that bears a strong resemblance to AMD's hypertransport
> [snip]
>
> as a point of reference, here are some numbers on a quad core system
> (2xintel 5160) using the old pgbench, scaling factor 10:
>
> pgbench -S -c 16 -t 10000
> starting vacuum...end.
> transaction type: SELECT only
> scaling factor: 10
> query mode: simple
> number of clients: 16
> number of transactions per client: 10000
> number of transactions actually processed: 160000/160000
> tps = 24088.807000 (including connections establishing)
> tps = 24201.820189 (excluding connections establishing)
>
> This actually shows my system (pre-Nehalem) is pretty close clock for
> clock, although that's not completely fair... I'm using only 4 cores on
> dual core procs.  Still, these are almost two years old now.
>
> EDIT: I see now that Greg has only 8 core system not counting
> hyperthreading...so I'm getting absolutely spanked!  Go Intel!
>
> Also, I'm absolutely dying to see some numbers on the high end
> W5580...if anybody has one, please post!
>
> merlin

Note that a 5160 is a bit behind.  The 52xx and 54xx series were a decent
perf boost on their own, with more cache, and usually more total system
bandwidth too (50% more than 51xx and 53xx is typical).

But the leap to 55xx is far bigger!
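
To put the pgbench figures quoted in this thread on a common footing, here is a minimal Python sketch that pulls the tps values out of pgbench output and divides by the number of physical cores. The helper names (parse_tps, tps_per_core) are illustrative and not part of pgbench. The core counts are the ones stated by the posters, and the 90000 entry is only the rounded "90K" from Greg's message, not an exact measurement.

```python
import re

# Matches pgbench summary lines such as:
#   tps = 24201.820189 (excluding connections establishing)
TPS_LINE = re.compile(
    r"tps = ([0-9.]+) \((including|excluding) connections establishing\)"
)


def parse_tps(pgbench_output):
    """Return a dict mapping 'including'/'excluding' to the reported tps."""
    results = {}
    for line in pgbench_output.splitlines():
        match = TPS_LINE.search(line)
        if match:
            results[match.group(2)] = float(match.group(1))
    return results


def tps_per_core(tps, physical_cores):
    """Throughput divided by physical cores (SMT threads are not counted)."""
    return tps / physical_cores


# Output excerpt quoted from Merlin's message (2x Intel 5160, 4 cores total).
SAMPLE = """\
number of transactions actually processed: 160000/160000
tps = 24088.807000 (including connections establishing)
tps = 24201.820189 (excluding connections establishing)
"""

# (label, tps, physical cores) as reported in the thread.
REPORTED = [
    ("2x Intel 5160 (Merlin)", parse_tps(SAMPLE)["excluding"], 4),
    ("2x Opteron 2352 (Scott Marlowe)", 33266.308652, 8),
    ("2x Nehalem X5550 (Greg, rounded 90K)", 90000.0, 8),
]

for label, tps, cores in REPORTED:
    print(f"{label}: {tps_per_core(tps, cores):.0f} tps per core")
```

This ignores clock speed, memory configuration and pgbench version, so it is only a rough way to line the posted numbers up against each other.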




Re: hyperthreaded cpu still an issue in 8.4?

From
Greg Smith
Date:
On Tue, 28 Jul 2009, Scott Marlowe wrote:

> Just FYI, I ran the same basic test but with -c 10 since -c shouldn't
> really be greater than -s

That's only true if you're running the TPC-B-like or other write tests,
where access to the small branches table becomes a serious hotspot for
contention.  The select-only test has no such specific restriction as it
only operates on the big accounts table.  Often peak throughput is
closer to a very small multiple of the number of cores though, and
possibly even clients=cores, presumably because it's more efficient to
approximately peg one backend per core rather than switch among more than
one on each--reduced L1 cache contention etc.  That's the behavior you
measured when your test showed better results with c=10 than c=16 on an
8-core system, rather than suffering less from the "c must be < s"
contention limitation.
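
A rough way to see why the "-c should not exceed -s" rule matters for the write tests: the branches table has one row per unit of scale factor, and each TPC-B-like transaction updates one branch row. The sketch below assumes every client picks its branch uniformly at random and independently, and it only estimates how often a client's row is also wanted by another client at the same instant. It is a simplified model with a hypothetical function name, not a description of pgbench's locking behavior.

```python
def collision_probability(clients, scale):
    """Chance that a given client's branch row is also picked by at least
    one of the other clients, assuming uniform independent choices over
    `scale` branch rows."""
    if clients < 1 or scale < 1:
        raise ValueError("clients and scale must both be at least 1")
    return 1.0 - (1.0 - 1.0 / scale) ** (clients - 1)


# Compare the two configurations discussed in the thread.
for clients, scale in [(16, 10), (10, 10), (16, 16)]:
    p = collision_probability(clients, scale)
    print(f"-c {clients} -s {scale}: collision probability {p:.2f}")
```

The select-only test never updates the branches table, so this estimate says nothing about -S runs.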

Sadly I don't have or expect to have a W5580 in the near future though,
the X5550 @ 2.67GHz is the bang for the buck sweet spot right now and
accordingly that's what I have in the lab at Truviso.  As Merlin points
out, that's still plenty to spank any select-only pgbench results I've
ever seen.  The multi-threaded pgbench patch submitted by Itagaki Takahiro
recently is here just in time to really exercise these new processors
properly.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: hyperthreaded cpu still an issue in 8.4?

From
Scott Marlowe
Date:
On Tue, Jul 28, 2009 at 5:21 PM, Greg Smith<gsmith@gregsmith.com> wrote:
> On Tue, 28 Jul 2009, Scott Marlowe wrote:
>
>> Just FYI, I ran the same basic test but with -c 10 since -c shouldn't
>> really be greater than -s
>
> That's only true if you're running the TPC-B-like or other write tests,
> where access to the small branches table becomes a serious hotspot for
> contention.  The select-only test has no such specific restriction as it

I thought so too, but my pgbench -S -c 16 was WAY faster on a -s 16 db
than on a -s 10...

Re: hyperthreaded cpu still an issue in 8.4?

From
Jon Nelson
Date:
On Tue, Jul 28, 2009 at 4:11 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> On Tue, Jul 28, 2009 at 2:58 PM, Merlin Moncure<mmoncure@gmail.com> wrote:
>> On Mon, Jul 27, 2009 at 2:05 PM, Dave Youatt<dave@meteorsolutions.com> wrote:
>>> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>>>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>>>
>>>>> Just wondering is the issue referenced in
>>>>> http://archives.postgresql.org/pgsql-performance/2005-11/msg00415.php
>>>>> is still present in 8.4 or if some tunable (or other) made the use of
>>>>> hyperthreading a non-issue. We're looking to upgrade our servers soon
>>>>> for performance reasons and am trying to determine if more cpus (no
>>>>> HT) or less cpus (with HT) are the way to go.
>>>>
>>>> If you're talking about the hyperthreading in the latest Intel Nehalem
>>>> processors, I've been seeing great PostgreSQL performance from those.
>>>> The kind of weird behavior the old generation hyperthreading designs
>>>> had seems gone.  You can see at
>>>> http://archives.postgresql.org/message-id/alpine.GSO.2.01.0907222158050.16713@westnet.com
>>>> that I've cleared 90K TPS on a 16 core system (2 quad-core
>>>> hyperthreaded processors) running a small test using lots of parallel
>>>> SELECTs.  That would not be possible if there were HT spinlock
>>>> problems still around. There have been both PostgreSQL scaling
>>>> improvements and hardware improvements since the 2005 messages you saw
>>>> there that have combined to clear up the issues there.  While true
>>>> cores would still be better if everything else were equal, it rarely
>>>> is, and I wouldn't hesitate to jump on Intel's bandwagon right now.
>>>
>>> Greg, those are compelling numbers for the new Nehalem processors.
>>> Great news for postgresql.  Do you think it's due to the new internal
>>> interconnect, that bears a strong resemblance to AMD's hypertransport

I'd love to see some comparisons on the exact same hardware, same
kernel and everything but with HT enabled and disabled. Don't forget
that newer (Linux) kernels have vastly improved SMP performance.

--
Jon

Re: hyperthreaded cpu still an issue in 8.4?

From
Stefan Kaltenbrunner
Date:
Greg Smith wrote:
> On Tue, 28 Jul 2009, Scott Marlowe wrote:
>
>> Just FYI, I ran the same basic test but with -c 10 since -c shouldn't
>> really be greater than -s
>
> That's only true if you're running the TPC-B-like or other write tests,
> where access to the small branches table becomes a serious hotspot for
> contention.  The select-only test has no such specific restriction as it
> only operates on the big accounts table.  Often peak throughput is
> closer to a very small multiple of the number of cores though, and
> possibly even clients=cores, presumably because it's more efficient to
> approximately peg one backend per core rather than switch among more
> than one on each--reduced L1 cache contention etc.  That's the behavior
> you measured when your test showed better results with c=10 than c=16 on
> a 8 core system, rather than suffering less from the "c must be < s"
> contention limitation.

Well the real problem is that pgbench itself does not scale too well to
lots of concurrent connections and/or to high transaction rates so it
seriously skews the result. If you look at
http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html
it is pretty clear that 90k (or the 83k I got due to the slower E5530)
tps is actually a pgbench limit and that the backend really can go almost
twice as fast (I only demonstrated ~140k tps using sysbench there, but I
later managed to do ~160k tps in the lab with queries that are closer to
what pgbench does).


Stefan

Re: hyperthreaded cpu still an issue in 8.4?

From
Greg Smith
Date:
On Wed, 29 Jul 2009, Stefan Kaltenbrunner wrote:

> Well the real problem is that pgbench itself does not scale too well to lots
> of concurrent connections and/or to high transaction rates so it seriously
> skews the result.

Sure, but that's what the multi-threaded pgbench code aims to fix, which
didn't show up until after you ran your tests.  I got the 90K select TPS
with a completely unoptimized postgresql.conf, so that's by no means the
best it's possible to get out of the new pgbench code on this hardware.
I've seen as much as a 40% improvement over the standard pgbench code in
my limited testing so far, and the patch author has seen a 450% one.  You
might be able to see at least the same results you got from sysbench out
of it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: hyperthreaded cpu still an issue in 8.4?

From
Stefan Kaltenbrunner
Date:
Greg Smith wrote:
> On Wed, 29 Jul 2009, Stefan Kaltenbrunner wrote:
>
>> Well the real problem is that pgbench itself does not scale too well
>> to lots of concurrent connections and/or to high transaction rates so
>> it seriously skews the result.
>
> Sure, but that's what the multi-threaded pgbench code aims to fix, which
> didn't show up until after you ran your tests.  I got the 90K select TPS
> with a completely unoptimized postgresql.conf, so that's by no means the
> best it's possible to get out of the new pgbench code on this hardware.
> I've seen as much as a 40% improvement over the standard pgbench code in
> my limited testing so far, and the patch author has seen a 450% one.
> You might be able to see at least the same results you got from sysbench
> out of it.

oh - the 90k tps are with the new multithreaded pgbench? missed that
fact. As you can see from my results I managed to get 83k with the 8.4
pgbench on a slightly slower Nehalem which does not sound too impressive
for the new code...


Stefan

Re: hyperthreaded cpu still an issue in 8.4?

From
Greg Smith
Date:
On Wed, 29 Jul 2009, Stefan Kaltenbrunner wrote:

> oh - the 90k tps are with the new multithreaded pgbench? missed that fact. As
> you can see from my results I managed to get 83k with the 8.4 pgbench on a
> slightly slower Nehalem which does not sound too impressive for the new
> code...

I got 96K with the default postgresql.conf - 32MB shared_buffers etc. -
and I didn't even try to find the sweet spot yet for things like number of
threads, that's just the first useful number that popped out.  I saw as
much as 87K with the regular one too.  I already planned to run the test
set you did for comparison's sake at some point.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: hyperthreaded cpu still an issue in 8.4?

From
Matthew Wakeling
Date:
On Tue, 28 Jul 2009, Scott Carey wrote:
> On 7/28/09 1:28 PM, "Greg Smith" <gsmith@gregsmith.com> wrote:
>> On Tue, 28 Jul 2009, Matthew Wakeling wrote:
>>
>>> Unlikely. Different threads on the same CPU core share their resources, so
>>> they don't need an explicit communication channel at all (I'm simplifying
>>> massively here). A real interconnect is only needed between CPUs and between
>>> different cores on a CPU, and of course to the outside world.
>>
>> The question was "why are the new CPUs benchmarking so much faster than
>> the old ones"...
>
> I believe he was answering the question "What makes SMT work well with
> Postgres for these CPUs when it had problems on old Xeons?"

Exactly. Interconnects and bandwidth are going to make the CPU faster in
general, but won't have any (much?) effect on the relative speed with and
without SMT.

If the new CPUs are four-way dispatch and the old ones were two-way
dispatch, that easily explains why SMT is a bonus on the new CPUs. With a
two-way dispatch, a single thread is likely to be able to keep both
pipelines busy most of the time. Switching on SMT will try to keep the
pipelines busy a bit more, giving a small improvement, however that
improvement is cancelled out by the cache being half the size for each
thread. One of our applications ran 30% slower with SMT enabled on an old
Xeon.

On the new CPUs, it would be very hard for a single thread to keep four
execution pipelines busy, so switching on SMT increases the throughput in
a big way. Also, the bigger caches mean that splitting the cache in half
doesn't have nearly as much impact. That's why SMT is a good thing on the
new CPUs.

However, SMT is always likely to slow down any process that is
single-threaded, if that is the only thread doing significant work on the
machine. It only really shows its benefit when you have more CPU-intensive
processes than real CPU cores.

Matthew

--
 In the beginning was the word, and the word was unsigned,
 and the main() {} was without form and void...

Re: hyperthreaded cpu still an issue in 8.4?

From
Matthew Wakeling
Date:
On Tue, 28 Jul 2009, Dave Youatt wrote:
> Unlikely. Different threads on the same CPU core share their resources, so they don't
> need an explicit communication channel at all (I'm simplifying massively here). A real
> interconnect is only needed between CPUs and between different cores on a CPU, and of
> course to the outside world. Scott's explanation of why SMT works better now is much more
> likely to be the real reason.

Actually, no, I wrote that. Please give at least some indication when
replying to an email which parts of it are your words and which are quotes
from someone else. Emails can be incredibly confusing without that
distinction.

You actually wrote:

> :-) there's also this interconnect thingie between sockets, cores and memory. Nehalem has
> a new one (for Intel), integrated memory controller, that is.  And a new on-chip cache
> organization.

This (as I mention elsewhere) will make the CPU faster overall, but is
unlikely to increase the performance gain of switching SMT on. In fact,
having a lower latency memory controller is more likely to reduce some of
the problem that SMT is trying to address - that of a single thread
stalling on memory access.

Having said that, memory access latency is not scaling as quickly as CPU
speed, so over time SMT is going to get more important.

Matthew

--
"Take care that thou useth the proper method when thou taketh the measure of
 high-voltage circuits so that thou doth not incinerate both thee and the
 meter; for verily, though thou has no account number and can be easily
 replaced, the meter doth have one, and as a consequence, bringeth much woe
 upon the Supply Department."   -- The Ten Commandments of Electronics

Re: hyperthreaded cpu still an issue in 8.4?

From
Merlin Moncure
Date:
On Tue, Jul 28, 2009 at 7:21 PM, Greg Smith<gsmith@gregsmith.com> wrote:
> On Tue, 28 Jul 2009, Scott Marlowe wrote:
>
>> Just FYI, I ran the same basic test but with -c 10 since -c shouldn't
>> really be greater than -s
>
> That's only true if you're running the TPC-B-like or other write tests,
> where access to the small branches table becomes a serious hotspot for
> contention.  The select-only test has no such specific restriction as it
> only operates on the big accounts table.  Often peak throughput is closer
> to a very small multiple of the number of cores though, and possibly even
> clients=cores, presumably because it's more efficient to approximately peg
> one backend per core rather than switch among more than one on each--reduced
> L1 cache contention etc.  That's the behavior you measured when your test
> showed better results with c=10 than c=16 on a 8 core system, rather than
> suffering less from the "c must be < s" contention limitation.
>
> Sadly I don't have or expect to have a W5580 in the near future though, the
> X5550 @ 2.67GHz is the bang for the buck sweet spot right now and
> accordingly that's what I have in the lab at Truviso.  As Merlin points out,
> that's still plenty to spank any select-only pgbench results I've ever seen.
>  The multi-threaded pgbench patch submitted by Itagaki Takahiro recently is
> here just in time to really exercise these new processors properly.

Can I trouble you for a single client run, say:

pgbench -S -c 1 -t 250000

I'd like to see how much of your improvement comes from SMT and how
much comes from general improvements to the cpu...

merlin
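
Once a single-client number is available, the comparison asked for here is simple arithmetic. A minimal sketch follows, with a hypothetical function name and a placeholder single-client figure (no single-client result appears in this thread): divide the multi-client tps by the single-client tps, then divide again by the number of physical cores.

```python
def scaling_summary(single_client_tps, multi_client_tps, physical_cores):
    """Return (speedup, per_core_efficiency).

    speedup:             multi-client tps relative to one client
    per_core_efficiency: speedup divided by physical cores; a value above
                         1.0 would suggest SMT is adding throughput beyond
                         what one busy thread per core delivers, assuming
                         the single-client run keeps one core fully busy.
    """
    speedup = multi_client_tps / single_client_tps
    return speedup, speedup / physical_cores


# 10000.0 is a made-up placeholder for the -c 1 result; 90000.0 is the
# rounded "90K" figure reported for the 8-core Nehalem system.
speedup, efficiency = scaling_summary(10000.0, 90000.0, physical_cores=8)
print(f"speedup over one client: {speedup:.2f}x")
print(f"per-core efficiency:     {efficiency:.2f}")
```

Dividing the single-client tps of two different machines by each other gives the other half of the answer, namely how much of the gain is general CPU improvement independent of SMT.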