Re: Large (8M) cache vs. dual-core CPUs - Mailing list pgsql-performance

From Sven Geisler
Subject Re: Large (8M) cache vs. dual-core CPUs
Date
Msg-id 4451D310.7010701@aeccom.com
In response to Re: Large (8M) cache vs. dual-core CPUs  (Vivek Khera <vivek@khera.org>)
List pgsql-performance
Hi all,

Vivek Khera wrote:
 > On Apr 25, 2006, at 2:14 PM, Bill Moran wrote:
 >> Where I'm stuck is in deciding whether we want to go with dual-core
 >> pentiums with 2M cache, or with HT pentiums with 8M cache.
 >
 > In order of preference:
 >
 > Opterons (dual core or single core)
 > Xeon with HT *disabled* at the BIOS level (dual or single core)
 >
 >
 > Notice Xeon with HT is not on my list :-)
 >

I support Vivek's order of preference. I have been going through a
nightmare of performance issues with different x86 hardware.
At the end of the day I can say the Opterons are faster because of their
memory bandwidth. I also had to disable HT on all our customers' servers
that were still using Xeons with HT.

There is a paper from HP which describes the advantage of the memory
architecture of the Opterons. To me, this is the best explanation of why
an Opteron 875 is faster than a 3 GHz Xeon MP, which I compared last year.

I remember a thread about HT on the PostgreSQL development list in 2004,
where you can find the reason why you should disable HT.
That thread refers to the Intel Developer Manual Volume 4 (Architecture
Optimisation), which gives some advice regarding spin-wait loops.
This relates to the code in src/include/storage/s_lock.h.

Cheers,
Sven

======
From the Intel Developer Manual Volume 4:

Synchronization for Short Periods

The frequency and duration that a thread needs to synchronize with
other threads depends on application characteristics. When a
synchronization loop needs very fast response, applications may use a
spin-wait loop.

A spin-wait loop is typically used when one thread needs to wait a short
amount of time for another thread to reach a point of synchronization. A
spin-wait loop consists of a loop that compares a synchronization
variable with some pre-defined value [see Example 7-1(a)].

On a modern microprocessor with a superscalar speculative execution
engine, a loop like this results in the issue of multiple simultaneous read
requests from the spinning thread. These requests usually execute
out-of-order with each read request being allocated a buffer resource.
On detection of a write by a worker thread to a load that is in progress,
the processor must guarantee no violations of memory order occur. The
necessity of maintaining the order of outstanding memory operations
inevitably costs the processor a severe penalty that impacts all threads.

This penalty occurs on the Pentium Pro processor, the Pentium II
processor and the Pentium III processor. However, the penalty on these
processors is small compared with penalties suffered on the Pentium 4
and Intel Xeon processors. There the performance penalty for exiting
the loop is about 25 times more severe.

On a processor supporting Hyper-Threading Technology, spin-wait
loops can consume a significant portion of the execution bandwidth of
the processor. One logical processor executing a spin-wait loop can
severely impact the performance of the other logical processor.

====
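
The excerpt refers to Example 7-1(a), which is not reproduced here. As a
rough illustration only, a spin-wait loop of the kind described might look
like the C sketch below. The names spin_wait_naive, spin_wait_pause and
cpu_relax are made up for this example and are not taken from s_lock.h.
The second variant assumes the usual mitigation of issuing the x86 PAUSE
instruction inside the loop, which tells the processor it is spinning so
the loop does not flood the memory pipeline or starve the sibling logical
processor. Whether this helps a given workload has to be measured.

```c
#include <stdatomic.h>

/* Hypothetical helper: emit the x86 PAUSE hint where available.
 * On other compilers or architectures it compiles to nothing. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
static inline void cpu_relax(void)
{
    __asm__ __volatile__("pause" ::: "memory");
}
#else
static inline void cpu_relax(void)
{
}
#endif

/* Naive spin-wait: re-reads the synchronization variable as fast as the
 * processor can issue loads. This is the pattern the excerpt warns about.
 * Returns the number of times the loop body ran. */
static unsigned long spin_wait_naive(atomic_int *sync_var, int expected)
{
    unsigned long spins = 0;

    while (atomic_load_explicit(sync_var, memory_order_acquire) != expected)
        spins++;
    return spins;
}

/* Same loop, but with a PAUSE hint on every iteration. */
static unsigned long spin_wait_pause(atomic_int *sync_var, int expected)
{
    unsigned long spins = 0;

    while (atomic_load_explicit(sync_var, memory_order_acquire) != expected)
    {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

In real use another thread would store the expected value into the
variable. If the variable already holds that value, both functions return
immediately with a spin count of zero.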
