Thread: Performance problems on 4-way AMD Opteron 875 (dual core)

Performance problems on 4-way AMD Opteron 875 (dual core)

From
Dirk Lutzebäck
Date:
[[I'm posting this on behalf of my co-worker who cannot post to this list at the moment]]

Hi,

I have installed PostgreSQL on a 4-way AMD Opteron 875 (dual core), and the performance is not at the expected level.

Details:
The "old" server is a 4-way XEON MP 3.0 GHz with 4 MB L3 cache, 32 GB RAM (PC1600) and local FC-RAID 10. Hyper-Threading is off. (HP DL580)
The "old" server is using Red Hat Enterprise Linux 3 Update 5.
The "new" server is a 4-way Opteron 875 with 1 MB L2 cache, 32 GB RAM (PC3200) and the same local FC-RAID 10. (HP DL585)
The "new" server is using Red Hat Enterprise Linux 4 (with the latest x86_64 kernel from Red Hat - 2.6.9-11.ELsmp #1 SMP Fri May 20 18:25:30 EDT 2005 x86_64)
I use PostgreSQL version 8.0.3.

The issue is that the Opteron is slower than the XEON MP under high load. I have created a test with parallel queries that are typical for my application. The queries range from small queries (0.1 seconds) to larger queries using joins (15 seconds).
The test starts parallel clients. Each client runs the queries in a random order. The test ensures that a given client always uses the same random order, to get valid results.
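
The test procedure described above could be sketched roughly as follows. This is only an illustration under assumptions, not the harness actually used: the query labels, the `execute` callback and the use of threads are hypothetical, and the fixed time window of the real test is left out. The point it shows is that seeding a private random generator with the client number gives every client its own order that stays identical from run to run.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical query labels; the real workload mixes ~0.1 s queries and ~15 s joins.
QUERIES = ["small_lookup_1", "small_lookup_2", "join_report_1", "join_report_2"]

def client_order(client_id, queries=QUERIES):
    """Return this client's query order: shuffled, but identical on every run."""
    rng = random.Random(client_id)  # per-client seed makes the order repeatable
    order = list(queries)
    rng.shuffle(order)
    return order

def run_client(client_id, execute):
    """Run one client's queries in its fixed order; return how many finished."""
    finished = 0
    for query in client_order(client_id):
        execute(query)  # placeholder for the real database call
        finished += 1
    return finished

def run_test(n_clients, execute):
    """Start n_clients parallel clients and return the total queries finished."""
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return sum(pool.map(lambda cid: run_client(cid, execute), range(n_clients)))
```

Calling `run_test(4, execute)` and `run_test(8, execute)` with the same `execute` function would then correspond to the 4-client and 8-client runs.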

Here is the number of queries that the server finished in a fixed period of time.
I used a PostgreSQL 8.1 snapshot from last week, compiled as a 64-bit binary, for DL585-64bit.
I used PostgreSQL 8.0.3, compiled as a 32-bit binary, for DL585-32bit and DL580.
During the tests everything that is needed is in the file cache. I saw no read activity.
Context switch spikes are over 50000 during the test on both servers. My feeling is that the XEON has slightly more context switches.



PostgreSQL params:
max_locks_per_transaction = 256
shared_buffers = 40000
effective_cache_size = 3840000
work_mem = 300000
maintenance_work_mem = 512000
wal_buffers = 32
checkpoint_segments = 24
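
For readers unsure of the units: in the 8.0 series, shared_buffers, effective_cache_size and wal_buffers are counted in disk pages, while work_mem and maintenance_work_mem are in kilobytes. The small sketch below converts the values above to megabytes, assuming the default 8 kB page size (a build-time option that the post does not mention changing).

```python
PAGE_KB = 8  # default PostgreSQL block size in kB; assumed, not stated in the post

# setting -> (value from the post, size of one unit in kB)
SETTINGS = {
    "shared_buffers": (40000, PAGE_KB),
    "effective_cache_size": (3840000, PAGE_KB),
    "work_mem": (300000, 1),
    "maintenance_work_mem": (512000, 1),
    "wal_buffers": (32, PAGE_KB),
}

def to_mb(value, unit_kb):
    """Convert a setting given in units of unit_kb kilobytes to megabytes."""
    return value * unit_kb / 1024

for name, (value, unit_kb) in SETTINGS.items():
    print(f"{name:22s} {to_mb(value, unit_kb):10.2f} MB")
```

Under that assumption shared_buffers comes to about 312 MB and work_mem to about 293 MB. Since work_mem applies to each sort or hash operation in each backend, several parallel clients running large joins could each claim that much.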


I was expecting twice as many queries on the DL585. The DL585 with PostgreSQL 8.0.3 32bit melts down earlier than the XEON in production use. Please compare 4 clients and 8 clients. With 4 clients the Opteron is in front, and with 8 clients the XEON does not melt down as much as the Opteron.

I have no idea what causes this. Benchmarks like SAP's SD 2-tier show that the DL585 can handle nearly three times as much load as the DL580 with XEON 3.0. We chose the 4-way Opteron 875 based on such benchmarks to replace the 4-way XEON MP.

Does anyone have comments or ideas on where I should focus my work?

I guess the shared buffers cause the meltdown when too many clients are accessing the same data.
I don't understand why the 4-way XEON MP 3.0 can deal with this better than the 4-way Opteron 875.
The system load on the Opteron is never over 3.0. The XEON MP has a load of up to 4.0.

Should I try other settings for PostgreSQL in postgresql.conf?
Should I try other settings for the compilation?

I will compile the latest PostgreSQL 8.1 snapshot for 32bit to evaluate the new shared buffer code from Tom.
I think, the 64bit is slow because my queries are CPU intensive.

Can someone provide a commercial support contact for this issue?

Sven.

Re: Performance problems on 4-way AMD Opteron 875 (dual core)

From
Michael Stone
Date:
On Fri, Aug 05, 2005 at 01:11:31PM +0200, Dirk Lutzebäck wrote:
>I will compile the latest PostgreSQL 8.1 snapshot for 32bit to evaluate
>the new shared buffer code from Tom.
>I think, the 64bit is slow because my queries are CPU intensive.

Have you actually tried it or are you guessing? If you're guessing, then
compile it as a 64 bit binary and benchmark that.

Mike Stone

Re: Performance problems on 4-way AMD Opteron 875 (dual

From
Dirk Lutzebäck
Date:
Michael Stone wrote:

> On Fri, Aug 05, 2005 at 01:11:31PM +0200, Dirk Lutzebäck wrote:
>
>> I will compile the latest PostgreSQL 8.1 snapshot for 32bit to
>> evaluate the new shared buffer code from Tom.
>> I think, the 64bit is slow because my queries are CPU intensive.
>
>
> Have you actually tried it or are you guessing? If you're guessing, then
> compile it as a 64 bit binary and benchmark that.
>
> Mike Stone

We tried it. 64bit 8.1dev was slower than 32bit 8.0.3.

Dirk

Re: Performance problems on 4-way AMD Opteron 875 (dual core)

From
Tom Lane
Date:
Dirk Lutzebäck <lutzeb@aeccom.com> writes:
> Here are the number of queries which the server has finished in a fix
> period of time.

Uh, you never actually supplied any numbers (or much of any other
specifics about what was tested, either).

My first reaction is "don't vary more than one experimental parameter at
a time".  There is no way to tell whether the discrepancy is due to the
different hardware, different Postgres version, or 32-bit vs 64-bit
build.

            regards, tom lane
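
The one-parameter-at-a-time rule above can be checked mechanically. The sketch below is illustrative only: it describes the three setups named in the thread by the three factors listed in the reply (hardware, Postgres version, 32-bit vs 64-bit build) and reports which pairs of runs differ in exactly one of them. The dictionary layout and function names are made up for this example. The operating system release, which the original post says also differs between the two machines, is deliberately left out here; adding it as a fourth factor would leave no clean pair at all.

```python
from itertools import combinations

# The three setups named in the thread, described by the three factors listed above.
RUNS = {
    "DL580":       {"hardware": "Xeon MP",     "postgres": "8.0.3",        "build": "32-bit"},
    "DL585-32bit": {"hardware": "Opteron 875", "postgres": "8.0.3",        "build": "32-bit"},
    "DL585-64bit": {"hardware": "Opteron 875", "postgres": "8.1 snapshot", "build": "64-bit"},
}

def differing_factors(a, b):
    """Return the sorted names of the factors on which two runs disagree."""
    return sorted(k for k in a if a[k] != b[k])

def controlled_pairs(runs):
    """Return {(name_a, name_b): factor} for pairs differing in exactly one factor."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(runs.items(), 2):
        diff = differing_factors(a, b)
        if len(diff) == 1:
            result[(name_a, name_b)] = diff[0]
    return result

print(controlled_pairs(RUNS))
```

With these three factors, only DL580 versus DL585-32bit isolates a single variable (the hardware). The 32-bit and 64-bit runs on the DL585 differ in both Postgres version and build width, so a slowdown there cannot be attributed to either one.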