Re: Poor performance on HP Package Cluster - Mailing list pgsql-performance

From Ron
Subject Re: Poor performance on HP Package Cluster
Date
Msg-id 6.2.3.4.0.20050901085555.0207ae30@pop.earthlink.net
In response to Poor performance on HP Package Cluster  (Ernst Einstein <Crusader@gmx.ch>)
List pgsql-performance
Your HD raw IO rate seems fine, so the problem is not likely to be
with the HDs.

That consistent ~10x increase in how long it takes to do an import or
a select is noteworthy.

This "smells" like an interconnect problem.  Was the Celeron locally
connected to the HDs while the new Xeons are network
connected?  Getting tens or even hundreds of MBps of throughput out of
local storage is much easier than doing so over a network.  1GbE
is required if you want HDs to push 72.72MBps over a network, and not
even one 10GbE line will allow you to match local buffered IO of
1885.34MBps.  What speeds are those network connections (Server A <->
storage, Server B <-> storage, Server A <-> Server B)?
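For anyone who wants to check the link arithmetic above, here is a minimal Python sketch. It assumes 1 MB = 10^6 bytes and compares against nominal line rates only (real Ethernet payload throughput is somewhat lower, so this is the optimistic case). The link table and function names are illustrative and not taken from any tool.

```python
# Rough link-speed check for the throughput figures discussed above.
# Assumptions (not from the original post): 1 MB = 10**6 bytes and nominal
# line rates only; real Ethernet payload throughput is somewhat lower.

NOMINAL_LINKS_MBIT = {
    "100 Mb Ethernet": 100,
    "1GbE": 1_000,
    "10GbE": 10_000,
}


def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert a throughput in MB/s to Mbit/s (8 bits per byte)."""
    return mb_per_s * 8


def links_that_fit(mb_per_s: float) -> list:
    """Names of the nominal links whose line rate covers the given throughput."""
    needed = mbytes_to_mbits(mb_per_s)
    return [name for name, rate in NOMINAL_LINKS_MBIT.items() if rate >= needed]


if __name__ == "__main__":
    # Figures from the hdparm -tT output in the quoted message.
    for label, rate in [("buffered disk reads", 72.72),
                        ("buffer-cache reads", 1885.34)]:
        fits = links_that_fit(rate) or ["none of these links"]
        print(f"{label}: {rate} MB/s = {mbytes_to_mbits(rate):.0f} Mbit/s, "
              f"fits on: {', '.join(fits)}")
```

With these inputs, 72.72 MB/s works out to about 582 Mbit/s, more than 100 Mb Ethernet can carry but within 1GbE, while 1885.34 MB/s is about 15 Gbit/s, beyond a single 10GbE line.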

Ron Peacetree


At 10:16 AM 9/1/2005, Ernst Einstein wrote:

>I've set up a Package Cluster (fail-over cluster) on our two HP
>DL380 G4 servers with MSA Storage G2 (Xeon 3.4 GHz, 6 GB RAM, 2x 36 GB
>@ 15k rpm in RAID 1).  The system is running SUSE Linux Enterprise Server.
>
>My problem is that the performance is very low. On our old server
>(Celeron 2 GHz with 2 GB of RAM) an import of our data (1.1 GB) takes
>about 10 minutes. On one of the DL380s it takes more than 90 minutes...
>Select response times have also increased: Celeron 3 sec, Xeon 30-40 sec.
>
>I've been trying to fix the problem for two days now and googled a lot,
>but I don't know what to do.
>
>top says my CPU spends ~50% of its time in I/O wait.
>
>top - 14:07:34 up 22 min,  3 users,  load average: 1.09, 1.04, 0.78
>Tasks:  74 total,   3 running,  71 sleeping,   0 stopped,   0 zombie
>Cpu(s): 50.0% us,  5.0% sy,  0.0% ni,  0.0% id, 45.0% wa,  0.0% hi,  0.0% si
>Mem:   6050356k total,   982004k used,  5068352k free,    60300k buffers
>Swap:  2097136k total,        0k used,  2097136k free,   786200k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 9939 postgres  18   0  254m 143m 140m R 49.3  2.4   8:35.43 postgres: postgres plate [local] INSERT
> 9938 postgres  16   0 13720 1440 1120 S  4.9  0.0   0:59.08 psql -d plate -f dump.sql
>10738 root      15   0  3988 1120  840 R  4.9  0.0   0:00.05 top -d 0.2
>    1 root      16   0   640  264  216 S  0.0  0.0   0:05.03 init [3]
>    2 root      34  19     0    0    0 S  0.0  0.0   0:00.00 [ksoftirqd/0]
>
>vmstat 1:
>
>ClusterNode2 root $ vmstat 1
>procs -----------memory------------ ---swap-- -----io---- --system--- ----cpu----
> r  b   swpd    free   buff   cache   si   so    bi    bo    in    cs us sy id wa
> 1  0      0 5032012  60888  821008    0    0   216  6938  1952  5049 40  8 15 37
> 0  1      0 5031392  60892  821632    0    0     0  8152  2126  5725 45  6  0 49
> 0  1      0 5030896  60900  822144    0    0     0  8124  2052  5731 46  6  0 47
> 0  1      0 5030400  60908  822768    0    0     0  8144  2124  5717 44  7  0 50
> 1  0      0 5029904  60924  823272    0    0     0  8304  2062  5763 43  7  0 49
>
>I've read (2004) that Xeons may have problems with context switching.
>Does that problem still exist? Can I do something to minimize it?
>
>
>postgresql.conf:
>
>shared_buffers = 28672
>effective_cache_size = 400000
>random_page_cost = 2
>
>
>shmall & shmmax are set to 268435456
>
>hdparm:
>
>ClusterNode2 root $ hdparm -tT /dev/cciss/c0d0p1
>
>/dev/cciss/c0d0p1:
>Timing buffer-cache reads: 3772 MB in 2.00 seconds = 1885.34 MB/sec
>Timing buffered disk reads: 150 MB in 2.06 seconds = 72.72 MB/sec
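The figures in the quoted message can be cross-checked with a little arithmetic. The Python sketch below assumes PostgreSQL's default 8 KiB page size (in this era shared_buffers and effective_cache_size were given in pages) and that vmstat's bo column counts 1 KiB blocks per second. It only compares the buffer pool, not the whole shared memory segment, against shmmax. The constant and function names are illustrative.

```python
# Cross-check of the figures quoted in the message above.
# Assumptions: PostgreSQL's default 8 KiB page size (shared_buffers and
# effective_cache_size given in pages), and vmstat's bo column counted in
# 1 KiB blocks per second.

BLOCK_SIZE = 8192            # bytes per PostgreSQL page (default build)
SHMMAX = 268_435_456         # bytes, from the quoted kernel settings

SHARED_BUFFERS_PAGES = 28_672
EFFECTIVE_CACHE_PAGES = 400_000

# (bo, wa) pairs from the quoted vmstat output. The first vmstat row is the
# since-boot average, so only the four following rows are used.
VMSTAT_BO_WA = [(8152, 49), (8124, 47), (8144, 50), (8304, 49)]

MIB = 1024 * 1024


def pages_to_mib(pages: int) -> float:
    """Convert a count of PostgreSQL pages to MiB."""
    return pages * BLOCK_SIZE / MIB


def mean(values) -> float:
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"shared_buffers       = {pages_to_mib(SHARED_BUFFERS_PAGES):.0f} MiB")
    print(f"effective_cache_size = {pages_to_mib(EFFECTIVE_CACHE_PAGES):.0f} MiB")
    print(f"shmmax               = {SHMMAX / MIB:.0f} MiB")
    avg_bo_kib = mean(bo for bo, _ in VMSTAT_BO_WA)
    avg_wa = mean(wa for _, wa in VMSTAT_BO_WA)
    print(f"average write rate   = {avg_bo_kib / 1024:.1f} MiB/s "
          f"at {avg_wa:.1f}% iowait")
```

On these inputs shared_buffers comes to 224 MiB (inside the 256 MiB shmmax), effective_cache_size to about 3 GiB, and the steady-state vmstat rows average roughly 8 MiB/s written at close to 50% iowait. hdparm -t measures reads only, so this write rate and the 72.72 MB/s read figure are not a like-for-like comparison.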



