High load average in 64-core server , no I/O wait and CPU is idle - Mailing list pgsql-performance

From Rajesh Kumar. Mallah
Subject High load average in 64-core server , no I/O wait and CPU is idle
Date
Msg-id 33304216.187761337830749805.JavaMail.root@zmbox01.trade-india-local.com
List pgsql-performance
Dear List,

We are having scalability issues with high-end hardware.

The hardware is:
CPU  = 4 x Opteron 6272 with 16 cores each, i.e. 64 cores in total.
RAM  = 128 GB DDR3
Disk = High-performance RAID 10 with lots of 15K spindles and a working BBU cache.

Normally the 1-minute load average of the system stays between 0.5 and 1.0.

The problem is that the load average sometimes spikes to > 50 very
rapidly (i.e. from 0.5 to 50 within 10 seconds), stays there for some
time, and then slowly returns to its normal value.

During these periods of high load average we observe no I/O wait in the
system, and the CPU is still 50% idle. In any case, the I/O wait always
remains < 1.0% and is mostly 0. Hence the load is not due to high I/O wait,
which was generally the case with our previous hardware.

We are puzzled as to why the CPU and disk I/O subsystem are not being
fully utilized, and would seek the list's wisdom on that.

We have set up sar to poll the system parameters every minute, and
the resulting data is graphed with cacti. If required, any system
parameter or PostgreSQL parameter can easily be put under cacti
monitoring and graphed.
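
Since the spikes build up within about 10 seconds, one-minute samples may miss their onset. Below is a minimal sketch of a finer-grained sampler, assuming a Linux host. On Linux the load average counts tasks that are runnable as well as tasks in uninterruptible sleep, so logging both counts during a spike may help narrow down which kind is piling up. The function names (parse_loadavg, parse_proc_stat, sample_once, watch) are made up for illustration and are not part of sar or cacti.

```python
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg (Linux)."""
    fields = text.split()
    # The fourth field has the form "runnable/total" scheduling entities.
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total_tasks": int(total),
    }


def parse_proc_stat(text):
    """Extract the procs_running and procs_blocked counters from /proc/stat."""
    wanted = {"procs_running", "procs_blocked"}
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            result[parts[0]] = int(parts[1])
    return result


def sample_once():
    """Read both files once and merge the parsed values into one dict."""
    with open("/proc/loadavg") as f:
        sample = parse_loadavg(f.read())
    with open("/proc/stat") as f:
        sample.update(parse_proc_stat(f.read()))
    return sample


def watch(interval=1.0, count=60):
    """Print one line per sample; interval and count are arbitrary defaults."""
    for _ in range(count):
        s = sample_once()
        print(time.strftime("%H:%M:%S"), s["load1"], s["runnable"],
              s.get("procs_running"), s.get("procs_blocked"))
        time.sleep(interval)
```

Calling watch(interval=1.0, count=600) in a terminal during a suspected spike would log one sample per second for ten minutes; the output could also be fed into cacti, though that wiring is not shown here.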

The query load is mostly read-only.

It is also possible to replicate the problem with pgbench to some
extent. I chose -s 100 and -t 10000; the load does shoot up, but not
as spectacularly as under real-world usage.
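
For anyone wanting to reproduce this, here is a rough sketch of how the pgbench runs might be assembled. Only the scale factor (100) and the transaction count (10000) come from the test described above. The database name, the client count, the thread count and the use of the select-only script (-S, chosen because the real load is mostly read-only) are assumptions, and pgbench_commands is a hypothetical helper, not part of pgbench.

```python
def pgbench_commands(dbname, scale=100, transactions=10000,
                     clients=64, threads=8, select_only=True):
    """Build argv lists for a pgbench initialization and a benchmark run.

    scale and transactions follow the test described in the post; clients,
    threads and select_only are assumed values picked for illustration.
    """
    # -i initializes the pgbench tables, -s sets the scale factor.
    init_cmd = ["pgbench", "-i", "-s", str(scale), dbname]

    # -c: concurrent clients, -j: worker threads,
    # -t: number of transactions run by each client.
    run_cmd = ["pgbench", "-c", str(clients), "-j", str(threads),
               "-t", str(transactions)]
    if select_only:
        # -S runs the built-in select-only script.
        run_cmd.append("-S")
    run_cmd.append(dbname)
    return init_cmd, run_cmd
```

The two lists can be passed to subprocess.run(..., check=True) one after the other. Varying the client count around the number of cores (64 here) may show at which concurrency the load average starts to climb.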

Any help will be greatly appreciated.

Just a thought: would it be a good idea to partition the host hardware
into 4 equal virtual environments, i.e. 1 for the master (r/w) and 3 for
slaves (r/o), and distribute the r/o load across the 3 slaves?


Regards,
mallah
