Thread: How to analyze load average ?

From
Condor
Date:
Hello,

can someone tell me how I can find out where my server's load
average is coming from?

I have one server with 128 GB of memory, 32 x86_64 CPUs, and RAID5 on
three 15k SAS HDDs with an ext4 filesystem. It is my production server,
and it is also configured to send WAL files over the network. Here is my
configuration:


max_connections = 500
shared_buffers = 32GB
work_mem = 192MB
maintenance_work_mem = 6GB
max_stack_depth = 6MB
bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2.0
wal_level = hot_standby
fsync = on
synchronous_commit = on
wal_sync_method = fdatasync
full_page_writes = on
wal_buffers = -1
checkpoint_segments = 32
checkpoint_timeout = 5min
checkpoint_completion_target = 0.5
max_wal_senders = 5
wal_sender_delay = 1s
wal_keep_segments = 64

enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_material = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on

seq_page_cost = 1.0
random_page_cost = 2.0
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
effective_cache_size = 64GB

autovacuum = on


My on-board RAID cache (write-through) is OFF.

When I connect to the server I see only 2 queries with select * from
pg_stat_activity;
They are not complicated, e.g. select rid from table where id = 1;
Both tables have indexes on the most frequently used columns. When I
check, my server load average is 0.88 0.94 0.87.
I'm trying to find out why that load average is so high; only PostgreSQL
9.1.4 is running on that server.

Can someone point me to where I should start digging? I think my
configuration of connections and shared buffers is right according to
the documentation.
I think this slowdown may be because the cache on my RAID card is
OFF. As I read on the PostgreSQL wiki pages,
if I turn that setting ON I might lose some of my data after a failure.
However, the company has a UPS and I also have streaming replication,
so I won't lose much data.

My iostat shows:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0.90    0.00    1.06    0.00    0.00   98.04

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.92    0.00    1.06    0.00    0.00   97.02

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0


And my vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd     free   buff    cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 99307408 334300 31144708    0    0     1    18    1    0  1  1 98  0
 0  0      0 99303808 334300 31144716    0    0     0     0  926  715  0  0 99  0
 0  0      0 99295232 334300 31144716    0    0     0     0  602  532  0  0 99  0
 4  0      0 99268160 334300 31144716    0    0     0    32  975  767  2  2 96  0
 1  0      0 99298544 334300 31144716    0    0     0     0  801  445  3  2 95  0
 0  0      0 99311336 334300 31144716    0    0     0     0  320  175  1  0 98  0
 2  0      0 99298920 334300 31144716    0    0     0     0 1195  996  1  1 97  0
 0  0      0 99307184 334300 31144716    0    0     0     0  843  645  0  1 98  0
 0  0      0 99301024 334300 31144716    0    0     0    12 1346 1040  2  2 96  0

Can anyone tell me how I can find out why that load average is so
high?

Thanks


Re: How to analyze load average ?

From
"Tomas Vondra"
Date:
On 6 August 2012, 16:23, Condor wrote:
> Hello,
>
> can some tell me, how I can analyze from where my server bring up load
> average ?
>
> ...
>
> When I connect to server i see only 2 query with select * from
> pg_stat_activity;
> that is not complicated, select rid from table where id = 1;
> Both tables have index on most frequently columns. When I check my
> server load average is 0.88 0.94 0.87
>
>...
>
> Any one can tell me how I can find from where that load average is so
> high ?

Errr, what? Why do you think the load average is high?

Load average is defined as a number of processes in the run queue (i.e.
using or waiting for a CPU). So the load average "0.88 0.94 0.87" means
there was less than one process waiting for CPU most of the time. I
wouldn't call that "high load average", especially not on a 32-core
system.
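
As a rough illustration of that reading (not part of the original reply), here is a minimal Python sketch that parses a line in Linux /proc/loadavg format and divides the 1-minute figure by the core count. The sample line is hypothetical: only the three averages and the 32 cores come from this thread, while the "1/412 20731" tail and the function names are made up for the example.

```python
def parse_loadavg(text):
    """Parse one line in Linux /proc/loadavg format."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    # Fourth field is "runnable/total" scheduling entities at sampling time.
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {"1min": one, "5min": five, "15min": fifteen,
            "runnable": runnable, "total": total}


def load_per_core(load, cores):
    """Normalize a load average by the number of cores."""
    return load / cores


# Hypothetical sample line; only the averages are taken from the thread.
sample = "0.88 0.94 0.87 1/412 20731"
info = parse_loadavg(sample)
per_core = load_per_core(info["1min"], 32)
print(f"{per_core:.4f}")  # 0.0275
```

On a live Linux box the same parser could be fed the contents of /proc/loadavg instead of the sample string.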

Tomas


Re: How to analyze load average ?

From
Mark Felder
Date:
On Mon, 06 Aug 2012 09:38:33 -0500, Tomas Vondra <tv@fuzzy.cz> wrote:

> Load average is defined as a number of processes in the run queue

That depends on if he's running Linux or BSD.

http://www.undeadly.org/cgi?action=article&sid=20090715034920

Re: How to analyze load average ?

From
"Tomas Vondra"
Date:
On 6 August 2012, 16:54, Mark Felder wrote:
> On Mon, 06 Aug 2012 09:38:33 -0500, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> Load average is defined as a number of processes in the run queue
>
> That depends on if he's running Linux or BSD.
>
> http://www.undeadly.org/cgi?action=article&sid=20090715034920

Well, even this link states that "... most unixen load average is some
measure of the size of the run queue - or the number of runnable processes
over a set period" and in this sense what I said is true even on BSD
systems. But you're right, the definitions are a bit different.

Although the OP mentioned he's using ext4, so I suppose he's running Linux
(although I know there was some ext4 support e.g. in FreeBSD).

Still, the load average 0.88 means the system is almost idle, especially
when there's no I/O activity etc.

Tomas


Re: How to analyze load average ?

From
Mark Felder
Date:
On Mon, 06 Aug 2012 10:27:18 -0500, Tomas Vondra <tv@fuzzy.cz> wrote:

>
> Although the OP mentioned he's using ext4, so I suppose he's running
> Linux
> (although I know there was some ext4 support e.g. in FreeBSD).
> Still, the load average 0.88 means the system is almost idle, especially
> when there's no I/O activity etc.

Ahh, I didn't see the mention of ext4 initially. I tend to just use iostat
for getting a better baseline of what's truly happening on the system. At
least on FreeBSD (not sure of Linux at the moment) the iostat output also
lists CPU usage in the last columns and if "id" (idle) is not close to
zero it's probably OK. :-)

Re: How to analyze load average ?

From
Condor
Date:
On 2012-08-06 17:38, Tomas Vondra wrote:
> On 6 August 2012, 16:23, Condor wrote:
>> Hello,
>>
>> can some tell me, how I can analyze from where my server bring up
>> load
>> average ?
>>
>> ...
>>
>> When I connect to server i see only 2 query with select * from
>> pg_stat_activity;
>> that is not complicated, select rid from table where id = 1;
>> Both tables have index on most frequently columns. When I check my
>> server load average is 0.88 0.94 0.87
>>
>>...
>>
>> Any one can tell me how I can find from where that load average is
>> so
>> high ?
>
> Errr, what? Why do you think the load average is high?
>
> Load average is defined as a number of processes in the run queue
> (i.e.
> using or waiting for a CPU). So the load average "0.88 0.94 0.87"
> means
> there was less than one process waiting for CPU most of the time. I
> wouldn't call that "high load average", especially not on a 32-core
> system.
>
> Tomas


I think the load average is high because, before I changed servers, my
production server had 16 CPUs and 24 GB of memory, and the load average
on that server was 0.24. The database is the same and the users of the
server are the same; nothing has changed. I dumped the DB from the old
server and imported it into the new one a few days ago. Because it is
the new server with more resources, I monitor its load average, and I
think it is too high. For that reason I'm asking whether there is a way
to detect why my load average is 0.88. When I run select * from
pg_stat_activity; I do not see more than 3-4 queries, which are not very
complicated, and I have already tried them with explain to see the
result.

I know what load average means; I was an OpenBSD user for a few years,
and now I use Slackware with kernel 3.5.


Hristo

Re: How to analyze load average ?

From
"Kevin Grittner"
Date:
Condor <condor@stz-bg.com> wrote:

> For that reason Im asking is there a way to detect why my load avg
> is 0.88. When I run select * from pg_stat_activity;

So, on a 32 core system if you run vmstat or iostat with a short
interval during such an episode, you should be seeing about 97% idle
time for your CPUs.  If you want to know what's sucking up the other
3%, you might want to try oprofile.
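
For anyone who wants to check the idle figure without vmstat or iostat, the following Python sketch shows roughly the calculation those tools perform on Linux: take two samples of the aggregate "cpu" line from /proc/stat and compare the idle counter with the total. The two sample strings are synthetic, constructed so that the result matches the first iostat report earlier in the thread; the function names are illustrative only.

```python
def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def idle_percent(before, after):
    """Percentage of CPU time spent idle between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user time, so only the first 8 count.
    total = sum(deltas[:8])
    if total == 0:
        raise ValueError("samples are identical")
    return 100.0 * deltas[3] / total


# Synthetic samples (assumption): chosen to give 0.90% user, 1.06% system.
before = "cpu  1000 0 1000 98000 0 0 0 0 0 0"
after = "cpu  1090 0 1106 107804 0 0 0 0 0 0"
print(f"{idle_percent(before, after):.2f}")  # 98.04
```

On a real system the two strings would come from reading /proc/stat twice, a few seconds apart.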

-Kevin

Re: How to analyze load average ?

From
Tomas Vondra
Date:
> I think load avg is high because before I change the servers my produce
> server
> was on 16 cpu, 24 gb memory and load avg on that server was 0.24.
> Database is the same,
> users that use the server is the same, nothing is changed. I dump the DB
> from old server
> and import it to new one before few days ago and because that is the new
> server with more
> resource I monitor his load avg and I think is too high. For that reason
> Im asking is there
> a way to detect why my load avg is 0.88. When I run select * from
> pg_stat_activity;
> did not see more then 3-4 query that isn't much complicated and I
> already try them with
> explain to see what is the result.

Well, the load average is a bit difficult to analyze because of the
exponential damping. Also, I find it a bit artificial and if there are
no sudden peaks or slowdowns I wouldn't bother analyzing this.
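
To make the exponential damping concrete, here is a simplified floating-point sketch in Python. It assumes the run queue is sampled every 5 seconds and that the 1-minute average uses a 60-second time constant, which is how Linux is usually described; the real kernel uses fixed-point arithmetic, so treat this as an approximation with made-up function names.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed, as on Linux)


def damped_update(load, runnable, window):
    """One step of the exponentially damped moving average."""
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + runnable * (1.0 - decay)


def simulate(runnable_series, window=60.0, start=0.0):
    """Feed a series of run-queue samples through the damped average."""
    load = start
    for n in runnable_series:
        load = damped_update(load, n, window)
    return load


# One process runnable for a full minute: 12 samples at 5 s each.
one_minute_busy = simulate([1] * 12)
print(round(one_minute_busy, 3))  # 0.632
```

Even after a full minute with one busy process, the 1-minute figure has only reached about 0.63, and it decays just as slowly afterwards, which is why a single number is hard to map back to specific events.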

A wild guess is that the new server has more CPUs but at lower
frequency, therefore the tasks run longer and impact the load average
accordingly. There are other such things (e.g. maintenance of larger
shared buffers takes more time).

Have you verified that the performance of the new hardware matches
expectations and that it's actually faster than the old server?

> I know what load average mean, I was OpenBSD user a few years, now I
> use Slackware with kernel 3.5.

So you do have 3.5 on production? Wow, you're quite adventurous.

Tomas

Re: How to analyze load average ?

From
Martijn van Oosterhout
Date:
On Mon, Aug 06, 2012 at 08:06:05PM +0300, Condor wrote:
> I think load avg is high because before I change the servers my
> produce server
> was on 16 cpu, 24 gb memory and load avg on that server was 0.24.
> Database is the same,

Our monitoring system starts worrying about the load average if it ever
goes above 0.75*number of cores. In your example it looks a bit like
you paid for 15 more cores than necessary.
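
A minimal Python sketch of that kind of rule, assuming the 0.75 factor quoted above and the 32 cores mentioned in the thread (the function names are invented for this example and do not describe any particular monitoring system):

```python
def load_alert_threshold(cores, factor=0.75):
    """Load average above which the rule would start to worry."""
    return factor * cores


def is_worrying(load, cores, factor=0.75):
    """True if the load average exceeds factor * number of cores."""
    return load > load_alert_threshold(cores, factor)


print(load_alert_threshold(32))  # 24.0
print(is_worrying(0.88, 32))     # False
```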

Especially at the lower end you have to take the load with a large
grain of salt.  Lots of short-running processes (like a make run) will
make the load fluctuate.  But even things like it taking a while for
your disk cache to reach steady state after a reboot can mean that you
see a higher than normal load for a while.

But 0.88 is really nothing to worry about. Perhaps it is just slower
cores or a slower memory bus.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Re: How to analyze load average ?

From
Condor
Date:
Tomas Vondra wrote:
>> I think load avg is high because before I change the servers my
>> produce
>> server
>> was on 16 cpu, 24 gb memory and load avg on that server was 0.24.
>> Database is the same,
>> users that use the server is the same, nothing is changed. I dump
>> the DB
>> from old server
>> and import it to new one before few days ago and because that is the
>> new
>> server with more
>> resource I monitor his load avg and I think is too high. For that
>> reason
>> Im asking is there
>> a way to detect why my load avg is 0.88. When I run select * from
>> pg_stat_activity;
>> did not see more then 3-4 query that isn't much complicated and I
>> already try them with
>> explain to see what is the result.
>
> Well, the load average is a bit difficult to analyze because of the
> exponential damping. Also, I find it a bit artificial and if there
> are
> no sudden peaks or slowdowns I wouldn't bother analyzing this.
>
> A wild guess is that the new server has more CPUs but at lower
> frequency, therefore the tasks run longer and impact the load average
> accordingly. There are other such things (e.g. maintenance of larger
> shared buffers takes more time).
>
> Have you verified that the performance of the new hardware matches
> expectations and that it's actually faster than the old server?
>
>> I know what load average mean, I was OpenBSD user a few years, now I
>> use Slackware with kernel 3.5.
>
> So you do have 3.5 on production? Wow, you're quite adventurous.

Yep, that's me :)

>
> Tomas


Hello to everyone again,
sorry for my late reply, but I found the problem (I think).
I changed the default I/O scheduler from noop to deadline, and
my load average dropped to 0.23.
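
For readers who want to check which scheduler a disk is using, here is a small Python sketch. It assumes the usual Linux sysfs layout, where /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets; the device name "sda" is taken from the iostat output above, and the function names are illustrative.

```python
def active_scheduler(sysfs_text):
    """Return the bracketed (active) entry, e.g. 'noop [deadline] cfq'."""
    for name in sysfs_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_scheduler(device):
    """Read the active I/O scheduler for a block device from sysfs (Linux)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())


# Sample string in the sysfs format; on a live system use read_scheduler("sda").
print(active_scheduler("noop [deadline] cfq"))  # deadline
```

Changing the scheduler is typically done by writing the new name to the same file as root, but the exact set of available names depends on the kernel version.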