Thread: high load on server

high load on server

From
Gerd König
Date:
Hello,

for the past two days we've been facing increased load on our database server
(openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
throughout the working day.
==================
current situation:
==================
#>top
top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster


I assume the problem is caused by heavy writing slowing down the
server... but why? =>

1.) there are no long-running queries:
SELECT   current_query, COUNT(current_query)
        FROM     pg_stat_activity
        WHERE    query_start < now() - interval '1 min'
        AND      current_query != '<IDLE>'
        GROUP BY current_query;
 current_query | count
---------------+-------
(0 rows)


2.) WAL archives are written every 2-3 minutes.
3.) we don't have a high-performance hardware layout; data and WAL are on the same disk =>
#>iostat 2 5
Linux 2.6.22.5-31-default    03.04.2009
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              13,42        38,57       391,86  134436221 1365849137
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              36,21         0,00       994,02          0       2992
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              67,67         0,00      1621,33          0       4864
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              40,00         0,00       989,33          0       2968
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              26,91        18,60       948,84         56       2856

#>vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 5  0    204 2449652 185692 5046168    0    0     2    24    1    1  2  0 95  2
 3  0    204 2448496 185692 5046184    0    0     0   518 2984 18683 24  5 65  6
 3  0    204 2430864 185692 5046192    0    0     0   344 2083 10004 34  3 58  5
 2  0    204 2434600 185700 5046200    0    0     0   386 2084 23592 33  3 57  7
 3  0    204 2425612 185700 5046220    0    0     0   372 2352 2905 36  2 57  5
 5  0    204 2424828 185700 5046256    0    0     0   600 2372 33094 36 12 48  4
 4  0    204 2405516 185700 5046256    0    0     4   992 1747 29035 33  8 52  6
 3  0    204 2419368 185708 5046272    0    0     4   660 2735 24732 36  7 51  6
 2  0    204 2419244 185712 5046296    0    0     0   360 2251 3193  9  1 84  5
 3  0    204 2407096 185712 5046296    0    0     0   332 2319 3269 20  3 72  5

Are there further system/database details I should check?

Could we lower the load by reducing the number of WAL archives being written?
Is that somehow possible?
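
As a first check on our side (just a sketch, nothing beyond the standard
pg_settings view), I could dump the checkpoint/archive parameters currently
in effect; if archive_timeout is set to a couple of minutes, that alone would
explain a new WAL file every 2-3 minutes regardless of write volume:

-- show the checkpoint- and archive-related settings currently in effect
SELECT   name, setting, unit
FROM     pg_settings
WHERE    name LIKE 'checkpoint%'
   OR    name LIKE 'archive%'
   OR    name = 'wal_buffers';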

Since buying and installing new hardware is a huge effort any other solutions
highly welcome :-))

thanks in advance...GERD...

Re: high load on server

From
Scott Marlowe
Date:
2009/4/3 Gerd König <koenig@transporeon.com>:
> Hello,
>
> for the past two days we've been facing increased load on our database server
> (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
> throughout the working day.

How many cores?

> ==================
> current situation:
> ==================
> #>top
> top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
> ...
> Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
> Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
> 17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
> 22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
> 17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
> 17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
> 22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
> 17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
> 17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
> 17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
> 22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
> 17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

Next time hit c first to see what the postmasters are up to.
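
Or grab roughly the same information from inside the database; a quick sketch
against pg_stat_activity (8.3 still calls the columns procpid and
current_query):

-- list what each non-idle backend is running, oldest query first
SELECT   procpid, usename, now() - query_start AS runtime, current_query
FROM     pg_stat_activity
WHERE    current_query <> '<IDLE>'
ORDER BY query_start;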

> I assume the problem is caused by heavy writing slowing down the
> server... but why? =>

The problem might be that you're assuming there's a problem.  Looking
at the rest of your diags, your data set fits in memory, I/O wait is
< 10%, and there are no processes waiting for a CPU to free up; they're
all running.

Looks healthy to me.
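
If you want to double-check that the data set really does fit in RAM, a quick
sketch (run it once per database if you have more than one):

-- rough on-disk size of the current database, to compare with the ~5 GB
-- the OS is already holding in its page cache plus shared_buffers
SELECT pg_size_pretty(pg_database_size(current_database()));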

Re: high load on server

From
Gerd Koenig
Date:
Hello Scott,

thanks for answering.

Scott Marlowe wrote:
> 2009/4/3 Gerd König <koenig@transporeon.com>:
>> Hello,
>>
>> for the past two days we've been facing increased load on our database server
>> (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
>> throughout the working day.
>
> How many cores?

The server contains two
"model name    : Intel(R) Xeon(R) CPU           X5355  @ 2.66GHz"
CPUs, i.e. 8 cores in total...

>
>> ==================
>> current situation:
>> ==================
>> #>top
>> top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
>> ...
>> Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
>> Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
>> 17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
>> 22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
>> 17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
>> 17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
>> 22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
>> 17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
>> 17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
>> 17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
>> 22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
>> 17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster
>
> Next time hit c first to see what the postmasters are up to.

Good hint, I'll do that the next time the server runs under higher
load (probably on Monday...).

>
>> I assume the problem is caused by heavy writing slowing down the
>> server... but why? =>
>
> The problem might be that you're assuming there's a problem.  Looking
> at the rest of your diags, your data set fits in memory, I/O wait is
> < 10%, and there are no processes waiting for a CPU to free up; they're
> all running.
>
> Looks healthy to me.
Perfect, probably our customers didn't work that much in the past, but
now they do ;-)

kind regards...:GERD:...


Re: high load on server

From
Scott Marlowe
Date:
On Fri, Apr 3, 2009 at 12:35 PM, Gerd Koenig <koenig@transporeon.com> wrote:
>> The problem might be that you're assuming there's a problem.  Looking
>> at the rest of your diags, your data set fits in memory, I/O wait is
>> < 10%, and there are no processes waiting for a CPU to free up; they're
>> all running.
>>
>> Looks healthy to me.
>
> Perfect, probably our customers didn't work that much in the past, but now
> they do ;-)

Well, it looks like you're about halfway to the point where you'll have
to start improving your hardware, using Slony read slaves, or adding
something like memcached to handle the extra load.  Keep an eye on your
wait%.  If that starts climbing and vmstat shows more and more bo going
to your drives, then you'll need to improve your I/O subsystem to keep
up with the load.
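
If you also want to watch the write side from inside Postgres, a rough sketch
against 8.3's pg_stat_bgwriter (the counters are cumulative since the last
stats reset, so sample it twice and diff):

-- buffers written at checkpoint time, by the background writer, and
-- directly by backends; a growing buffers_backend share suggests the
-- bgwriter/checkpoints aren't keeping up with the write load
SELECT   checkpoints_timed, checkpoints_req,
         buffers_checkpoint, buffers_clean, buffers_backend
FROM     pg_stat_bgwriter;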

Re: high load on server

From
Erik Jones
Date:
On Apr 3, 2009, at 7:32 AM, Scott Marlowe wrote:

> 2009/4/3 Gerd König <koenig@transporeon.com>:
>> Hello,
>>
>> for the past two days we've been facing increased load on our database
>> server (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load
>> persists throughout the working day.
>
> How many cores?
>
>> ==================
>> current situation:
>> ==================
>> #>top
>> top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
>> ...
>> Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
>> Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached
>>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
>> 17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
>> 22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
>> 17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
>> 17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
>> 22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
>> 17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
>> 17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
>> 17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
>> 22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
>> 17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster
>
> Next time hit c first to see what the postmasters are up to.
>
>> I assume the problem is caused by heavy writing slowing down the
>> server... but why? =>
>
> The problem might be that you're assuming there's a problem.  Looking
> at the rest of your diags, your data set fits in memory, I/O wait is
> < 10%, and there are no processes waiting for a CPU to free up; they're
> all running.
>
> Looks healthy to me.

Eh?  His run queue constantly has procs waiting for run time, although
I've seen higher.  That with a distinct lack of heavy IO says cpu
bound to me...

#>vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 5  0    204 2449652 185692 5046168    0    0     2    24    1    1  2  0 95  2
 3  0    204 2448496 185692 5046184    0    0     0   518 2984 18683 24  5 65  6
 3  0    204 2430864 185692 5046192    0    0     0   344 2083 10004 34  3 58  5
 2  0    204 2434600 185700 5046200    0    0     0   386 2084 23592 33  3 57  7
 3  0    204 2425612 185700 5046220    0    0     0   372 2352 2905 36  2 57  5
 5  0    204 2424828 185700 5046256    0    0     0   600 2372 33094 36 12 48  4
 4  0    204 2405516 185700 5046256    0    0     4   992 1747 29035 33  8 52  6
 3  0    204 2419368 185708 5046272    0    0     4   660 2735 24732 36  7 51  6
 2  0    204 2419244 185712 5046296    0    0     0   360 2251 3193  9  1 84  5
 3  0    204 2407096 185712 5046296    0    0     0   332 2319 3269 20  3 72  5

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

Re: high load on server

From
Scott Marlowe
Date:
On Fri, Apr 3, 2009 at 4:13 PM, Erik Jones <ejones@engineyard.com> wrote:
>
> On Apr 3, 2009, at 7:32 AM, Scott Marlowe wrote:
>
>> 2009/4/3 Gerd König <koenig@transporeon.com>:
>>>
>>> Hello,
>>>
>>> for the past two days we've been facing increased load on our database server
>>> (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
>>> throughout the working day.
>>
>> How many cores?
>>
>>> ==================
>>> current situation:
>>> ==================
>>> #>top
>>> top - 14:09:46 up 40 days,  8:08,  2 users,  load average: 7.60, 7.46, 7.13
>>> ...
>>> Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
>>> Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached
>>>
>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
>>> 17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
>>> 22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
>>> 17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
>>> 17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
>>> 22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
>>> 17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
>>> 17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
>>> 17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
>>> 22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
>>> 17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster
>>
>> Next time hit c first to see what the postmasters are up to.
>>
>>> I assume the problem is caused by heavy writing slowing down the
>>> server... but why? =>
>>
>> The problem might be that you're assuming there's a problem.  Looking
>> at the rest of your diags, your data set fits in memory, I/O wait is
>> < 10%, and there are no processes waiting for a CPU to free up; they're
>> all running.
>>
>> Looks healthy to me.
>
> Eh?  His run queue constantly has procs waiting for run time, although I've
> seen higher.  That with a distinct lack of heavy IO says cpu bound to me...

How do you see that? He's got 50% or so idle, and is running fewer
processes than he has cores.

Re: high load on server

From
Scott Marlowe
Date:
On Fri, Apr 3, 2009 at 4:13 PM, Erik Jones <ejones@engineyard.com> wrote:
> Eh?  His run queue constantly has procs waiting for run time, although I've
> seen higher.  That with a distinct lack of heavy IO says cpu bound to me...

I just pulled up the Linux man page and it says that r is the number
of processes waiting to run.  This isn't entirely correct.  A BSD or
Solaris man page more accurately describes it as the number of
processes running OR waiting to run; if that number exceeds the number
of cores, the amount by which it exceeds them is the length of the run
queue.