Thread: high load on server
Hello,

since two days ago we have been facing increased load on our database server
(openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load persists
throughout the whole working day.

==================
current situation:
==================
#>top
top - 14:09:46 up 40 days, 8:08, 2 users, load average: 7.60, 7.46, 7.13
...
Mem:   8194596k total,  5716680k used,  2477916k free,   185516k buffers
Swap:  4200988k total,      204k used,  4200784k free,  5041448k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17478 postgres  15   0  610m 455m 444m R   52  5.7   0:08.78 postmaster
17449 postgres  15   0  606m 497m 489m S   37  6.2   0:16.35 postmaster
22541 postgres  16   0  607m 522m 516m R   31  6.5 123:25.17 postmaster
17491 postgres  15   0  618m 447m 435m S   22  5.6   0:03.97 postmaster
17454 postgres  15   0  616m 474m 457m S   18  5.9   0:15.88 postmaster
22547 postgres  15   0  608m 534m 527m S   18  6.7 100:12.01 postmaster
17448 postgres  16   0  616m 517m 501m S   17  6.5   0:15.60 postmaster
17451 postgres  15   0  611m 491m 479m S   11  6.1   0:25.04 postmaster
17490 postgres  15   0  606m 351m 344m S   10  4.4   0:02.69 postmaster
22540 postgres  15   0  607m 520m 513m S    2  6.5  33:46.47 postmaster
17489 postgres  15   0  604m 316m 311m S    2  4.0   0:03.34 postmaster

I assume the problem is that heavy writing slows down the server... but why?

1.) There are no long-running queries:

SELECT current_query, COUNT(current_query)
FROM pg_stat_activity
WHERE query_start < now() - interval '1 min'
  AND current_query != '<IDLE>'
GROUP BY current_query;

 current_query | count
---------------+-------
(0 rows)

2.) WAL archives are written every 2-3 minutes.

3.) We do not have a high-performance hardware layout; data and log are on
the same disk:

#>iostat 2 5
Linux 2.6.22.5-31-default    03.04.2009

Device:  tps    Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda      13,42       38,57      391,86  134436221  1365849137

Device:  tps    Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda      36,21        0,00      994,02          0        2992

Device:  tps    Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda      67,67        0,00     1621,33          0        4864

Device:  tps    Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda      40,00        0,00      989,33          0        2968

Device:  tps    Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda      26,91       18,60      948,84         56        2856

#>vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache  si  so  bi   bo   in    cs us sy id wa
 5  0    204 2449652 185692 5046168   0   0   2   24    1     1  2  0 95  2
 3  0    204 2448496 185692 5046184   0   0   0  518 2984 18683 24  5 65  6
 3  0    204 2430864 185692 5046192   0   0   0  344 2083 10004 34  3 58  5
 2  0    204 2434600 185700 5046200   0   0   0  386 2084 23592 33  3 57  7
 3  0    204 2425612 185700 5046220   0   0   0  372 2352  2905 36  2 57  5
 5  0    204 2424828 185700 5046256   0   0   0  600 2372 33094 36 12 48  4
 4  0    204 2405516 185700 5046256   0   0   4  992 1747 29035 33  8 52  6
 3  0    204 2419368 185708 5046272   0   0   4  660 2735 24732 36  7 51  6
 2  0    204 2419244 185712 5046296   0   0   0  360 2251  3193  9  1 84  5
 3  0    204 2407096 185712 5046296   0   0   0  332 2319  3269 20  3 72  5

Is there anything further I can check on the system or database side? Can we
lower the load by reducing the number of WAL archives written -- is that
somehow possible? Since buying and installing new hardware is a huge effort,
any other solutions are highly welcome :-))

thanks in advance ...GERD...
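For reference, the WAL archive rate in 8.3 is governed by a handful of
postgresql.conf settings. A minimal sketch of the knobs to check -- the
values shown are purely illustrative, not recommendations for this box:

# postgresql.conf (PostgreSQL 8.3) -- illustrative values only
checkpoint_segments = 16            # WAL segments between automatic checkpoints
checkpoint_timeout = 5min           # upper bound on time between checkpoints
checkpoint_completion_target = 0.7  # spread checkpoint writes over the interval
archive_timeout = 0                 # if > 0, forces a segment switch every N
                                    # seconds, inflating the archive count

If archive_timeout is set to a low value, that alone can explain an archive
every 2-3 minutes; otherwise that rate simply means roughly one 16 MB WAL
segment is being filled in that window, and the only real lever is generating
less WAL in the first place.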
2009/4/3 Gerd König <koenig@transporeon.com>:
> Hello,
>
> since two days ago we have been facing increased load on our database
> server (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load
> persists throughout the whole working day.

How many cores?

> ==================
> current situation:
> ==================
> #>top
> top - 14:09:46 up 40 days, 8:08, 2 users, load average: 7.60, 7.46, 7.13
> [...]

Next time hit c first to see what the postmasters are up to.

> I assume the problem is that heavy writing slows down the server...
> but why?

The problem might be that you're assuming there's a problem. Looking at the
rest of your diags, your data set fits in memory, I/O wait is < 10%, and
there are no processes waiting for a CPU to free up; they're all running.

Looks healthy to me.
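A lighter-weight alternative to watching top is asking the database
directly; assuming 8.3's catalog column names (procpid, current_query),
something like:

SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;

shows what each backend is executing at that moment -- roughly the same
information the c toggle exposes in top's command column.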
Hello Scott,

thanks for answering.

Scott Marlowe wrote:
> 2009/4/3 Gerd König <koenig@transporeon.com>:
>> since two days ago we have been facing increased load on our database
>> server (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load
>> persists throughout the whole working day.
>
> How many cores?

The server contains two "model name : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz"
CPUs, i.e. 8 cores in total.

> Next time hit c first to see what the postmasters are up to.

Good hint -- I'll do that the next time the server runs under higher load
(probably on Monday...).

>> I assume the problem is that heavy writing slows down the server...
>> but why?
>
> The problem might be that you're assuming there's a problem. Looking at
> the rest of your diags, your data set fits in memory, I/O wait is < 10%,
> and there are no processes waiting for a CPU to free up; they're all
> running.
>
> Looks healthy to me.

Perfect. Probably our customers didn't work that much in the past, but now
they do ;-)

kind regards ...:GERD:...
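For the record, a quick way to confirm the logical CPU count on Linux --
note this counts hyperthreaded siblings too, so on HT-capable chips it can
overstate physical cores (the X5355 has no HT, so here it matches the 8
physical cores):

#> grep -c '^model name' /proc/cpuinfo
8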
On Fri, Apr 3, 2009 at 12:35 PM, Gerd Koenig <koenig@transporeon.com> wrote:
>> The problem might be that you're assuming there's a problem. Looking at
>> the rest of your diags, your data set fits in memory, I/O wait is < 10%,
>> and there are no processes waiting for a CPU to free up; they're all
>> running.
>>
>> Looks healthy to me.
>
> Perfect. Probably our customers didn't work that much in the past, but now
> they do ;-)

Well, it looks like you're about halfway to the point where you'll have to
start improving your hardware, using Slony read slaves, using memcached, or
something like that to handle the extra load. Keep an eye on your wait%: if
that starts climbing and vmstat shows more and more bo going to your drives,
you'll need to improve your I/O subsystem to keep up with the load.
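One convenient way to watch for that, assuming the sysstat package is
installed, is iostat's extended view:

#> iostat -x 5
   (watch the await and %util columns for sda: rising await means requests
    are queueing at the disk, and sustained %util near 100 means the shared
    data/WAL spindle has become the bottleneck)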
On Apr 3, 2009, at 7:32 AM, Scott Marlowe wrote:
> 2009/4/3 Gerd König <koenig@transporeon.com>:
>> since two days ago we have been facing increased load on our database
>> server (openSUSE 10.3 64-bit, PostgreSQL 8.3.5, 8 GB RAM). The high load
>> persists throughout the whole working day.
>
> [...]
>
> The problem might be that you're assuming there's a problem. Looking at
> the rest of your diags, your data set fits in memory, I/O wait is < 10%,
> and there are no processes waiting for a CPU to free up; they're all
> running.
>
> Looks healthy to me.

Eh? His run queue constantly has procs waiting for run time, although I've
seen higher. That, with a distinct lack of heavy IO, says CPU bound to me...

#>vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache  si  so  bi   bo   in    cs us sy id wa
 5  0    204 2449652 185692 5046168   0   0   2   24    1     1  2  0 95  2
 3  0    204 2448496 185692 5046184   0   0   0  518 2984 18683 24  5 65  6
 3  0    204 2430864 185692 5046192   0   0   0  344 2083 10004 34  3 58  5
 2  0    204 2434600 185700 5046200   0   0   0  386 2084 23592 33  3 57  7
 3  0    204 2425612 185700 5046220   0   0   0  372 2352  2905 36  2 57  5
 5  0    204 2424828 185700 5046256   0   0   0  600 2372 33094 36 12 48  4
 4  0    204 2405516 185700 5046256   0   0   4  992 1747 29035 33  8 52  6
 3  0    204 2419368 185708 5046272   0   0   4  660 2735 24732 36  7 51  6
 2  0    204 2419244 185712 5046296   0   0   0  360 2251  3193  9  1 84  5
 3  0    204 2407096 185712 5046296   0   0   0  332 2319  3269 20  3 72  5

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
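A rough way to quantify "constantly has procs" from a longer sample -- a
sketch only; the NR > 2 skips vmstat's two header lines, and note that the
first data row is an average since boot rather than a live sample:

#> vmstat 2 30 | awk 'NR > 2 { sum += $1; n++ } END { printf "avg r: %.1f\n", sum / n }'

Over the ten samples above, r averages about 3.3 against 8 cores.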
On Fri, Apr 3, 2009 at 4:13 PM, Erik Jones <ejones@engineyard.com> wrote:
> On Apr 3, 2009, at 7:32 AM, Scott Marlowe wrote:
>> [...]
>>
>> The problem might be that you're assuming there's a problem. Looking at
>> the rest of your diags, your data set fits in memory, I/O wait is < 10%,
>> and there are no processes waiting for a CPU to free up; they're all
>> running.
>>
>> Looks healthy to me.
>
> Eh? His run queue constantly has procs waiting for run time, although I've
> seen higher. That, with a distinct lack of heavy IO, says CPU bound to
> me...

How do you see that? He's got 50% or so idle, and he's running fewer
processes than he has cores.
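One way to settle this kind of disagreement, assuming sysstat is installed,
is to look per core rather than at the averaged totals, since a few pegged
cores can hide behind a healthy-looking aggregate idle figure:

#> mpstat -P ALL 2 5
   (per-CPU breakdown: if some cores sit near 100% %user while others are
    idle, the box is effectively CPU bound despite ~50% total idle)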
On Fri, Apr 3, 2009 at 4:13 PM, Erik Jones <ejones@engineyard.com> wrote:
> Eh? His run queue constantly has procs waiting for run time, although I've
> seen higher. That, with a distinct lack of heavy IO, says CPU bound to
> me...

I just pulled up the Linux man page, and it says that r is the number of
processes waiting to run. This isn't entirely correct. A BSD or Solaris man
page more accurately identifies it as the number of processes running OR
waiting to run; if this number exceeds the number of cores, the amount by
which it exceeds them is how long the queue really is.
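A worked example under that reading:

r = 5  on an 8-core box -> 5 running, 0 waiting (headroom for 3 more)
r = 12 on an 8-core box -> 8 running, 4 waiting (a genuine queue of 4)

So the r values of 2-5 in the vmstat output above indicate busy cores, not a
backlog of processes waiting for a CPU.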