Thread: PG 8.3 and server load
I'm on 64-bit CentOS 5, latest kernel and all of that. PG version is 8.3.7, compiled as 64-bit. The memory is 8GB. It's a 2 x Dual Core Intel 5310. Hard disks are RAID 1, SCSI 15K rpm.

The server is running just one website, so there's Apache 2.2.11 and MySQL (for some small tasks, almost negligible). And then there's PG, which shows up in "top" as the main beast.

My server load is going to 64, 63, 65, and so on.

Where should I start debugging? What should I look at? top does not show anything meaningful. Even if it shows that the postgres user ("postmaster") and the nobody user ("httpd", i.e. Apache) are the main resource hogs, what should I start with in terms of debugging?
Phoenix Kiula wrote:
> My server load is going to 64, 63, 65, and so on.
>
> Where should I start debugging? What should I see?

If postgres or Apache is the reason for the high load, it means you have lots of simultaneous users hitting one or both servers. The only thing you can do (apart from denying service to users, of course) is find out which requests / queries take the most time and optimize them.

pgtop (http://pgfoundry.org/projects/pgtop/) might help you see what your database is doing. You will probably also need something like pqa (http://pqa.projects.postgresql.org/) to find the queries that take the most time.

Unfortunately, if you cannot significantly optimize your queries, there is not much else you can do with the hardware you have.
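pqa works from the server log. As a rough illustration of the same idea (not pqa's own code), here is a minimal Python sketch that totals time per query from a log written with log_min_duration_statement enabled (for example set to 0, which logs every statement with its duration). It assumes the "duration: ... ms  statement: ..." wording, groups queries by replacing literals with ?, and only sees the first line of a multi-line statement.

```python
import re
from collections import defaultdict

# Assumed log line shape with log_min_duration_statement enabled, e.g.
#   LOG:  duration: 12.345 ms  statement: SELECT * FROM t WHERE id = 42
# search() is used, so any log_line_prefix text before "LOG:" is ignored.
DURATION_RE = re.compile(r"duration: ([0-9.]+) ms\s+statement: (.*)$")


def normalize(sql):
    """Replace literals with '?' so queries differing only in constants group together."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)      # quoted string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals
    return " ".join(sql.split())                   # collapse whitespace


def top_queries(log_lines, limit=10):
    """Return (total_ms, calls, normalized_sql) tuples, largest total time first."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in log_lines:
        match = DURATION_RE.search(line)
        if not match:
            continue  # not a duration line (connections, errors, ...)
        entry = totals[normalize(match.group(2))]
        entry[0] += float(match.group(1))
        entry[1] += 1
    ranked = sorted(((ms, n, sql) for sql, (ms, n) in totals.items()), reverse=True)
    return ranked[:limit]
```

On the server you might call top_queries(open(path_to_log)), where path_to_log is wherever your log_directory setting points. The regex normalization is deliberately crude; a real tool also handles multi-line statements and bind parameters.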
Ivan Voras <ivoras 'at' freebsd.org> writes:
> pgtop (http://pgfoundry.org/projects/pgtop/) might help you see what
> is your database doing.

A simpler (but probably less powerful) method is to make sure command-string tracking is on in the server configuration: "track_activities = on" in 8.3, where it is the default (releases before 8.3 called it "stats_command_string"). Then issue this query to view the currently running queries:

    SELECT procpid, datname, current_query, query_start
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>';

That may also be interesting.

-- 
Guillaume Cottenceau
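If you pull the result of that query into Python (by parsing psql output or through a driver such as psycopg2; either route is an assumption, not something from this thread), a small helper can rank the rows by how long each query has been running. This sketch takes plain tuples so it does not depend on any driver; min_age is a hypothetical cut-off parameter for hiding short queries. If query_start comes back timezone-aware, now must be timezone-aware too.

```python
from datetime import datetime, timedelta


def running_queries(rows, now: datetime, min_age: timedelta = timedelta(0)):
    """Rank pg_stat_activity rows by how long their current query has run.

    rows: (procpid, datname, current_query, query_start) tuples, i.e. the
    columns selected by the query above. '<IDLE>' rows are skipped, mirroring
    the WHERE clause; '<IDLE> in transaction' rows are kept, because long-lived
    ones are worth knowing about. min_age (hypothetical) hides younger queries.
    Returns (age, procpid, datname, query) tuples, longest-running first.
    """
    busy = []
    for procpid, datname, query, started in rows:
        if query == "<IDLE>" or started is None:
            continue
        age = now - started
        if age >= min_age:
            busy.append((age, procpid, datname, query))
    busy.sort(reverse=True)
    return busy
```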
Phoenix Kiula wrote:
> Where should I start debugging? What should I see? TOP command does
> not yield anything meaningful.

1) Check whether you are using swap space. Run free and make sure the "used" figure on the Swap line is small. Run vmstat and see whether swpd is moving up and down and whether the si/so (swap in/out) columns are nonzero. (Posting a handful of lines from vmstat might help us.)

2) Check 'ps ax|grep postgres' and make sure nothing says "idle in transaction".

3) I had a web box where the number of Apache clients was set very high, and the box was brought to its knees by the sheer number of connections. Check "ps ax|grep http|wc --lines" and make sure it's not too big (perhaps less than 100).

-Andy
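Checks 1) to 3) can also be scripted. A minimal Python sketch, assuming Linux's /proc/meminfo format and plain `ps ax` output; the function names are made up for this example. Note that the grep in `ps ax|grep http|wc --lines` usually matches its own command line, so that count reads one high; the sketch skips grep lines.

```python
def swap_used_kb(meminfo_text):
    """Swap in use, in kB, from the text of /proc/meminfo (SwapTotal - SwapFree)."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values["SwapTotal"] - values["SwapFree"]


def scan_ps(ps_lines):
    """From `ps ax` output lines, count httpd processes and collect the PIDs of
    postgres backends reporting "idle in transaction". Lines containing "grep"
    are skipped so the pipeline's own grep is not counted."""
    httpd = 0
    idle_in_tx = []
    for line in ps_lines:
        if "grep" in line:
            continue
        if "httpd" in line:
            httpd += 1
        if "idle in transaction" in line:
            idle_in_tx.append(line.split()[0])  # first column of `ps ax` is the PID
    return httpd, idle_in_tx
```

On the server you would pass open('/proc/meminfo').read() and the lines of `ps ax` output, for example subprocess.run(['ps', 'ax'], capture_output=True, text=True).stdout.splitlines(), which needs Python 3.7 or later.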
Andy Colson wrote:
> Phoenix Kiula wrote:
>> I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
> ....<snip>.....
> -Andy

I'll note that in some benchmark tests I've done on my application (a VERY heavy Postgres user), CentOS was RADICALLY inferior to FreeBSD on the same hardware in terms of carrying capacity and performance. I have no idea why; you wouldn't expect this sort of result, but it is what it is.

The test platform in my case was a Core i7 box (8 cores SMP) with 6GB of memory running 64-bit code across the board. Disks were on a 3Ware coprocessor board.

I was quite surprised by this, given that CentOS generally seems comparable to FreeBSD for plain Apache (web service) use, but because of it I strongly recommend FreeBSD where web service + PostgreSQL is the intended application mix.

-- 
Karl
Phoenix Kiula wrote:
> Thanks, but swap is not changing, there is no idle transaction, and
> number of connections are 28/29.
>
> Here are some command line stamps...any other ideas?
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:00:37 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
>  3  1  20920  25736  60172 7594988    0    0    74   153    0    3 10  5 74 12
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:00:40 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
>  0  1  20920  34696  60124 7593996    0    0    74   153    0    3 10  5 74 12
>
> [MYSITE] ~ > ps ax|grep postgres
> 25302 ?        Ss     0:00 postgres: logger process
> 25352 ?        Ss     0:07 postgres: writer process
> 25353 ?        Ss     4:21 postgres: stats collector process
> 23483 ?        Ds     0:00 postgres: snipurl_snipurl snipurl 127.0.0.1(51622) UPDATE
> 23485 pts/12   S+     0:00 grep postgres
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:00:55 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
>  0  0  20920  49464  60272 7597748    0    0    74   153    0    3 10  5 74 12
>
> [MYSITE] ~ > ps ax|grep http|wc --lines
> 28
>
> [MYSITE] ~ > ps ax|grep http|wc --lines
> 29
>
> [MYSITE] ~ > ps ax|grep postgres
> 25302 ?        Ss     0:00 postgres: logger process
> 25352 ?        Ss     0:07 postgres: writer process
> 25353 ?        Ss     4:21 postgres: stats collector process
> 24718 pts/12   S+     0:00 grep postgres
>
> [MYSITE] ~ > date && vmstat
> Wed Aug 19 10:01:23 CDT 2009
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
>  0  0  20920 106376  59220 7531016    0    0    74   153    0    3 10  5 74 12
>
> On Wed, Aug 19, 2009 at 10:01 PM, Andy Colson<andy@squeakycode.net> wrote:
>> Phoenix Kiula wrote:
>>> I'm on a CentOS 5 OS 64 bit, latest kernel and all of that.
>> ....<snip>.....
>> -Andy

The first line of vmstat output is an average since bootup. Kinda useless. Run it as 'vmstat 4' instead: it will print a line every 4 seconds, each one a summary of everything that happened in the last 4 seconds.

Since boot, you've written out an average of 153 blocks per second (the bo column). That's very small, so you're not I/O bound.

But... you have an average of 74% idle CPU. So you're not CPU bound either?

Ahh? I'm not sure what that means. Maybe I'm reading something wrong?

-Andy
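Andy's point that the first vmstat row is a since-boot average is easy to build into a parser. A minimal Python sketch, assuming the procps vmstat column layout shown in this thread; it drops the first data row and averages only the live samples.

```python
def vmstat_samples(text):
    """Parse `vmstat <delay>` output into one dict per data row, keyed by column.

    The first data row is dropped: it reports averages since boot, not a live sample.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue                      # blank line or the column-group banner
        if fields[0] == "r":
            header = fields               # r b swpd free buff cache si so bi bo ...
            continue
        if header and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows[1:]


def column_mean(samples, column):
    """Average of one column, e.g. 'id' (idle CPU %) or 'bo' (blocks written out/s)."""
    return sum(s[column] for s in samples) / len(samples)
```

For example, feed it the output of `vmstat 4 5` and ask for column_mean(samples, 'id') or column_mean(samples, 'wa').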
On Wed, Aug 19, 2009 at 11:25 PM, Andy Colson<andy@squeakycode.net> wrote:
....<snip>.....
> the first line of vmstat is an average since bootup. Kinda useless. run it
> as: 'vmstat 4'
....<snip>.....

~ > vmstat 4
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  2  16128  35056  62800 7697428    0    0    74   153    0    3 10  5 74 12
 0  0  16128  38256  62836 7698172    0    0   166   219 1386 1440  7  4 85  4
 0  1  16128  34704  62872 7698916    0    0   119   314 1441 1589  7  4 85  5
 0  0  16128  29544  62912 7699396    0    0   142   144 1443 1418  6  3 88  2
 7  1  16128  26784  62832 7692196    0    0   343   241 1492 1671  8  5 83  4
 0  0  16128  32840  62880 7693188    0    0   253   215 1459 1511  7  4 85  4
 0  0  16128  30112  62940 7693908    0    0   187   216 1395 1282  6  3 87  4
Andy Colson <andy@squeakycode.net> wrote:
> Phoenix Kiula wrote:
>>>> It's a 2 x Dual Core Intel 5310.

> you have average 74% idle cpu. So your not cpu bound either?

Or one CPU is pegged and the other three are idle....

-Kevin
Kevin Grittner wrote:
> Andy Colson <andy@squeakycode.net> wrote:
>> you have average 74% idle cpu. So your not cpu bound either?
>
> Or one CPU is pegged and the other three are idle....
>
> -Kevin

Ahh, yeah...

Phoenix: run top again and hit the '1' key. It'll show you stats for each CPU. Is one pegged and the others idle?

Do a 'cat /proc/cpuinfo' and make sure your OS is seeing all your CPUs.

-Andy
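Kevin's alternative is plain arithmetic: with four CPUs, one at 0% idle and three at 100% idle average (0 + 100 + 100 + 100) / 4 = 75% idle, so an overall 74% idle figure cannot rule out a single pegged CPU. The per-CPU view from top, plus a count of processor lines in /proc/cpuinfo, settles it. A minimal Python sketch, assuming the top line format quoted in this thread; the 10% idle threshold for "pegged" is arbitrary.

```python
import re

# top's per-CPU lines (press '1' in top) look like:
#   Cpu0 : 17.7% us, 7.7% sy, 0.0% ni, 74.0% id, 0.7% wa, 0.0% hi, 0.0% si
CPU_IDLE = re.compile(r"^(Cpu\d+)\s*:.*?([\d.]+)% id")
PROCESSOR = re.compile(r"^processor\s*:")


def count_cpus(cpuinfo_text):
    """Logical CPUs the kernel sees: /proc/cpuinfo has one 'processor : N' line per CPU."""
    return sum(1 for line in cpuinfo_text.splitlines() if PROCESSOR.match(line))


def idle_by_cpu(top_lines):
    """Map each CPU name in top's per-CPU lines to its idle percentage."""
    idle = {}
    for line in top_lines:
        match = CPU_IDLE.match(line)
        if match:
            idle[match.group(1)] = float(match.group(2))
    return idle


def pegged(idle, max_idle=10.0):
    """CPUs whose idle share is below max_idle percent (an arbitrary threshold)."""
    return sorted(cpu for cpu, pct in idle.items() if pct < max_idle)
```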
On Wed, 19 Aug 2009, Phoenix Kiula wrote:
> ~ > vmstat 4
> ....<snip>.....

As far as I can see from this, your machine isn't very busy at all.

> [MYSITE] ~ > ps ax|grep postgres
> 25302 ?        Ss     0:00 postgres: logger process
> 25352 ?        Ss     0:07 postgres: writer process
> 25353 ?        Ss     4:21 postgres: stats collector process
> 24718 pts/12   S+     0:00 grep postgres

Moreover, Postgres isn't doing anything either.

So, what is the problem that you are seeing? What do you want to change?

Matthew

-- 
 Surely the value of C++ is zero, but C's value is now 1?
  -- map36, commenting on the "No, C++ isn't equal to D. 'C' is undeclared
  [...] C++ should really be called 1" response to "C++ -- shouldn't it
  be called D?"
On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson<andy@squeakycode.net> wrote:
>
> Phoenix: run top again, and hit the '1' key. It'll show you stats for each
> cpu. Is one pegged and the others idle?
>
top - 10:38:53 up 29 days, 5 min, 1 user, load average: 64.99, 65.17, 65.06
Tasks: 568 total, 1 running, 537 sleeping, 6 stopped, 24 zombie
Cpu0 : 17.7% us, 7.7% sy, 0.0% ni, 74.0% id, 0.7% wa, 0.0% hi, 0.0% si
Cpu1 : 6.3% us, 5.6% sy, 0.0% ni, 84.4% id, 3.6% wa, 0.0% hi, 0.0% si
Cpu2 : 5.6% us, 5.9% sy, 0.0% ni, 86.8% id, 1.7% wa, 0.0% hi, 0.0% si
Cpu3 : 5.6% us, 4.0% sy, 0.0% ni, 74.2% id, 16.2% wa, 0.0% hi, 0.0% si
Mem: 8310256k total, 8277416k used, 32840k free, 61944k buffers
Swap: 2096440k total, 16128k used, 2080312k free, 7664224k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9922 nobody 15 0 49024 16m 7408 S 3.0 0.2 0:00.52 httpd
9630 nobody 15 0 49020 16m 7420 S 2.3 0.2 0:00.60 httpd
9848 nobody 16 0 48992 16m 7372 S 2.3 0.2 0:00.51 httpd
10995 nobody 15 0 49024 16m 7304 S 2.3 0.2 0:00.35 httpd
11031 nobody 15 0 48860 16m 7104 S 2.3 0.2 0:00.34 httpd
6701 nobody 15 0 49028 17m 7576 S 2.0 0.2 0:01.50 httpd
10996 nobody 15 0 48992 16m 7328 S 2.0 0.2 0:00.31 httpd
12232 nobody 15 0 48860 16m 7004 S 1.7 0.2 0:00.05 httpd
9876 nobody 15 0 48992 16m 7400 S 1.3 0.2 0:00.73 httpd
12231 nobody 15 0 48860 16m 6932 S 1.3 0.2 0:00.04 httpd
12233 nobody 16 0 48860 16m 6960 S 1.3 0.2 0:00.04 httpd
20315 postgres 19 0 325m 9732 9380 S 1.0 0.1 0:10.39 postmaster
31573 nobody 15 0 49024 17m 7664 S 1.0 0.2 0:03.14 httpd
7954 nobody 15 0 49032 16m 7400 S 1.0 0.2 0:01.14 httpd
9918 nobody 15 0 48956 16m 7344 S 1.0 0.2 0:00.44 httpd
12298 nobody 16 0 48860 16m 6780 S 1.0 0.2 0:00.03 httpd
6479 nobody 16 0 49040 16m 7412 S 0.7 0.2 0:01.20 httpd
7950 nobody 15 0 49020 16m 7388 S 0.7 0.2 0:00.83 httpd
7951 nobody 15 0 49032 16m 7384 S 0.7 0.2 0:01.03 httpd
9875 nobody 15 0 48948 16m 7096 S 0.7 0.2 0:00.51 httpd
9916 nobody 16 0 48860 16m 7124 S 0.7 0.2 0:00.59 httpd
10969 nobody 15 0 49036 16m 7380 S 0.7 0.2 0:00.29 httpd
11752 root 16 0 3620 1288 772 R 0.7 0.0 0:00.14 top
12309 nobody 16 0 48860 16m 6844 S 0.7 0.2 0:00.02 httpd
20676 mysql 15 0 182m 20m 2916 S 0.3 0.3 0:00.95 mysqld
20811 root 21 0 47920 14m 5872 S 0.3 0.2 0:00.71 httpd
7952 nobody 15 0 49024 16m 7524 S 0.3 0.2 0:00.96 httpd
11036 nobody 15 0 48992 16m 7320 S 0.3 0.2 0:00.36 httpd
12230 nobody 15 0 48860 16m 6956 S 0.3 0.2 0:00.01 httpd
12297 nobody 16 0 48860 16m 6932 S 0.3 0.2 0:00.01 httpd
12299 nobody 16 0 48992 16m 7120 S 0.3 0.2 0:00.01 httpd
12301 nobody 20 0 48860 16m 6816 S 0.3 0.2 0:00.01 httpd
12307 nobody 15 0 48860 16m 6880 S 0.3 0.2 0:00.01 httpd
> do a 'cat /proc/cpuinfo' and make sure your os is seeing all your cpus.
>
I guess it's using all 4?
Phoenix Kiula wrote:
> On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson<andy@squeakycode.net> wrote:
> ....<snip>.....
> top - 10:38:53 up 29 days, 5 min, 1 user, load average: 64.99, 65.17,
> 65.06
> Tasks: 568 total, 1 running, 537 sleeping, 6 stopped, 24 zombie
> ....<snip>.....
> I guess it's using all 4?

Yeah.

You aren't serving data from a shared drive (SMB or NFS), are you? You have a bunch of httpd processes just sitting around doing very little. Or do you have any PHP/Perl/Python/whatever turning around and doing network stuff?

Check your NICs for errors (run ifconfig), and check these stats:

RX packets:15606269 errors:0 dropped:0 overruns:0 frame:0
TX packets:13173940 errors:5 dropped:0 overruns:0 carrier:10
collisions:0 txqueuelen:1000

The load average is a summary of a bunch of things, including processes that are waiting on something else: on Linux it counts tasks that are runnable plus tasks in uninterruptible sleep (state D in ps), which usually means waiting on I/O. I'll bet your httpd's are sitting around waiting on something (it's not CPU or disk, so it must be something else), which is causing the load average to spike up.

-Andy
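To see which processes actually feed the load average, here is a minimal Python sketch that tallies process states from `ps ax` output. It assumes the default `ps ax` columns (PID TTY STAT TIME COMMAND) and treats a snapshot of R plus D processes as a rough proxy. The kernel's figure is a decaying average and also counts threads, so expect the two numbers to differ.

```python
from collections import Counter


def state_counts(ps_lines):
    """Tally process states from `ps ax` output (PID TTY STAT TIME COMMAND).

    Only the first letter of STAT is used: R = runnable, S = sleeping,
    D = uninterruptible sleep (usually I/O), T = stopped, Z = zombie.
    """
    counts = Counter()
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "PID":
            continue  # header or malformed line
        counts[fields[2][0]] += 1
    return counts


def load_contributors(counts):
    """Linux's load average is a decaying average of the R + D task count."""
    return counts["R"] + counts["D"]
```

A persistently high D count with httpd or postgres in the COMMAND column would support the waiting-on-something theory.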
Phoenix Kiula <phoenix.kiula@gmail.com> writes:
> top - 10:38:53 up 29 days, 5 min, 1 user, load average: 64.99, 65.17,
> 65.06
> Tasks: 568 total, 1 running, 537 sleeping, 6 stopped, 24 zombie
> Cpu0 : 17.7% us, 7.7% sy, 0.0% ni, 74.0% id, 0.7% wa, 0.0% hi, 0.0% si
> Cpu1 : 6.3% us, 5.6% sy, 0.0% ni, 84.4% id, 3.6% wa, 0.0% hi, 0.0% si
> Cpu2 : 5.6% us, 5.9% sy, 0.0% ni, 86.8% id, 1.7% wa, 0.0% hi, 0.0% si
> Cpu3 : 5.6% us, 4.0% sy, 0.0% ni, 74.2% id, 16.2% wa, 0.0% hi, 0.0% si
> Mem: 8310256k total, 8277416k used, 32840k free, 61944k buffers
> Swap: 2096440k total, 16128k used, 2080312k free, 7664224k cached

It sure looks from here like your box is not under any particular stress. The only thing that suggests a problem is the high load average, but since that doesn't agree with any other measurements, I'm inclined to think that the load average is simply wrong. Do you have any actual evidence of a problem (like slow response)?

(I've seen load averages that had nothing to do with observable reality on other Unixes, though not before on RHEL.)

regards, tom lane
Phoenix Kiula <phoenix.kiula 'at' gmail.com> writes:
> Tasks: 568 total, 1 running, 537 sleeping, 6 stopped, 24 zombie

The stopped and zombie processes look odd. Any reason for these?

-- 
Guillaume Cottenceau
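One way to answer that is to list the zombie (Z) and stopped (T) processes together with their parent PIDs, since a zombie lingers until its parent reaps it. A minimal Python sketch over the output of `ps -eo pid,ppid,stat,comm`; if most zombies share the main (root-owned) httpd process as their parent, that would point at Apache.

```python
from collections import defaultdict


def zombies_and_stopped(ps_lines):
    """Group zombie (Z) and stopped (T) processes by parent PID.

    ps_lines: output of `ps -eo pid,ppid,stat,comm`, header line first.
    Returns {ppid: [(pid, stat, command), ...]}.
    """
    by_parent = defaultdict(list)
    for line in ps_lines[1:]:              # skip the PID PPID STAT COMMAND header
        fields = line.split(None, 3)       # the command may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat[0] in ("Z", "T"):
            by_parent[int(ppid)].append((int(pid), stat, comm))
    return dict(by_parent)
```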
On Wed, Aug 19, 2009 at 9:40 AM, Phoenix Kiula<phoenix.kiula@gmail.com> wrote:
> On Wed, Aug 19, 2009 at 11:37 PM, Andy Colson<andy@squeakycode.net> wrote:
>> Phoenix: run top again, and hit the '1' key. It'll show you stats for
>> each cpu. Is one pegged and the others idle?
>
> top - 10:38:53 up 29 days, 5 min, 1 user, load average: 64.99, 65.17,
> 65.06
> Tasks: 568 total, 1 running, 537 sleeping, 6 stopped, 24 zombie
> ....<snip>.....

OK, nothing looks odd except, as pointed out, the stopped and zombie processes and the high load. The actual amount of stuff running is minimal.

I'm wondering if you've got something causing Apache children to crash and go zombie. What parts of this setup are compiled by hand? Are you sure you don't have something like Apache compiled against one version of zlib and php-mysql against another? Not that exact problem, but it's one of many ways to make a crash-prone Apache.
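If Apache children really are crashing, Apache 2.2 normally records it in its error_log with a line like "child pid 9922 exit signal Segmentation fault (11)". A minimal Python sketch that counts such lines per signal; the exact message wording and the log's location (a hand-compiled Apache usually logs under its own install prefix) are assumptions to check against your setup.

```python
import re
from collections import Counter

# Assumed Apache 2.2 error_log wording for a crashed child, e.g.
#   [Wed Aug 19 10:38:53 2009] [notice] child pid 9922 exit signal Segmentation fault (11)
CHILD_EXIT = re.compile(r"child pid (\d+) exit signal (.+?) \((\d+)\)")


def crashed_children(log_lines):
    """Count abnormal httpd child exits per signal name."""
    signals = Counter()
    for line in log_lines:
        match = CHILD_EXIT.search(line)
        if match:
            signals[match.group(2)] += 1
    return signals
```

Usage would be crashed_children(open(path_to_error_log)); a steady stream of segfaults would fit Scott's mismatched-library theory.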
Scott Marlowe wrote:
> ....<snip>.....
> OK, nothing looks odd except, as pointed out, the stopped, zombie and
> high load. The actual amount of stuff running is minimal.
>
> I'm wondering if you've got something causing apache children to crash
> and go zombie. What parts of this setup are compiled by hand? Are

Good point. Does Linux have a "last PID" field in top? If so, you could monitor it to find out whether it's changing rapidly.
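As far as I know, procps top on Linux has no "last pid" line like FreeBSD's top, but the fifth field of /proc/loadavg is the most recently assigned PID. Sampling it twice gives the rate at which new processes (or threads) are being created, and a high rate on an otherwise quiet box would fit the crashing-children theory. A minimal Python sketch; the 5-second interval is arbitrary and PID wraparound is ignored.

```python
import time


def parse_loadavg(text):
    """Split /proc/loadavg into (load1, load5, load15, running, total, last_pid).

    The fourth field is running/total scheduling entities; the fifth is the
    most recently assigned PID.
    """
    load1, load5, load15, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return float(load1), float(load5), float(load15), int(running), int(total), int(last_pid)


def pid_rate(first_pid, second_pid, seconds):
    """New PIDs handed out per second between two samples (ignores PID wraparound)."""
    return (second_pid - first_pid) / seconds


def sample_pid_rate(interval=5.0, path="/proc/loadavg"):
    """Read the last PID twice, `interval` seconds apart, and return the rate."""
    with open(path) as f:
        first = parse_loadavg(f.read())[5]
    time.sleep(interval)
    with open(path) as f:
        second = parse_loadavg(f.read())[5]
    return pid_rate(first, second, interval)
```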