Thread: postgresql + apache under heavy load
Hello, I am testing a web application (using the DBX PHP function to call a Postgresql backend). I have 375Mb RAM on my test home box. I ran ab (apache benchmark) to test the behaviour of the application under heavy load. When increasing the number of requests, all my memory is filled, and the Linux server begins to cache and remains frozen. ab -n 100 -c 10 http://localsite/testscript behaves OK. If I increases to ab -n 1000 -c 100 http://localsite/testscript I get this memory problem. If I eliminate the connection to the (UNIX) socket of Postgresql, the script behaves well even under very high load (and of course with much less time spent per request). I tried to change some parameters in postgresql.conf max_connections = 32 to max_connections = 8 and shared_buffers = 64 to shared_buffers = 16 without success. I tried to use pmap on httpd and postmaster Process ID but don't get much help. Does anybody have some idea to help to debug/understand/solve this issue? Any feedback is appreciated. To me, it would not be a problem if the box is very slow under heavy load (DoS like), but I really dislike having my box out of service after such a DoS attack. I am looking for a way to limit the memory used by postgres. Thanks Alex
On Wed, 21 Jan 2004, Alex Madon wrote: > Hello, > I am testing a web application (using the DBX PHP function to call a > Postgresql backend). I'm not familiar with DBX. Is that connection pooling or what? > I have 375Mb RAM on my test home box. > I ran ab (apache benchmark) to test the behaviour of the application > under heavy load. > When increasing the number of requests, all my memory is filled, and the > Linux server begins to cache and remains frozen. Are you SURE all your memory is in use? What exactly does top say about things like cached and buff memory (I'm assuming you're on linux, any differences in top on another OS would be minor.) If the kernel still shows a fair bit of cached and buff memory, your memory is not getting all used up. > ab -n 100 -c 10 http://localsite/testscript > behaves OK. Keep in mind, this is 10 simo users beating the machine continuously. that's functionally equivalent to about 100 to 200 people running through pages as fast as people can. > If I increases to > ab -n 1000 -c 100 http://localsite/testscript > I get this memory problem. Where's the break point? Just wondering. Does it show up at 20, 40, 60, 80, or only at 100? If so, that's really not bad. > If I eliminate the connection to the (UNIX) socket of Postgresql, the > script behaves well even under very high load (and of course with much > less time spent per request). Of course, the database is the most expensive part of an application, CPU/Memory wise, written on apache/php > I tried to change some parameters in postgresql.conf > max_connections = 32 > to max_connections = 8 Wrong direction. The number of connections postgresql CAN create costs very little. The number of connections it does create, still, costs very little. Have you checked to see if ab is getting valid pages, and not "connection failed, too many connections already open" pages? > shared_buffers = 64 > to shared_buffers = 16 Way the wrong way. Shared buffers are the max memory all the backends together share. The old setting was 512k ram, now you're down to 128k. while 128k would be a lot of memory for a Commodore 128, for a machine with 384 meg ram, it's nothing. Since this is a TOTAL shared memory setting, not a per process thing, you can hand it a good chunk of ram and not usually worry about it. Set it to 512 and just leave it. That's only 4 megs of shared memory, if your machine is running that low, other things have gone wrong. > without success. > > I tried to use pmap on httpd and postmaster Process ID but don't get > much help. > > Does anybody have some idea to help to debug/understand/solve this > issue? Any feedback is appreciated. > To me, it would not be a problem if the box is very slow under heavy > load (DoS like), but I really dislike having my box out of service after > such a DoS attack. Does it not come back? That's bad. > I am looking for a way to limit the memory used by postgres. Don't it's likely not using too much. What does top say is the highest memory user?
Alex Madon wrote: > Hello, > I am testing a web application (using the DBX PHP function to call a > Postgresql backend). > I have 375Mb RAM on my test home box. > I ran ab (apache benchmark) to test the behaviour of the application > under heavy load. > When increasing the number of requests, all my memory is filled, and > the Linux server begins to cache and remains frozen. > > ab -n 100 -c 10 http://localsite/testscript > behaves OK. > > If I increases to > ab -n 1000 -c 100 http://localsite/testscript > I get this memory problem. We would need a lot more information. What version of Linux? What version of the Kernel? What is your shmmax settting? What is your sort_mem setting? Did you use top to see where the hang up? Are there any messages in /var/log/messages? Sincerely, Joshua D. Drake > > If I eliminate the connection to the (UNIX) socket of Postgresql, the > script behaves well even under very high load (and of course with much > less time spent per request). > > I tried to change some parameters in postgresql.conf > max_connections = 32 > to max_connections = 8 > > and > > shared_buffers = 64 > to shared_buffers = 16 > > without success. > > I tried to use pmap on httpd and postmaster Process ID but don't get > much help. > > Does anybody have some idea to help to debug/understand/solve this > issue? Any feedback is appreciated. > To me, it would not be a problem if the box is very slow under heavy > load (DoS like), but I really dislike having my box out of service > after such a DoS attack. > I am looking for a way to limit the memory used by postgres. > > Thanks > Alex > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
On Wednesday 21 January 2004 14:11, Alex Madon wrote: > Hello, > I am testing a web application (using the DBX PHP function to call a > Postgresql backend). > I have 375Mb RAM on my test home box. [10 connections is fine, 100 is not] > I tried to change some parameters in postgresql.conf > max_connections = 32 > to max_connections = 8 Are you saying you had more than 8 connections open simultaneously? Are you sure you restarted PG so that it noticed the new values? You can check config settings with "show all;" from psql, or "show <setting>". You'll want to use the "top" command to show the amount of memory each process is using and then check the configuration/tuning articles at the following URL: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php First step is to make sure your changes are being detected. Then, I'd guess you want to set: max_connections shared_buffers sort_mem vacuum_mem (less important) and then adjust effective_cache_size so it matches your normal load. -- Richard Huxton Archonet Ltd
Could be problem be that PHP is not using connection efficiently? Apache KeepAlive with PHP, is a dual edged sword with you holding the blade :-) If I am not mistaken, what happens is that a connection is kept alive because Apache believes that other requests will come in from the client who made the initial connection. So 10 concurrent connections are fine, but they are not released timely enough with 100 concurrent connections. The system ends up waiting around for other KeepAlive connections to timeout before Apache allows others to come in. We had this exact problem in an environment with millions of impressions per day going to the database. Because of the nature of our business, we were able to disable KeepAlive and the load immediately dropped (concurrent connection on the Postgresql database also dropped sharply). We also turned off PHP persistent connections to the database. The drawback is that connections are built up and torn down all the time, and with Postgresql, it is sort of expensive. But thats a fraction of the expense of having KeepAlive on. Warmest regards, Ericson Smith Tracking Specialist/DBA +-----------------------+--------------------------------------+ | http://www.did-it.com | "Crush my enemies, see then driven | | eric@did-it.com | before me, and hear the lamentations | | 516-255-0500 | of their women." - Conan | +-----------------------+--------------------------------------+ Alex Madon wrote: > Hello, > I am testing a web application (using the DBX PHP function to call a > Postgresql backend). > I have 375Mb RAM on my test home box. > I ran ab (apache benchmark) to test the behaviour of the application > under heavy load. > When increasing the number of requests, all my memory is filled, and > the Linux server begins to cache and remains frozen. > > ab -n 100 -c 10 http://localsite/testscript > behaves OK. > > If I increases to > ab -n 1000 -c 100 http://localsite/testscript > I get this memory problem. > > If I eliminate the connection to the (UNIX) socket of Postgresql, the > script behaves well even under very high load (and of course with much > less time spent per request). > > I tried to change some parameters in postgresql.conf > max_connections = 32 > to max_connections = 8 > > and > > shared_buffers = 64 > to shared_buffers = 16 > > without success. > > I tried to use pmap on httpd and postmaster Process ID but don't get > much help. > > Does anybody have some idea to help to debug/understand/solve this > issue? Any feedback is appreciated. > To me, it would not be a problem if the box is very slow under heavy > load (DoS like), but I really dislike having my box out of service > after such a DoS attack. > I am looking for a way to limit the memory used by postgres. > > Thanks > Alex > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match >
Attachment
Hello Scott, Thank you for your answer. >I'm not familiar with DBX. Is that connection pooling or what? > > I could not find this information, sorry. >Are you SURE all your memory is in use? What exactly does top say about >things like cached and buff memory (I'm assuming you're on linux, any >differences in top on another OS would be minor.) If the kernel still >shows a fair bit of cached and buff memory, your memory is not getting all >used up. > Well my xosview show that caching begin at a concurrency of 40. At 80 my cache begins to be filled completely, so machine having big problems. >>If I increases to >>ab -n 1000 -c 100 http://localsite/testscript >>I get this memory problem. >> >> > >Where's the break point? Just wondering. Does it show up at 20, 40, 60, >80, or only at 100? If so, that's really not bad. > Here is some results (I kept -n 100 an just vraied the -c option) --c 1 Failed requests: 0 Time per request: 322.096 [ms] (mean, across all concurrent requests) -c 2 Failed requests: 0 Time per request: 374.220 [ms] (mean, across all concurrent requests) -c 10 Failed requests: 68 (Connect: 0, Length: 68, Exceptions: 0) Time per request: 314.779 [ms] (mean, across all concurrent requests) -c 20 Failed requests: 68 Time per request: 369.290 [ms] (mean, across all concurrent requests) -c 30 Failed requests: 43 Time per request: 441.947 [ms] (mean, across all concurrent requests) =====Here begins caching to disk==== -c 40 Failed requests: 65 Time per request: 528.829 [ms] (mean, across all concurrent requests) -c 50 Failed requests: 66 Time per request: 993.674 [ms] (mean, across all concurrent requests) For a higher concurrency, the cache is completly filled, and have to reboot the machine. (I didn't leave the system caching forever, just press to reboot button)... could be interesting to wait to see if the systems recovers after a while >>To me, it would not be a problem if the box is very slow under heavy >>load (DoS like), but I really dislike having my box out of service after >>such a DoS attack. >> >> > >Does it not come back? That's bad. > > see above thanks Alex
On Wed, 21 Jan 2004, Alex Madon wrote: > Hello Scott, > Thank you for your answer. > > >I'm not familiar with DBX. Is that connection pooling or what? > > > > > I could not find this information, sorry. > > >Are you SURE all your memory is in use? What exactly does top say about > >things like cached and buff memory (I'm assuming you're on linux, any > >differences in top on another OS would be minor.) If the kernel still > >shows a fair bit of cached and buff memory, your memory is not getting all > >used up. > > > Well my xosview show that caching begin at a concurrency of 40. > At 80 my cache begins to be filled completely, so machine having big > problems. I think you're confusing what I meant. Caching is good. Swapping is bad. Having a large amount of cache is a good thing. It means the OS is caching all your data in memory for faster access. > >>If I increases to > >>ab -n 1000 -c 100 http://localsite/testscript > >>I get this memory problem. > >> > >> > > > >Where's the break point? Just wondering. Does it show up at 20, 40, 60, > >80, or only at 100? If so, that's really not bad. > > > Here is some results (I kept -n 100 an just vraied the -c option) > --c 1 > Failed requests: 0 > Time per request: 322.096 [ms] (mean, across all concurrent requests) > > -c 2 > Failed requests: 0 > Time per request: 374.220 [ms] (mean, across all concurrent requests) > > -c 10 > Failed requests: 68 > (Connect: 0, Length: 68, Exceptions: 0) > Time per request: 314.779 [ms] (mean, across all concurrent requests) OK, there's a problem, you're getting failed requests at -c 10, which means you likely have postgresql configured in the wrong direction. configure postgresql to use more memory (sort_mem can be set to about 8 megs without a lot of issues on most boxes, going higher may use up all your memory in certain situations (high concurrency)). > For a higher concurrency, the cache is completly filled, and have to > reboot the machine. No, you should NEVER have to reboot a unix box. period. filled cache, again, is a GOOD THING. not bad. > (I didn't leave the system caching forever, just press to reboot > button)... could be interesting to wait to see if the systems recovers > after a while Yes, please do. Also, show us a save of top while under load. I'm betting your machine has plenty of memory, and is not using it effectively, due to postgresql being too conservatively configured.
Hello Joshua, Thank you for your reply. Joshua D. Drake wrote: > > > We would need a lot more information. What version of Linux? uname -a Linux daube 2.4.20-8 #1 Thu Mar 13 17:18:24 EST 2003 i686 athlon i386 GNU/Linux > What version of the Kernel? What is your shmmax settting? cat /proc/sys/kernel/shmmax 33554432 > > What is your sort_mem setting? I didn't change the postgresql.conf settings: #sort_mem = 1024 # min 64, size in KB > Did you use top to see where the hang up? Are there any messages in > /var/log/messages? Well as I said before the box is almost out of control: disk is caching intensively; I run X Windows and the mouse can not point a shell... very bad. The only thing I see is that cache is filling quickly with xosview... and then X become frozen (or better said extremely slow). Abou the logs: I sent the PHP error messages to a file, and yes there are errors: pg_connect(): Unable to connect to PostgreSQL server: FATAL: Non-superuser connection limit exceeded or pg_connect(): Unable to connect to PostgreSQL server: FATAL: Sorry, too many clients already Thanks Alex
Hello Richard, Thank you for your answer. Richard Huxton wrote: >Are you saying you had more than 8 connections open simultaneously? > Well, I don't know how to find that out. What I did is issuing several tomes a ps aux and I never saw more than 6-7 postmaster SELECT. >Are you >sure you restarted PG so that it noticed the new values? You can check config >settings with "show all;" from psql, or "show <setting>". > Yes I restart it. The show command outputs the correct value (8). > >You'll want to use the "top" command to show the amount of memory each process > A typical output (in a concurrency of 20, no cache) is: ps aux | grep postgres postgres 2332 0.0 0.0 8804 328 ? R 18:54 0:01 /usr/bin/postmaster -p 5432 -d2 postgres 2334 0.0 0.0 9792 68 ? S 18:54 0:00 postgres: stats buffer process postgres 2335 0.0 0.0 8828 200 ? S 18:54 0:00 postgres: stats collector process postgres 4386 0.0 0.2 4312 956 pts/3 S 19:22 0:00 -bash postgres 4871 0.0 0.5 9480 2304 ? S 20:36 0:00 postgres: user db [local] SELECT postgres 4873 0.0 0.2 8816 1032 ? R 20:36 0:00 postgres: user db [local] startup myuser 4877 0.0 0.1 3572 624 pts/4 S 20:36 0:00 grep postgres postgres 4878 0.0 0.5 9220 2228 ? R 20:36 0:00 postgres: user db [local] SELECT postgres 4879 0.0 0.5 9204 2016 ? R 20:36 0:00 postgres: user db [local] SELECT ---------------------------top----------------------------- 114 processes: 99 sleeping, 12 running, 3 zombie, 0 stopped CPU states: 91.8% user 8.1% system 0.0% nice 0.0% iowait 0.0% idle Mem: 384580k av, 316328k used, 68252k free, 0k shrd, 25424k buff 253976k actv, 36916k in_d, 4704k in_c Swap: 265064k av, 64788k used, 200276k free 71132k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 4914 apache 16 0 9016 8416 2552 S 6.7 2.1 0:00 0 httpd 4832 apache 16 0 9016 8416 2552 S 6.3 2.1 0:01 0 httpd 4915 apache 16 0 9016 8416 2552 S 5.9 2.1 0:00 0 httpd 4917 apache 16 0 9016 8416 2552 S 5.9 2.1 0:00 0 httpd 4919 apache 16 0 9020 8420 2536 S 5.9 2.1 0:00 0 httpd 4774 apache 16 0 9016 8416 2552 S 5.7 2.1 0:02 0 httpd 4896 apache 16 0 9060 8460 2568 S 5.7 2.1 0:00 0 httpd 4908 apache 15 0 9016 8416 2552 S 5.7 2.1 0:00 0 httpd 4909 apache 16 0 9016 8416 2552 S 5.7 2.1 0:00 0 httpd 4658 apache 16 0 9136 8536 2568 S 5.5 2.2 0:04 0 httpd 4921 apache 16 0 9016 8416 2552 S 5.5 2.1 0:00 0 httpd 2581 root 16 0 14492 4544 1252 R 5.3 1.1 2:26 0 X 4795 apache 16 0 9104 8504 2568 S 5.3 2.2 0:02 0 httpd 4796 apache 16 0 9080 8480 2568 S 5.3 2.2 0:01 0 httpd 4782 apache 16 0 8924 8324 2568 R 3.5 2.1 0:02 0 httpd 2612 madona 15 0 4524 4136 2380 S 1.5 1.0 0:18 0 metacity 4656 apache 15 0 9084 8484 2568 S 1.3 2.2 0:03 0 httpd 4950 postgres 25 0 0 0 0 Z 1.1 0.0 0:00 0 postmaster <defunct> 3812 madona 15 0 44728 42M 17460 S 0.7 11.2 3:21 0 mozilla-bin 4947 postgres 25 0 2540 2392 1688 S 0.7 0.6 0:00 0 postmaster 4952 postgres 25 0 2812 2664 1872 R 0.7 0.6 0:00 0 postmaster 4610 madona 15 0 7460 7460 2152 R 0.5 1.9 0:00 0 xterm 4904 madona 15 0 1108 1108 856 R 0.3 0.2 0:00 0 top 4954 postgres 24 0 1916 1768 1244 R 0.1 0.4 0:00 0 postmaster 4959 postgres 25 0 1596 1448 940 S 0.1 0.3 0:00 0 postmaster 4961 postgres 25 0 984 824 640 R 0.1 0.2 0:00 0 postmaster 1 root 15 0 88 60 40 S 0.0 0.0 0:04 0 init Thanks Alex
Hello Ericson, Thank you for your reply. Ericson Smith wrote: > Could be problem be that PHP is not using connection efficiently? > Apache KeepAlive with PHP, is a dual edged sword with you holding the > blade :-) I turned off the KeepAlive option in httpd.conf [ I think keepalive is not used by default by "ab" and that apche uses it only on static content)-- see the last paragraph of: http://httpd.apache.org/docs/keepalive.html ] and set pgsql.allow_persistent = Off in php.ini, it didn't work for me. thanks Alex > > If I am not mistaken, what happens is that a connection is kept alive > because Apache believes that other requests will come in from the > client who made the initial connection. So 10 concurrent connections > are fine, but they are not released timely enough with 100 concurrent > connections. The system ends up waiting around for other KeepAlive > connections to timeout before Apache allows others to come in. We had > this exact problem in an environment with millions of impressions per > day going to the database. Because of the nature of our business, we > were able to disable KeepAlive and the load immediately dropped > (concurrent connection on the Postgresql database also dropped > sharply). We also turned off PHP persistent connections to the database. > > The drawback is that connections are built up and torn down all the > time, and with Postgresql, it is sort of expensive. But thats a > fraction of the expense of having KeepAlive on. > > Warmest regards, Ericson Smith > Tracking Specialist/DBA > +-----------------------+--------------------------------------+ > | http://www.did-it.com | "Crush my enemies, see then driven | > | eric@did-it.com | before me, and hear the lamentations | > | 516-255-0500 | of their women." - Conan | > +-----------------------+--------------------------------------+ > > > Alex Madon wrote: > >> Hello, >> I am testing a web application (using the DBX PHP function to call a >> Postgresql backend). >> I have 375Mb RAM on my test home box. >> I ran ab (apache benchmark) to test the behaviour of the application >> under heavy load. >> When increasing the number of requests, all my memory is filled, and >> the Linux server begins to cache and remains frozen. >> >> ab -n 100 -c 10 http://localsite/testscript >> behaves OK. >> >> If I increases to >> ab -n 1000 -c 100 http://localsite/testscript >> I get this memory problem. >> >> If I eliminate the connection to the (UNIX) socket of Postgresql, the >> script behaves well even under very high load (and of course with >> much less time spent per request). >> >> I tried to change some parameters in postgresql.conf >> max_connections = 32 >> to max_connections = 8 >> >> and >> >> shared_buffers = 64 >> to shared_buffers = 16 >> >> without success. >> >> I tried to use pmap on httpd and postmaster Process ID but don't get >> much help. >> >> Does anybody have some idea to help to debug/understand/solve this >> issue? Any feedback is appreciated. >> To me, it would not be a problem if the box is very slow under heavy >> load (DoS like), but I really dislike having my box out of service >> after such a DoS attack. >> I am looking for a way to limit the memory used by postgres. >> >> Thanks >> Alex >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 9: the planner will ignore your desire to choose an index scan if >> your >> joining column's datatypes do not match >>
Hello Scott, Thank you for your reply. > >I think you're confusing what I meant. Caching is good. Swapping is bad. >Having a large amount of cache is a good thing. It means the OS is >caching all your data in memory for faster access. > OK; Sorry about my approximative language. I was meaning, caching to disk... so better said as you point out is swapping. So my system is filling the RAM (OK to me) and then begins to fill all the Swap. When the swap is used at near 100%, the system becomes extremely difficult to control (not actually crashing, but accessing disk very noisily). > >OK, there's a problem, you're getting failed requests at -c 10, which >means you likely have postgresql configured in the wrong >direction. configure postgresql to use more memory (sort_mem can be set >to about 8 megs without a lot of issues on most boxes, going higher may >use up all your memory in certain situations (high concurrency)). > Thank you! I setting sort_mem to 8 Mb and max_connections = 32, I have zero failed with a c=20 and 1 failed with a c=30! Decreasing max_connections to 8 gave me a lot of failed as before. > >No, you should NEVER have to reboot a unix box. period. filled cache, >again, is a GOOD THING. not bad. > sorry, I was meaning swapping. >>button)... could be interesting to wait to see if the systems recovers >>after a while > > >Yes, please do. Also, show us a save of top while under load. Well, increasing the sort_mem and max_connections was very good to have less failed, but this does not solve the memory limitation. Perhaps that comes from my superficial understanding of concurrency. Does anybody have a simple formula to know how much memory a ab benchmark will need in this case (assuming one knows the memory needed by one apache process (A) and the memory used by one postgres process (P)? I would say, issuing a ab -n N -c C (asuming there is no shared library between apache and postgres) memory needed= CxA + CxP (independent of N?) Just another pedestrian question: what usually happens to a linux box when swap is exhausted? Is the parent process killed? Could this crash the whole machine? My conclusion up to now, is that this box does not stand more than 40 concurrent connections. To stand more concurrent connections, I'd need to increase the RAM or redesign my software, e.g. using a PHP cache, static pages being served much faster (almost by a factor 100...) Thanks Alex
On Wed, 2004-01-21 at 13:48, Alex Madon wrote: > Hello Scott, > Thank you for your answer. > > >I'm not familiar with DBX. Is that connection pooling or what? > > > > > I could not find this information, sorry. > It's generic database access functions written in C. It's not connection pooling. http://www.php.net/manual/en/ref.dbx.php Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Wednesday 21 January 2004 19:40, Alex Madon wrote: Alex - I'm copying scott in on this since he asked for details on top too. > > Richard Huxton wrote: > >Are you saying you had more than 8 connections open simultaneously? > > Well, I don't know how to find that out. > What I did is issuing several tomes a ps aux and I never saw more than > 6-7 postmaster SELECT. Good enough for me - that's what I'd have done. > >Are you > >sure you restarted PG so that it noticed the new values? You can check > > config settings with "show all;" from psql, or "show <setting>". > > Yes I restart it. The show command outputs the correct value (8). Good - I don't think PG is using up your system memory, then (but we'll see). > ---------------------------top----------------------------- > 114 processes: 99 sleeping, 12 running, 3 zombie, 0 stopped > CPU states: 91.8% user 8.1% system 0.0% nice 0.0% iowait 0.0% idle > Mem: 384580k av, 316328k used, 68252k free, 0k shrd, 25424k > buff > 253976k actv, 36916k in_d, 4704k in_c > Swap: 265064k av, 64788k used, 200276k free 71132k > cached If I'm decoding this through the email line wrapping you've got 71MB of disk cached in RAM and 25MB used as i/o buffers (which is a good thing). I'm a little puzzled as to why your swap is so high, but as long as it isn't growing rapidly there shouldn't be a problem. > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND > 4914 apache 16 0 9016 8416 2552 S 6.7 2.1 0:00 0 httpd > 4832 apache 16 0 9016 8416 2552 S 6.3 2.1 0:01 0 httpd OK - so each httpd process is using about 8500KB with 2500KB shared = 6MB each. > 4947 postgres 25 0 2540 2392 1688 S 0.7 0.6 0:00 0 > postmaster > 4952 postgres 25 0 2812 2664 1872 R 0.7 0.6 0:00 0 > postmaster > 4954 postgres 24 0 1916 1768 1244 R 0.1 0.4 0:00 0 postmaster > 4959 postgres 25 0 1596 1448 940 S 0.1 0.3 0:00 0 postmaster > 4961 postgres 25 0 984 824 640 R 0.1 0.2 0:00 0 postmaster Here we've got what appears to be about 1MB per PG backend. Since you've only got 8, that should be fine. [I noticed you were running X and mozilla (on top of gnome?). If you don't need them, close them - that can free up to 64MB, depending on what you're running] I don't think you've got a RAM shortage, I think your settings are too small and you're running out of I/O bandwidth. If you run "vmstat 5", what does it show - you're interested in the i/o section: blocks in (bi) and out (bo) as well as memory. Save your readings, then follow the advice in that GeneralBits link I posted earlier and see if that makes a difference. -- Richard Huxton Archonet Ltd
On Wed, 21 Jan 2004, Alex Madon wrote: > >You'll want to use the "top" command to show the amount of memory each process > > > > A typical output (in a concurrency of 20, no cache) is: > ps aux | grep postgres > postgres 2332 0.0 0.0 8804 328 ? R 18:54 0:01 > /usr/bin/postmaster -p 5432 -d2 > postgres 2334 0.0 0.0 9792 68 ? S 18:54 0:00 postgres: > stats buffer process > postgres 2335 0.0 0.0 8828 200 ? S 18:54 0:00 postgres: > stats collector process > postgres 4386 0.0 0.2 4312 956 pts/3 S 19:22 0:00 -bash > postgres 4871 0.0 0.5 9480 2304 ? S 20:36 0:00 postgres: > user db [local] SELECT > postgres 4873 0.0 0.2 8816 1032 ? R 20:36 0:00 postgres: > user db [local] startup > myuser 4877 0.0 0.1 3572 624 pts/4 S 20:36 0:00 grep postgres > postgres 4878 0.0 0.5 9220 2228 ? R 20:36 0:00 postgres: > user db [local] SELECT > postgres 4879 0.0 0.5 9204 2016 ? R 20:36 0:00 postgres: > user db [local] SELECT > ---------------------------top----------------------------- > 114 processes: 99 sleeping, 12 running, 3 zombie, 0 stopped > CPU states: 91.8% user 8.1% system 0.0% nice 0.0% iowait 0.0% idle > Mem: 384580k av, 316328k used, 68252k free, 0k shrd, 25424k > buff > 253976k actv, 36916k in_d, 4704k in_c > Swap: 265064k av, 64788k used, 200276k free 71132k > cached > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND > 4914 apache 16 0 9016 8416 2552 S 6.7 2.1 0:00 0 httpd > 4832 apache 16 0 9016 8416 2552 S 6.3 2.1 0:01 0 httpd > 4915 apache 16 0 9016 8416 2552 S 5.9 2.1 0:00 0 httpd > 4917 apache 16 0 9016 8416 2552 S 5.9 2.1 0:00 0 httpd > 4919 apache 16 0 9020 8420 2536 S 5.9 2.1 0:00 0 httpd > 4774 apache 16 0 9016 8416 2552 S 5.7 2.1 0:02 0 httpd > 4896 apache 16 0 9060 8460 2568 S 5.7 2.1 0:00 0 httpd > 4908 apache 15 0 9016 8416 2552 S 5.7 2.1 0:00 0 httpd > 4909 apache 16 0 9016 8416 2552 S 5.7 2.1 0:00 0 httpd > 4658 apache 16 0 9136 8536 2568 S 5.5 2.2 0:04 0 httpd > 4921 apache 16 0 9016 8416 2552 S 5.5 2.1 0:00 0 httpd > 2581 root 16 0 14492 4544 1252 R 5.3 1.1 2:26 0 X > 4795 apache 16 0 9104 8504 2568 S 5.3 2.2 0:02 0 httpd > 4796 apache 16 0 9080 8480 2568 S 5.3 2.2 0:01 0 httpd > 4782 apache 16 0 8924 8324 2568 R 3.5 2.1 0:02 0 httpd > 2612 madona 15 0 4524 4136 2380 S 1.5 1.0 0:18 0 metacity > 4656 apache 15 0 9084 8484 2568 S 1.3 2.2 0:03 0 httpd > 4950 postgres 25 0 0 0 0 Z 1.1 0.0 0:00 0 > postmaster <defunct> > 3812 madona 15 0 44728 42M 17460 S 0.7 11.2 3:21 0 > mozilla-bin > 4947 postgres 25 0 2540 2392 1688 S 0.7 0.6 0:00 0 postmaster > 4952 postgres 25 0 2812 2664 1872 R 0.7 0.6 0:00 0 postmaster > 4610 madona 15 0 7460 7460 2152 R 0.5 1.9 0:00 0 xterm > 4904 madona 15 0 1108 1108 856 R 0.3 0.2 0:00 0 top > 4954 postgres 24 0 1916 1768 1244 R 0.1 0.4 0:00 0 postmaster > 4959 postgres 25 0 1596 1448 940 S 0.1 0.3 0:00 0 postmaster > 4961 postgres 25 0 984 824 640 R 0.1 0.2 0:00 0 postmaster > 1 root 15 0 88 60 40 S 0.0 0.0 0:04 0 init OK, the memory usage is NOT your problem. You have crashing postgresql backends, see the <defunct> and the listing of 3 zombie processes in top? That's bad. You very likely have bad memory in your box, or possibly other hardware problems. It could be software, if you've got a really odd configuration (i.e. compiling apache against one ldap sdk, and php against another can make apache start failing, something similar may be happening in postgresql, but I doubt it.) You should go to www.memtest86.com and download their free tester and see if your machine has any bad ram. The reason the machine will get really slow when this happens is that the postgresql database has to reset all the cache on all backends when one crashes to prevent corruption. The fact that X goes unresponsive is simply a load issue. Have a little more patience. Even with bad memory under it, linux / unix will usually come back from the brink once the load goes away. You need to make sure logging is enabled on postgresql and then you can look at the logs for clues as to why the backends are crashing.
On Wed, 21 Jan 2004, Richard Huxton wrote: > [I noticed you were running X and mozilla (on top of gnome?). If you don't > need them, close them - that can free up to 64MB, depending on what you're > running] Note that if you've got a 64 Meg memory video card, and X uses 4 megs of system ram, it will show, in linux, as using 68 Megs of ram. Turning off X, however, will only free up the 4 megs of system ram, not the 64 on the video card. so, it's not uncommon for folks to see a modern video card in their system, look at top, and go "geeze, X is a total memory hog" when in fact it's just that linux / top count the video memory as part of the memory allocated to X. That said, I'd at least run a less heavyweight window manager like afterstep or twm, not gnome or kde on a box like that one. > I don't think you've got a RAM shortage, I think your settings are too small > and you're running out of I/O bandwidth. If you run "vmstat 5", what does it > show - you're interested in the i/o section: blocks in (bi) and out (bo) as > well as memory. Also, look at si/so (swap in and out) to monitor swap activity. If those numbers are low, then swapping isn't an issue...
"scott.marlowe" <scott.marlowe@ihs.com> writes: > OK, the memory usage is NOT your problem. You have crashing postgresql > backends, see the <defunct> and the listing of 3 zombie processes in top? > That's bad. You very likely have bad memory in your box, or possibly > other hardware problems. While that could be true, the presence of a few zombie processes hardly proves it. What that says to me is that the box is so loaded that the postmaster isn't able to reap dead children instantaneously --- that is, what we see is top running between the time that a backend quits and the time the postmaster next gets to run. While we don't have a lot of data for determining why the overload, the top output: >> CPU states: 91.8% user 8.1% system 0.0% nice 0.0% iowait 0.0% idle sure looks like the bottleneck is CPU cycles, not disk (and not RAM either, since RAM shortage would lead to swapping and hence disk waits). Furthermore, the listing looks like it is the httpd processes that are sucking CPU, not Postgres. I think this is actually not Postgres' problem at all, but some inefficiency in the site's Web code. regards, tom lane
Well the top issued was corresponding to a concurrency of 20 (so a level where swap is not sollicited). With a concurrency of 60, swap is very high Here is a "movie" of how swap is maanged (a snapshot every 5 sec) Swap: 265064k av, 110592k used, 154472k free 48384k cached Swap: 265064k av, 110592k used, 154472k free 48404k cached Swap: 265064k av, 107064k used, 158000k free 48452k cached Swap: 265064k av, 104816k used, 160248k free 48484k cached Swap: 265064k av, 246496k used, 18568k free 31044k cached Swap: 265064k av, 190364k used, 74700k free 31356k cached Swap: 265064k av, 186512k used, 78552k free 31480k cached Swap: 265064k av, 181592k used, 83472k free 31588k cached Swap: 265064k av, 173924k used, 91140k free 31692k cached Swap: 265064k av, 168384k used, 96680k free 31864k cached Swap: 265064k av, 165444k used, 99620k free 31872k cached Swap: 265064k av, 163084k used, 101980k free 32004k cached Swap: 265064k av, 160592k used, 104472k free 32124k cached Swap: 265064k av, 158116k used, 106948k free 32136k cached Swap: 265064k av, 155692k used, 109372k free 32212k cached Swap: 265064k av, 153008k used, 112056k free 32252k cached and the parallel with Mem Mem: 384580k av, 202952k used, 181628k free, 0k shrd, 20876k buff Swap: 265064k av, 110592k used, 154472k free 48384k cached Mem: 384580k av, 219072k used, 165508k free, 0k shrd, 20880k buff Swap: 265064k av, 110592k used, 154472k free 48404k cached Mem: 384580k av, 284076k used, 100504k free, 0k shrd, 20888k buff Swap: 265064k av, 107064k used, 158000k free 48452k cached Mem: 384580k av, 353520k used, 31060k free, 0k shrd, 20892k buff Swap: 265064k av, 104816k used, 160248k free 48484k cached Mem: 384580k av, 380108k used, 4472k free, 0k shrd, 6352k buff Swap: 265064k av, 246496k used, 18568k free 31044k cached Mem: 384580k av, 210008k used, 174572k free, 0k shrd, 6372k buff Swap: 265064k av, 190364k used, 74700k free 31356k cached Mem: 384580k av, 206024k used, 178556k free, 0k shrd, 6380k buff Swap: 265064k av, 186512k used, 78552k free 31480k cached Mem: 384580k av, 210008k used, 174572k free, 0k shrd, 6396k buff Swap: 265064k av, 181592k used, 83472k free 31588k cached Mem: 384580k av, 201612k used, 182968k free, 0k shrd, 6408k buff Swap: 265064k av, 173924k used, 91140k free 31692k cached Mem: 384580k av, 192464k used, 192116k free, 0k shrd, 6420k buff Swap: 265064k av, 168384k used, 96680k free 31864k cached Mem: 384580k av, 190504k used, 194076k free, 0k shrd, 6428k buff Swap: 265064k av, 165444k used, 99620k free 31872k cached Mem: 384580k av, 190956k used, 193624k free, 0k shrd, 6444k buff Swap: 265064k av, 163084k used, 101980k free 32004k cached Mem: 384580k av, 193244k used, 191336k free, 0k shrd, 6452k buff Swap: 265064k av, 160592k used, 104472k free 32124k cached Mem: 384580k av, 195924k used, 188656k free, 0k shrd, 6460k buff Swap: 265064k av, 158116k used, 106948k free 32136k cached Mem: 384580k av, 198428k used, 186152k free, 0k shrd, 6468k buff Swap: 265064k av, 155692k used, 109372k free 32212k cached One can see that at the maximum feeling of swap (74700k free swap), the full picture is: 22:51:54 up 3:58, 6 users, load average: 47.38, 18.53, 7.79 131 processes: 130 sleeping, 1 running, 0 zombie, 0 stopped CPU states: 5.3% user 3.0% system 0.0% nice 0.0% iowait 91.6% idle Mem: 384580k av, 210008k used, 174572k free, 0k shrd, 6372k buff 158748k actv, 14556k in_d, 1412k in_c Swap: 265064k av, 190364k used, 74700k free 31356k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 7 root 16 0 0 0 0 SW 1.2 0.0 0:07 0 kscand/Normal 5 root 15 0 0 0 0 SW 1.0 0.0 0:01 0 kswapd 7050 apache 15 0 8016 5896 1580 D 1.0 1.5 0:00 0 httpd 3870 madona 15 0 6540 1440 472 D 0.6 0.3 0:07 0 xterm 7032 apache 15 0 8336 7568 1980 S 0.6 1.9 0:00 0 httpd 7051 apache 15 0 4784 1640 280 D 0.6 0.4 0:00 0 httpd 2581 root 15 0 15928 1452 704 S 0.5 0.3 5:40 0 X 6985 madona 16 0 788 732 476 R 0.5 0.1 0:00 0 top 7023 apache 15 0 7956 7160 1736 S 0.4 1.8 0:00 0 httpd 7025 apache 15 0 7944 6816 1584 D 0.4 1.7 0:00 0 httpd 7027 apache 15 0 7808 6976 1588 D 0.4 1.8 0:00 0 httpd 7052 apache 15 0 6616 3584 404 D 0.3 0.9 0:00 0 httpd 7142 postgres 15 0 3420 3268 2460 S 0.3 0.8 0:00 0 postmaster 7020 apache 15 0 8092 5880 1980 D 0.2 1.5 0:00 0 httpd 7034 apache 15 0 8380 7660 2036 S 0.2 1.9 0:00 0 httpd 2626 madona 15 0 1316 396 292 S 0.1 0.1 0:00 0 magicdev 7001 apache 15 0 8016 5892 1580 D 0.1 1.5 0:00 0 httpd 7008 apache 15 0 8016 5784 1580 D 0.1 1.5 0:00 0 httpd 7024 apache 15 0 7944 6664 1584 D 0.1 1.7 0:00 0 httpd 7054 apache 15 0 8016 5636 1580 D 0.1 1.4 0:00 0 httpd 1 root 15 0 88 60 40 S 0.0 0.0 0:04 0 init 2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd 3 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kapmd 4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd_CPU0 9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush 6 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kscand/DMA 8 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kscand/HighMem 10 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated 11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd 15 root 15 0 0 0 0 SW 0.0 0.0 0:04 0 kjournald 72 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd 2066 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 eth1 2117 root 15 0 112 60 36 S 0.0 0.0 0:00 0 syslogd 2121 root 15 0 52 0 0 SW 0.0 0.0 0:00 0 klogd 2139 rpc 15 0 72 0 0 SW 0.0 0.0 0:00 0 portmap 2158 rpcuser 25 0 84 0 0 SW 0.0 0.0 0:00 0 rpc.statd 2208 root 25 0 52 4 0 S 0.0 0.0 0:00 0 apmd 2246 root 25 0 240 4 0 S 0.0 0.0 0:00 0 sshd 2260 root 15 0 120 4 0 S 0.0 0.0 0:00 0 xinetd 2269 root 15 0 88 56 36 S 0.0 0.0 0:00 0 gpm 2279 root 15 0 1436 216 152 S 0.0 0.0 0:00 0 httpd 2348 root 15 0 124 108 64 S 0.0 0.0 0:00 0 crond 2359 root 15 0 852 496 228 S 0.0 0.1 0:00 0 cupsd 2419 xfs 15 0 2696 188 188 S 0.0 0.0 0:32 0 xfs 2428 root 24 0 616 4 0 S 0.0 0.0 0:00 0 smbd 2432 root 15 0 824 356 260 S 0.0 0.0 0:00 0 nmbd 2450 daemon 15 0 64 12 12 S 0.0 0.0 0:00 0 atd 2458 root 16 0 224 0 0 SW 0.0 0.0 0:00 0 login 2459 root 22 0 48 4 0 S 0.0 0.0 0:00 0 mingetty 2460 root 22 0 52 4 0 S 0.0 0.0 0:00 0 mingetty 2461 root 22 0 52 4 0 S 0.0 0.0 0:00 0 mingetty 2462 root 22 0 52 4 0 S 0.0 0.0 0:00 0 mingetty 2463 root 22 0 52 4 0 S 0.0 0.0 0:00 0 mingetty 2464 madona 16 0 356 0 0 SW 0.0 0.0 0:00 0 bash 2569 madona 21 0 132 4 0 S 0.0 0.0 0:00 0 startx 2580 madona 20 0 72 0 0 SW 0.0 0.0 0:00 0 xinit 2590 madona 15 0 3412 240 236 S 0.0 0.0 0:01 0 gnome-session 2592 madona 15 0 5452 424 160 S 0.0 0.1 0:00 0 gconfd-2 2604 madona 15 0 236 72 40 S 0.0 0.0 0:00 0 ssh-agent 2608 madona 17 0 488 4 0 S 0.0 0.0 0:00 0 bonobo-activati 2610 madona 15 0 384 204 124 S 0.0 0.0 0:00 0 fam 2612 madona 16 0 4032 1916 1912 S 0.0 0.4 0:38 0 metacity 2614 madona 15 0 1620 268 264 S 0.0 0.0 0:00 0 gnome-settings- 2623 madona 15 0 720 396 288 S 0.0 0.1 0:01 0 xscreensaver 2633 madona 15 0 1276 688 528 S 0.0 0.1 0:00 0 pam-panel-icon 2634 root 15 0 152 136 88 S 0.0 0.0 0:00 0 pam_timestamp_c 2655 madona 15 0 5284 2428 1916 D 0.0 0.6 0:15 0 gnome-panel 2657 madona 15 0 2556 1176 804 S 0.0 0.3 0:01 0 multiload-apple 2659 madona 15 0 1808 452 448 S 0.0 0.1 0:00 0 notification-ar 2662 madona 15 0 5444 164 160 S 0.0 0.0 0:01 0 xterm 2664 madona 15 0 360 4 0 S 0.0 0.0 0:00 0 bash 3015 madona 15 0 9100 248 244 S 0.0 0.0 0:03 0 xterm 3017 madona 15 0 384 28 24 S 0.0 0.0 0:00 0 bash 3812 madona 15 0 35292 4908 4568 S 0.0 1.2 8:02 0 mozilla-bin 3872 madona 22 0 352 4 0 S 0.0 0.0 0:00 0 bash 3902 root 22 0 168 0 0 SW 0.0 0.0 0:00 0 su 3905 root 15 0 392 32 28 S 0.0 0.0 0:00 0 bash 4306 madona 15 0 5604 252 248 S 0.0 0.0 0:01 0 xterm 4308 madona 22 0 352 4 0 S 0.0 0.0 0:00 0 bash 4338 root 21 0 168 0 0 SW 0.0 0.0 0:00 0 su 4341 root 22 0 360 4 0 S 0.0 0.0 0:00 0 bash 4385 root 24 0 172 0 0 SW 0.0 0.0 0:00 0 su 4386 postgres 15 0 312 28 24 S 0.0 0.0 0:00 0 bash 4610 madona 15 0 5728 384 380 S 0.0 0.0 0:00 0 xterm 4612 madona 15 0 384 28 24 S 0.0 0.0 0:00 0 bash 6636 apache 15 0 6868 840 404 D 0.0 0.2 0:03 0 httpd 6820 postgres 15 0 808 560 468 S 0.0 0.1 0:00 0 postmaster 6821 postgres 15 0 1296 496 276 S 0.0 0.1 0:00 0 postmaster 6822 postgres 15 0 744 460 372 S 0.0 0.1 0:00 0 postmaster 6828 root 15 0 140 100 64 S 0.0 0.0 0:00 0 tail 6854 apache 15 0 6888 652 424 S 0.0 0.1 0:01 0 httpd 6867 apache 15 0 6816 740 352 D 0.0 0.1 0:01 0 httpd 6869 apache 15 0 6960 1084 640 D 0.0 0.2 0:00 0 httpd 6875 apache 15 0 6856 780 392 D 0.0 0.2 0:01 0 httpd 6882 apache 15 0 6700 436 236 S 0.0 0.1 0:01 0 httpd 6890 apache 15 0 7020 1184 700 D 0.0 0.3 0:01 0 httpd 6896 apache 15 0 6744 1084 424 D 0.0 0.2 0:01 0 httpd 6986 madona 15 0 816 592 180 S 0.0 0.1 0:00 0 ab 6993 apache 15 0 7084 1404 860 D 0.0 0.3 0:00 0 httpd 7002 apache 15 0 8016 2576 1580 D 0.0 0.6 0:00 0 httpd 7004 apache 15 0 8016 3104 1580 D 0.0 0.8 0:00 0 httpd 7007 postgres 15 0 3000 2576 2048 D 0.0 0.6 0:00 0 postmaster 7009 apache 15 0 8016 2956 1580 D 0.0 0.7 0:00 0 httpd 7011 apache 16 0 7956 7164 1736 S 0.0 1.8 0:00 0 httpd 7012 apache 15 0 8024 5364 1588 D 0.0 1.3 0:00 0 httpd 7021 apache 16 0 8376 7676 2032 S 0.0 1.9 0:00 0 httpd 7022 apache 15 0 8016 5140 1580 D 0.0 1.3 0:00 0 httpd 7029 apache 15 0 8336 7584 1980 S 0.0 1.9 0:00 0 httpd 7036 apache 15 0 8380 7632 2036 S 0.0 1.9 0:00 0 httpd 7053 apache 15 0 8016 4420 1580 D 0.0 1.1 0:00 0 httpd 7056 apache 15 0 8148 6292 1712 D 0.0 1.6 0:00 0 httpd 7059 apache 15 0 8240 7192 2020 S 0.0 1.8 0:00 0 httpd 7061 apache 15 0 8236 6436 1800 D 0.0 1.6 0:00 0 httpd 7062 apache 15 0 8240 7604 2020 S 0.0 1.9 0:00 0 httpd 7066 apache 16 0 8240 7544 2020 S 0.0 1.9 0:00 0 httpd 7068 postgres 15 0 3216 2580 2272 D 0.0 0.6 0:00 0 postmaster 7069 postgres 15 0 2908 2108 1960 S 0.0 0.5 0:00 0 postmaster 7070 postgres 15 0 2436 2104 1740 S 0.0 0.5 0:00 0 postmaster 7071 postgres 15 0 1884 1212 1188 S 0.0 0.3 0:00 0 postmaster 7079 postgres 15 0 3396 3244 2448 S 0.0 0.8 0:00 0 postmaster 7082 postgres 17 0 3412 3260 2460 S 0.0 0.8 0:00 0 postmaster 7086 postgres 16 0 3412 3232 2460 S 0.0 0.8 0:00 0 postmaster 7087 postgres 15 0 3424 3160 2468 S 0.0 0.8 0:00 0 postmaster 7091 postgres 15 0 3420 3264 2464 S 0.0 0.8 0:00 0 postmaster 7096 postgres 17 0 3412 3140 2456 S 0.0 0.8 0:00 0 postmaster 7097 postgres 15 0 3436 3196 2476 S 0.0 0.8 0:00 0 postmaster 7102 postgres 15 0 3436 3220 2480 S 0.0 0.8 0:00 0 postmaster 7112 postgres 15 0 3432 3280 2472 S 0.0 0.8 0:00 0 postmaster 7123 postgres 15 0 3440 3288 2476 S 0.0 0.8 0:00 0 postmaster 7124 postgres 15 0 3428 3276 2468 S 0.0 0.8 0:00 0 postmaster 7125 postgres 15 0 3424 3272 2464 S 0.0 0.8 0:00 0 postmaster After that swap and ram usage decrease.... and the ab ends up with a time out. For the gnome and X, this is just my development machine. On the web server itself, no X of course. I had also a closer look at postgresql logs and make logs more verbose. I saw a query taking 16 sec (it take normally few millisec): [7104] LOG: connection received: host=[local] [7104] LOG: connection authorized: user=myuser database=mydb [7104] LOG: query: begin; select getdatabaseencoding(); commit [7104] LOG: duration: 0.002152 sec [7104] LOG: query: select ---- myselect instruction ----- [7104] LOG: duration: 16.561325 sec [7104] LOG: query: select oid,typname from pg_type [7104] LOG: duration: 15.498828 sec [6820] DEBUG: child process (pid 7104) exited with exit code 0 Thanks everybody, Alex Tom Lane wrote: >"scott.marlowe" <scott.marlowe@ihs.com> writes: > > >>OK, the memory usage is NOT your problem. You have crashing postgresql >>backends, see the <defunct> and the listing of 3 zombie processes in top? >>That's bad. You very likely have bad memory in your box, or possibly >>other hardware problems. >> >> > >While that could be true, the presence of a few zombie processes hardly >proves it. What that says to me is that the box is so loaded that the >postmaster isn't able to reap dead children instantaneously --- that is, >what we see is top running between the time that a backend quits and the >time the postmaster next gets to run. > >While we don't have a lot of data for determining why the overload, >the top output: > > > >>>CPU states: 91.8% user 8.1% system 0.0% nice 0.0% iowait 0.0% idle >>> >>> > >sure looks like the bottleneck is CPU cycles, not disk (and not RAM >either, since RAM shortage would lead to swapping and hence disk waits). >Furthermore, the listing looks like it is the httpd processes that are >sucking CPU, not Postgres. I think this is actually not Postgres' >problem at all, but some inefficiency in the site's Web code. > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > >
On Wed, 21 Jan 2004, Alex Madon wrote: > One can see that at the maximum feeling of swap (74700k free swap), the > full picture is: > > > 22:51:54 up 3:58, 6 users, load average: 47.38, 18.53, 7.79 > 131 processes: 130 sleeping, 1 running, 0 zombie, 0 stopped > CPU states: 5.3% user 3.0% system 0.0% nice 0.0% iowait 91.6% idle > Mem: 384580k av, 210008k used, 174572k free, 0k shrd, 6372k > buff > 158748k actv, 14556k in_d, 1412k in_c > Swap: 265064k av, 190364k used, 74700k free 31356k > cached > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND > 7 root 16 0 0 0 0 SW 1.2 0.0 0:07 0 > kscand/Normal > 5 root 15 0 0 0 0 SW 1.0 0.0 0:01 0 kswapd > 7050 apache 15 0 8016 5896 1580 D 1.0 1.5 0:00 0 httpd > 3870 madona 15 0 6540 1440 472 D 0.6 0.3 0:07 0 xterm > 7032 apache 15 0 8336 7568 1980 S 0.6 1.9 0:00 0 httpd > 7051 apache 15 0 4784 1640 280 D 0.6 0.4 0:00 0 httpd > 2581 root 15 0 15928 1452 704 S 0.5 0.3 5:40 0 X > 6985 madona 16 0 788 732 476 R 0.5 0.1 0:00 0 top > 7023 apache 15 0 7956 7160 1736 S 0.4 1.8 0:00 0 httpd > 7025 apache 15 0 7944 6816 1584 D 0.4 1.7 0:00 0 httpd > 7027 apache 15 0 7808 6976 1588 D 0.4 1.8 0:00 0 httpd > 7052 apache 15 0 6616 3584 404 D 0.3 0.9 0:00 0 httpd this is really strange. You've got 170Megs free memory, yet are going into a swapstorm. I had this problem with older kernels under rh7.2 (2.4.7 and 2.4.9) when accessing really large files but they went away with the latest one I'm running, which is 2.4.20. Are any of the underlying tables really large and maybe being seq scanned? I'm strictly guessing here. anyone else have any ideas? I'm stumped.
On Wed, Jan 21, 2004 at 04:47:57PM -0700, scott.marlowe wrote: > On Wed, 21 Jan 2004, Alex Madon wrote: > > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND > > 7 root 16 0 0 0 0 SW 1.2 0.0 0:07 0 > > kscand/Normal > > 5 root 15 0 0 0 0 SW 1.0 0.0 0:01 0 kswapd > > 7050 apache 15 0 8016 5896 1580 D 1.0 1.5 0:00 0 httpd > > 3870 madona 15 0 6540 1440 472 D 0.6 0.3 0:07 0 xterm > > 7032 apache 15 0 8336 7568 1980 S 0.6 1.9 0:00 0 httpd > > 7051 apache 15 0 4784 1640 280 D 0.6 0.4 0:00 0 httpd > > 2581 root 15 0 15928 1452 704 S 0.5 0.3 5:40 0 X > > 6985 madona 16 0 788 732 476 R 0.5 0.1 0:00 0 top > > 7023 apache 15 0 7956 7160 1736 S 0.4 1.8 0:00 0 httpd > > 7025 apache 15 0 7944 6816 1584 D 0.4 1.7 0:00 0 httpd > > 7027 apache 15 0 7808 6976 1588 D 0.4 1.8 0:00 0 httpd > > 7052 apache 15 0 6616 3584 404 D 0.3 0.9 0:00 0 httpd > > Are any of the underlying tables really large and maybe being seq scanned? > I'm strictly guessing here. anyone else have any ideas? I'm stumped. 'D' processes are probably I/O bound, but why would webserver processes be so busy reading from disk? 'postmaster' processes had 'S' status AFAIR, so the load is probably elsewhere, not Postgres. My guess would be that the Apache processes are doing something completely unexpected. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest "foo" someone someday shall type "supercalifragilisticexpialidocious" (5th Commandment for C programmers)
On Wednesday 21 January 2004 22:20, Alex Madon wrote: > Well the top issued was corresponding to a concurrency of 20 (so a level > where swap is not sollicited). > With a concurrency of 60, swap is very high > > Here is a "movie" of how swap is maanged (a snapshot every 5 sec) [used sits around 110MB, spikes to 240MB then gradually drops back to 153MB just under a minute later] > and the parallel with Mem [buffer memory plumets from 20MB down to 6MB, then stabilises around 6.5MB] OK - I'm guessing the two spikes in the figures are as the ab requests hit, spawning http daemons and postgres backends. > One can see that at the maximum feeling of swap (74700k free swap), the > full picture is: > > > 22:51:54 up 3:58, 6 users, load average: 47.38, 18.53, 7.79 > 131 processes: 130 sleeping, 1 running, 0 zombie, 0 stopped > CPU states: 5.3% user 3.0% system 0.0% nice 0.0% iowait 91.6% idle Although Tom pointed to the high CPU in a previous post, here you're not suffering on CPU. Run a "vmstat 5" while this is happening - that'll show us the details. -- Richard Huxton Archonet Ltd