Thread: Why is this system swapping?
Hello,
I am trying to understand what I need to do for this system to stop using swap. Maybe it’s something simple, or obvious for the situation. I’d appreciate some thoughts/suggestions.
Some background:
This is a quad XEON (yes, Dell) with 12GB of RAM, running pg 7.4...pretty heavy on concurrent usage. With peak traffic (db allows 1000 connections, in line with the number of app servers and the connection pools for each), the following is from 'top' (sorted by mem). Shared_buffers is 170MB, sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel storage, RAID10.
972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 57.2% 0.0% 23.2% 0.0% 3.6% 82.8% 232.4%
cpu00 22.0% 0.0% 9.1% 0.1% 0.9% 18.7% 48.8%
cpu01 17.5% 0.0% 5.8% 0.0% 2.3% 19.7% 54.4%
cpu02 7.8% 0.0% 3.7% 0.0% 0.0% 20.8% 67.5%
cpu03 9.7% 0.0% 4.4% 0.0% 0.5% 23.6% 61.5%
Mem: 12081744k av, 12055220k used, 26524k free, 0k shrd, 71828k buff
9020480k actv, 1741348k in_d, 237396k in_c
Swap: 4096532k av, 472872k used, 3623660k free 9911176k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
21397 postgres 22 0 181M 180M 175M D 25.9 1.5 85:17 0 postmaster
23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 postmaster
24428 postgres 15 0 178M 177M 175M S 0.0 1.5 1:35 3 postmaster
24392 postgres 15 0 178M 177M 175M S 2.7 1.5 2:07 2 postmaster
23610 postgres 15 0 178M 177M 175M S 0.0 1.5 0:29 2 postmaster
24395 postgres 15 0 178M 177M 175M S 0.0 1.5 1:12 1 postmaster
…
…
-bash-2.05b$ free
total used free shared buffers cached
Mem: 12081744 12055536 26208 0 66704 9943988
-/+ buffers/cache: 2044844 10036900
Swap: 4096532 512744 3583788
As you can see, the system starts utilizing swap at some point, with so many processes. Some time ago we had decided to keep the connections from the pool open for longer periods, possibly to avoid connection-maintenance overhead on the db. At that time the traffic was not as high as it is today, which might be what's causing this: for the most part only a few postmaster processes are non-idle, except when the system becomes busy and suddenly you see a lot of selects piling up and load averages shooting upwards. I am thinking closing out connections sooner might help the system release some memory back to the kernel. Swapping adds to the IO, although the OS is on a separate channel from postgres.
I can add more memory, but I want to make sure I haven’t missed out something obvious.
Thanks!
Anjan
"Anjan Dave" <adave@vantage.com> writes:

> Some background:
>
> This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy
> on concurrent usage. With peak traffic (db allows 1000 connections, in
> line with the number of app servers and connection pools for each)
> following is from 'top' (sorted by mem) Shared_buffers is 170MB,
> sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel
> storage, RAID10.
>
> 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
>
> CPU states: cpu user nice system irq softirq iowait idle
> total 57.2% 0.0% 23.2% 0.0% 3.6% 82.8% 232.4%

This looks to me like most of your server processes are sitting around idle most of the time.

> 21397 postgres 22 0 181M 180M 175M D 25.9 1.5 85:17 0 postmaster
>
> 23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 postmaster

So each process is taking up 8-11M of RAM beyond the shared memory. 1,000 x 10M is 10G. Add in some memory for page tables and kernel data structures, as well as the kernel's need to keep some memory set aside for filesystem buffers (what you really want all that memory being used for anyway), and you've used up all your 12G.

I would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time, I would at least cut it by a factor of 2.

Working the other direction: you have four processors (I guess you have hyperthreading turned off?), so ideally what you want is four runnable processes at all times and as few others as possible. If your load typically spends about half the time waiting on i/o (which is what that top output says), then you want a total of 8 connections. Realistically you might not be able to predict which app server will be providing the load at any given time, so you might want 8 connections per app server. And you might have some load that's more i/o intensive than the 50% i/o load shown here. Say you think some loads will be 80% i/o; you might want 20 connections for those loads.

If you had 10 app servers with 20 connections each, for a total of 200 connections, I suspect that would be closer to right than having 1,000 connections. 200 connections would consume 2G of RAM, leaving you with 10G of filesystem cache. Which might in turn decrease the percentage of time waiting on i/o, which would decrease the number of processes you need even further...

-- 
greg
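Greg's sizing arithmetic can be sketched as a quick back-of-the-envelope calculation (hypothetical helper functions, not anything from the thread — the ~10MB-per-backend figure is his estimate from the top output above):

```python
# Back-of-the-envelope sizing from the argument above: with N CPUs and a
# workload that spends io_fraction of its time waiting on i/o, you want
# roughly N / (1 - io_fraction) connections to keep all CPUs busy.

def suggested_connections(cpus, io_fraction):
    """Connections needed so ~cpus processes are runnable at any instant."""
    return round(cpus / (1.0 - io_fraction))

def backend_overhead_gb(connections, mb_per_backend=10):
    """RAM consumed by backends beyond shared memory, in GB."""
    return connections * mb_per_backend / 1024.0

# 4 CPUs, 50% i/o wait -> 8 connections total, as in the post.
print(suggested_connections(4, 0.50))   # 8
# 80% i/o wait -> 20 connections.
print(suggested_connections(4, 0.80))   # 20
# 1,000 backends at ~10MB each -> ~10GB; 200 backends -> ~2GB.
print(backend_overhead_gb(1000))        # ~9.8
print(backend_overhead_gb(200))         # ~2.0
```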
On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote:

> As you can see the system starts utilizing swap at some point, with so
> many processes. Some time ago we had decided to keep the connections
> from the pool open for longer

You've shown the system has used swap, but not that it is swapping. Having swap in use is fine - there is likely plenty of code and whatnot that is not being used, so it got dumped out to swap. However, if you are actively moving data to/from swap, that is bad. Very bad. Especially on linux.

To tell if you are swapping, you need to watch the output of, say, `vmstat 1` and look at the si and so columns. Linux is very swap happy and likes to swap things for fun and profit.

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
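The check Jeff describes — watching `si`/`so` in `vmstat 1` — can be automated with a small script. A sketch, assuming the standard Linux vmstat column layout; the sample output below is made up for illustration:

```python
# Detect *active* swapping by reading the si (swap-in) and so (swap-out)
# columns of `vmstat 1` output. Sustained non-zero values mean pages are
# actually moving to/from swap (the bad case); swap merely being in use,
# as in the `free` output earlier in the thread, is harmless on its own.

def swapping_actively(vmstat_lines, threshold_kb=0):
    """Return True if any sample shows swap traffic above threshold_kb/s."""
    for line in vmstat_lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the two header lines
        # Standard layout: r b swpd free buff cache si so bi bo in cs ...
        si, so = int(fields[6]), int(fields[7])
        if si > threshold_kb or so > threshold_kb:
            return True
    return False

# Made-up sample: the second data row shows swap-out activity.
sample = """procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 2  1 472872  26524  71828 9911176    0    0   120   340   450   900 30 10 40 20
 1  3 472900  25100  71828 9910000    0  128    80   600   520  1100 25 12 20 43
""".splitlines()

print(swapping_actively(sample))  # True (second row: so = 128)
```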
Sorry, I didn't attach vmstat output - the system does actively swap pages. Not to the point where it crawls, but for some brief periods the console becomes a bit unresponsive. I am taking this as a sign to prevent future problems.

anjan

-----Original Message-----
From: Jeff [mailto:threshar@torgo.978.org]
Sent: Wednesday, April 27, 2005 2:30 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

[snip]
On Apr 27, 2005, at 2:29 PM, Greg Stark wrote:

> I would seriously look at tuning those connection pools down. A lot.
> If your server processes are sitting idle over half the time I would
> at least cut it by a factor of 2.

Are you (Anjan) using real or fake connection pooling - i.e. pgpool versus php's persistent connections? I'd strongly recommend looking at pgpool. It does connection pooling correctly (a set of X connections shared among the entire box, rather than 1 per web server).

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Yes, HT is turned off (I haven't seen any recommendations to keep it on).

This was set up when we were seeing 30 to 50% less traffic (users) than today. We didn't want the idle connections in the pool to expire too soon (default 30 secs, after which a connection goes back to the pool) and be reopened quickly, or to run short of available connections (default 20, which we raised to 50). So we figured a number per app server (50) and set the connections to expire after a very long time, to avoid any overhead and always have a connection available whenever needed, without opening a new one.

But now, for *some* reason, during some part of the day we use up almost all the connections in each app's pool. After that, since they are set to expire after a long time, they remain there, taking up DB resources. I will be trimming the idle-timeout down to a few minutes first and see if that helps.

Thanks,
Anjan

-----Original Message-----
From: Greg Stark [mailto:gsstark@mit.edu]
Sent: Wednesday, April 27, 2005 2:29 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

[snip]
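The change Anjan describes - a smaller pool and a short idle timeout - is a pool-configuration tweak. For a Resin-style `<database>` pool it might look roughly like the fragment below; the element names and values are illustrative (from memory, and the thread doesn't show the actual config), so check your Resin version's documentation:

```xml
<!-- Hypothetical Resin database-pool settings: cap the pool well below
     50 per app server, and let idle connections expire after minutes
     rather than "a very long time", so memory returns to the db host. -->
<database>
  <jndi-name>jdbc/app</jndi-name>
  <driver type="org.postgresql.Driver">
    <url>jdbc:postgresql://dbhost:5432/appdb</url>
    <user>app</user>
    <password>secret</password>
  </driver>
  <max-connections>20</max-connections>    <!-- down from 50 -->
  <max-idle-time>120s</max-idle-time>      <!-- minutes, not hours -->
</database>
```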
Using Resin's connection pooling. We are looking into pgpool alongside Slony to separate out some reporting functionality.

-anjan

-----Original Message-----
From: Jeff [mailto:threshar@torgo.978.org]
Sent: Wednesday, April 27, 2005 3:29 PM
To: Greg Stark
Cc: Anjan Dave; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

[snip]
Jeff <threshar@torgo.978.org> writes:

> Are you (Anjan) using real or fake connection pooling - ie pgpool versus
> php's persistent connections? I'd strongly recommend looking at pgpool.
> It does connection pooling correctly (a set of X connections shared among
> the entire box rather than 1 per web server)

Having one connection per web process isn't "fake connection pooling"; it's a completely different arrangement, and there's nothing "incorrect" about it. In fact I think it's generally superior to having a layer like pgpool hand off all your database communication. Having to do an extra context switch to handle every database communication is crazy.

For typical web sites where the database is the only slow component, there's not much point in having more web server processes than connections anyway. All you're doing is transferring the wait time from waiting for a web server process to waiting for a database process. Most applications that find they need connection pooling are using it to work around a poorly architected system that is mixing static requests (like images) and database-driven requests in the same web server.

However, your application sounds like it's more involved than a typical web server. If it's handling many slow resources, such as connections to multiple databases, SOAP services, mail, or other network services, then you may well need that many processes - in which case you'll need something like pgpool.

-- 
greg
Greg,

> In fact I think it's generally superior to having a layer like pgpool
> having to hand off all your database communication. Having to do an
> extra context switch to handle every database communication is crazy.

Although one of their issues is that their database connection pooling is per-server, which means that a safety margin of pre-allocated connections (something they need, since they get bursts of 1000 new users in a few seconds) has to be maintained per server, increasing the total number of connections. So a pooling system that allowed them to hold 100 free connections centrally, rather than 10 per server, might be a win.

Better would be getting some of this stuff offloaded onto database replication slaves.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
On Apr 27, 2005, at 7:46 PM, Greg Stark wrote:

> In fact I think it's generally superior to having a layer like pgpool
> having to hand off all your database communication. Having to do an
> extra context switch to handle every database communication is crazy.

I suppose this depends on how many machines / how much traffic you have. In one setup I run here, I get away with 32 * 4 db connections instead of 500 * 4. Pretty simple to see the savings on the db machine.

(Yes, it is a "bad design" as you said, where static & dynamic content are served from the same box. However, it also saves money, since I don't need machines sitting around serving up pixel.gif vs myBigApplication.cgi.)

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/