Thread: Why is this system swapping?
Hello,
I am trying to understand what I need to do for this system to stop using swap. Maybe it’s something simple, or obvious for the situation. I’d appreciate some thoughts/suggestions.
Some background:
This is a quad XEON (yes, Dell) with 12GB of RAM, running pg 7.4...pretty heavy on concurrent usage. With peak traffic (db allows 1000 connections, in line with the number of app servers and the connection pools for each), the following is from 'top' (sorted by mem). Shared_buffers is 170MB, sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel storage, RAID10.
972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 57.2% 0.0% 23.2% 0.0% 3.6% 82.8% 232.4%
cpu00 22.0% 0.0% 9.1% 0.1% 0.9% 18.7% 48.8%
cpu01 17.5% 0.0% 5.8% 0.0% 2.3% 19.7% 54.4%
cpu02 7.8% 0.0% 3.7% 0.0% 0.0% 20.8% 67.5%
cpu03 9.7% 0.0% 4.4% 0.0% 0.5% 23.6% 61.5%
Mem: 12081744k av, 12055220k used, 26524k free, 0k shrd, 71828k buff
9020480k actv, 1741348k in_d, 237396k in_c
Swap: 4096532k av, 472872k used, 3623660k free 9911176k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
21397 postgres 22 0 181M 180M 175M D 25.9 1.5 85:17 0 postmaster
23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 postmaster
24428 postgres 15 0 178M 177M 175M S 0.0 1.5 1:35 3 postmaster
24392 postgres 15 0 178M 177M 175M S 2.7 1.5 2:07 2 postmaster
23610 postgres 15 0 178M 177M 175M S 0.0 1.5 0:29 2 postmaster
24395 postgres 15 0 178M 177M 175M S 0.0 1.5 1:12 1 postmaster
…
…
-bash-2.05b$ free
total used free shared buffers cached
Mem: 12081744 12055536 26208 0 66704 9943988
-/+ buffers/cache: 2044844 10036900
Swap: 4096532 512744 3583788
As you can see, the system starts utilizing swap at some point, with so many processes. Some time ago we had decided to keep the connections from the pool open for longer periods, possibly to avoid connection-maintenance overhead on the db. At that time the traffic was not as high as it is today, which might be what's causing this: for the most part only a few postmaster processes are non-idle, except when the system becomes busy and suddenly you see a lot of selects piling up and load averages shooting upwards. I am thinking closing out connections sooner might help the system release some memory back to the kernel. Swapping adds to the IO, although the OS is on a separate channel from postgres.
I can add more memory, but I want to make sure I haven’t missed out something obvious.
Thanks!
Anjan
"Anjan Dave" <adave@vantage.com> writes:

> Some background:
>
> This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy
> on concurrent usage. With peak traffic (db allows 1000 connections, in
> line with the number of app servers and connection pools for each)
> following is from 'top' (sorted by mem) Shared_buffers is 170MB,
> sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel
> storage, RAID10.
>
> 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
>
> CPU states: cpu user nice system irq softirq iowait idle
> total 57.2% 0.0% 23.2% 0.0% 3.6% 82.8% 232.4%

This looks to me like most of your server processes are sitting around idle most of the time.

> 21397 postgres 22 0 181M 180M 175M D 25.9 1.5 85:17 0 postmaster
>
> 23820 postgres 15 0 178M 177M 175M S 0.0 1.5 1:53 3 postmaster

So each process is taking up 8-11M of RAM beyond the shared memory. 1,000 x 10M is 10G. Add in some memory for page tables and kernel data structures, as well as the kernel's need to keep some memory set aside for filesystem buffers (what you really want all that memory being used for anyway), and you've used up all your 12G.

I would seriously look at tuning those connection pools down. A lot. If your server processes are sitting idle over half the time, I would at least cut it by a factor of 2.

Working the other direction: you have four processors (I guess you have hyperthreading turned off?), so ideally what you want is four runnable processes at all times and as few others as possible. If your load typically spends about half the time waiting on i/o (which is what that top output says), then you want a total of 8 connections. Realistically you might not be able to predict which app server will be providing the load at any given time, so you might want 8 connections per app server. And you might have some load that's more i/o intensive than the 50% i/o load shown here. Say you think some loads will be 80% i/o; you might want 20 connections for those loads.

If you had 10 app servers with 20 connections each, for a total of 200 connections, I suspect that would be closer to right than having 1,000 connections. 200 connections would consume 2G of RAM, leaving you with 10G of filesystem cache. Which might in turn decrease the percentage of time waiting on i/o, which would decrease the number of processes you need even further...

-- 
greg
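Greg's sizing arithmetic can be sketched as a quick back-of-the-envelope calculation (hypothetical helper functions, not anything from the thread — the ~10MB-per-backend figure is his estimate from the top output above):

```python
# Back-of-the-envelope sizing from the argument above: with N CPUs and a
# workload that spends io_fraction of its time waiting on i/o, you want
# roughly N / (1 - io_fraction) connections to keep all CPUs busy.

def suggested_connections(cpus, io_fraction):
    """Connections needed so ~cpus processes are runnable at any instant."""
    return round(cpus / (1.0 - io_fraction))

def backend_overhead_gb(connections, mb_per_backend=10):
    """RAM consumed by backends beyond shared memory, in GB."""
    return connections * mb_per_backend / 1024.0

# 4 CPUs, 50% i/o wait -> 8 connections total, as in the post.
print(suggested_connections(4, 0.50))   # 8
# 80% i/o wait -> 20 connections.
print(suggested_connections(4, 0.80))   # 20
# 1,000 backends at ~10MB each -> ~10GB; 200 backends -> ~2GB.
print(backend_overhead_gb(1000))        # ~9.8
print(backend_overhead_gb(200))         # ~2.0
```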
On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote:

> As you can see the system starts utilizing swap at some point, with so
> many processes. Some time ago we had decided to keep the connections
> from the pool open for longer

You've shown the system has used swap, but not that it is swapping. Having swap in use is fine - there is likely plenty of code and whatnot that is not being used, so it got dumped out to swap. However, if you are actively moving data to/from swap, that is bad. Very bad. Especially on linux.

To tell if you are swapping, you need to watch the output of, say, `vmstat 1` and look at the si and so columns. Linux is very swap happy and likes to swap things for fun and profit.

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
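The check Jeff describes — watching `si`/`so` in `vmstat 1` — can be automated with a small script. A sketch, assuming the standard Linux vmstat column layout; the sample output below is made up for illustration:

```python
# Detect *active* swapping by reading the si (swap-in) and so (swap-out)
# columns of `vmstat 1` output. Sustained non-zero values mean pages are
# actually moving to/from swap (the bad case); swap merely being in use,
# as in the `free` output earlier in the thread, is harmless on its own.

def swapping_actively(vmstat_lines, threshold_kb=0):
    """Return True if any sample shows swap traffic above threshold_kb/s."""
    for line in vmstat_lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the two header lines
        # Standard layout: r b swpd free buff cache si so bi bo in cs ...
        si, so = int(fields[6]), int(fields[7])
        if si > threshold_kb or so > threshold_kb:
            return True
    return False

# Made-up sample: the second data row shows swap-out activity.
sample = """procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 2  1 472872  26524  71828 9911176    0    0   120   340   450   900 30 10 40 20
 1  3 472900  25100  71828 9910000    0  128    80   600   520  1100 25 12 20 43
""".splitlines()

print(swapping_actively(sample))  # True (second row: so = 128)
```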
Sorry, I didn't attach vmstat output - the system does actively swap pages. Not to the point where it crawls, but for some brief periods the console becomes a bit unresponsive. I am taking this as a sign to prevent future problems.

anjan

-----Original Message-----
From: Jeff [mailto:threshar@torgo.978.org]
Sent: Wednesday, April 27, 2005 2:30 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

[snip]
On Apr 27, 2005, at 2:29 PM, Greg Stark wrote:

> I would seriously look at tuning those connection pools down. A lot.
> If your server processes are sitting idle over half the time I would
> at least cut it by a factor of 2.

Are you (Anjan) using real or fake connection pooling - i.e. pgpool versus php's persistent connections? I'd strongly recommend looking at pgpool. It does connection pooling correctly (a set of X connections shared among the entire box, rather than 1 per web server).

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Yes, HT is turned off (I haven't seen any recommendations to keep it on).

This was set up when we were seeing 30 to 50% less traffic (users) than today. We didn't want the idle connections in the pool to expire too soon (default 30 secs, after which a connection goes back to the pool) and be reopened quickly, or to run short of available connections (default 20, which we raised to 50). So we figured a number per app server (50) and set the connections to expire after a very long time, to avoid any overhead and always have a connection available whenever needed, without opening a new one.

But now, for *some* reason, during some part of the day we use up almost all the connections in each app's pool. After that, since they are set to expire after a long time, they remain there, taking up DB resources. I will be trimming the idle-timeout down to a few minutes first and see if that helps.

Thanks,
Anjan

-----Original Message-----
From: Greg Stark [mailto:gsstark@mit.edu]
Sent: Wednesday, April 27, 2005 2:29 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

[snip]
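The change Anjan describes - a smaller pool and a short idle timeout - is a pool-configuration tweak. For a Resin-style `<database>` pool it might look roughly like the fragment below; the element names and values are illustrative (from memory, and the thread doesn't show the actual config), so check your Resin version's documentation:

```xml
<!-- Hypothetical Resin database-pool settings: cap the pool well below
     50 per app server, and let idle connections expire after minutes
     rather than "a very long time", so memory returns to the db host. -->
<database>
  <jndi-name>jdbc/app</jndi-name>
  <driver type="org.postgresql.Driver">
    <url>jdbc:postgresql://dbhost:5432/appdb</url>
    <user>app</user>
    <password>secret</password>
  </driver>
  <max-connections>20</max-connections>    <!-- down from 50 -->
  <max-idle-time>120s</max-idle-time>      <!-- minutes, not hours -->
</database>
```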
Using Resin's connection pooling. We are looking into pgpool alongside Slony to separate out some reporting functionality.

-anjan

-----Original Message-----
From: Jeff [mailto:threshar@torgo.978.org]
Sent: Wednesday, April 27, 2005 3:29 PM
To: Greg Stark
Cc: Anjan Dave; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

[snip]
Jeff <threshar@torgo.978.org> writes:

> Are you (Anjan) using real or fake connection pooling - ie pgpool versus
> php's persistent connections? I'd strongly recommend looking at pgpool.
> It does connection pooling correctly (a set of X connections shared among
> the entire box rather than 1 per web server)

Having one connection per web process isn't "fake connection pooling"; it's a completely different arrangement, and there's nothing "incorrect" about it. In fact I think it's generally superior to having a layer like pgpool hand off all your database communication. Having to do an extra context switch to handle every database communication is crazy.

For typical web sites where the database is the only slow component, there's not much point in having more web server processes than connections anyway. All you're doing is transferring the wait time from waiting for a web server process to waiting for a database process. Most applications that find they need connection pooling are using it to work around a poorly architected system that is mixing static requests (like images) and database-driven requests in the same web server.

However, your application sounds like it's more involved than a typical web server. If it's handling many slow resources, such as connections to multiple databases, SOAP services, mail, or other network services, then you may well need that many processes - in which case you'll need something like pgpool.

-- 
greg
Greg,

> In fact I think it's generally superior to having a layer like pgpool
> having to hand off all your database communication. Having to do an
> extra context switch to handle every database communication is crazy.

Although one of their issues is that their database connection pooling is per-server, which means that a safety margin of pre-allocated connections (something they need, since they get bursts of 1000 new users in a few seconds) has to be maintained per server, increasing the total number of connections. So a pooling system that allowed them to hold 100 free connections centrally, rather than 10 per server, might be a win.

Better would be getting some of this stuff offloaded onto database replication slaves.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
On Apr 27, 2005, at 7:46 PM, Greg Stark wrote:

> In fact I think it's generally superior to having a layer like pgpool
> having to hand off all your database communication. Having to do an
> extra context switch to handle every database communication is crazy.

I suppose this depends on how many machines / how much traffic you have. In one setup I run here, I get away with 32 * 4 db connections instead of 500 * 4. Pretty simple to see the savings on the db machine.

(Yes, it is a "bad design" as you said, where static & dynamic content are served from the same box. However, it also saves money, since I don't need machines sitting around serving up pixel.gif vs myBigApplication.cgi.)

-- 
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/