Thread: kswapd 100%, swap full, vm.swappiness=0
Hardware:
48 core AMD Magny Cours (4x12)
128G 1333MHz memory
34 15k6 drives: 2 hot spares, the rest in RAID-1 pairs (1 set for the OS, 4 for pg_xlog, the rest for /data/base)
LSI 8888 RAID controller

OS:
Ubuntu 10.04

uname -a
Linux bigassdbserver 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:20:59 UTC 2010 x86_64 GNU/Linux

scheduler = noop for all drive sets.

Settings for sysctl.conf:
vm.zone_reclaim_mode = 0
kernel.shmmax = 33554432000
kernel.shmall = 2097152000
kernel.shmmni = 4096
vm.swappiness = 0
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1

$ free
             total       used       free     shared    buffers     cached
Mem:     131651412  104986524   26664888          0     910804   91170764
-/+ buffers/cache:   12904956  118746456
Swap:            0          0          0

(Swap is now off via sudo swapoff -a, which fixed the problem.)

Its twin, the read slave, looks like this:

$ free
             total       used       free     shared    buffers     cached
Mem:     131651412  110364700   21286712          0     702144   96771656
-/+ buffers/cache:   12890900  118760512
Swap:     25388024        940   25387084

So, this morning, the machine goes to 100% swap usage. Four kswapd threads are running at 100% CPU, mostly in D state. Load climbs to 300. The server gets a little slow. swapoff -a fixes it.

This makes no sense to me. The machine had 90G+ in kernel cache and was NOT running out of memory in any way. Swappiness is 0.

Any advice on this, or on reporting it to the kernel guys, is welcome.

--
To understand recursion, one must first understand recursion.
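For reference, the "-/+ buffers/cache" row in the free output above is plain arithmetic on the "Mem:" row: buffers and page cache are reclaimable, so free subtracts them from "used" and adds them to "free". A minimal Python sketch reproducing that row for both machines follows; the names buffers_cache_line and to_gib are illustrative only, and it assumes free's default KiB units.

```python
# Figures copied from the two `free` outputs above (assumed to be in KiB,
# which is free's default unit when no -m/-g flag is given).
MASTER = {"total": 131651412, "used": 104986524, "free": 26664888,
          "buffers": 910804, "cached": 91170764}
SLAVE = {"total": 131651412, "used": 110364700, "free": 21286712,
         "buffers": 702144, "cached": 96771656}

KIB_PER_GIB = 1024 * 1024


def buffers_cache_line(mem):
    """Return the (used, free) pair shown on free's '-/+ buffers/cache' row.

    Buffers and page cache can be reclaimed, so they are moved from the
    'used' side to the 'free' side.
    """
    reclaimable = mem["buffers"] + mem["cached"]
    return mem["used"] - reclaimable, mem["free"] + reclaimable


def to_gib(kib):
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


if __name__ == "__main__":
    for name, mem in (("master", MASTER), ("slave", SLAVE)):
        held, available = buffers_cache_line(mem)
        print(f"{name}: {to_gib(held):.1f} GiB held by processes, "
              f"{to_gib(available):.1f} GiB free or reclaimable")
```

Run as a script, it reports roughly 12.3 GiB held by processes on each machine and about 113 GiB free or reclaimable, which is the gap between the "used" column and what the processes actually need.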
On Thu, Oct 7, 2010 at 9:11 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> This makes no sense to me. The machine had 90G+ in kernel cache, and
> was NOT running out of memory in any way. Swappiness is 0.

My wild guess is that Ubuntu may be to blame. Try restarting PG; chances are it will not solve the problem, which would suggest an OS issue. I had similar experiences with a PostgreSQL server hosted on Ubuntu: after the machine had been running for a couple of days, "free -g" would show no (or very few) free GBs of RAM. With Fedora I have not noticed this problem. For some reason I seem to have issues with Ubuntu/Kubuntu but not Fedora.

Allan.
On Thu, Oct 7, 2010 at 1:46 PM, Allan Kamau <kamauallan@gmail.com> wrote:
> My wild guess is that Ubuntu may be to blame.

I'd definitely tend to agree, but I'm more suspicious of a late-model kernel than of the specific distro. Note that this machine has 60 days of uptime with no behaviour like this before.

For now I'm just running it with swap turned off. It's got 128Gig of RAM; if it runs out of that, I've got other problems. :)