Thread: heavy swapping, not sure why
I have several Linux-x86_64 based dedicated PostgreSQL servers where I'm experiencing significant swap usage growth over time. All of them have fairly substantial amounts of RAM (not including swap), yet the amount of swap that postgres is using ramps up over time and eventually hurts performance badly. In every case, simply restarting postgresql frees up all the swap in use (until it ramps up again later). I'm assuming that I have at least one postgresql.conf parameter set wrong, but I'm not sure which. I read that (max_connections * work_mem) should never exceed physical RAM, and if that's accurate, then I suspect that's the root of my problem on systemA (below). However, I'd like confirmation before I start tweaking things, as one of these servers is in production and I can't easily tweak settings to experiment (plus this problem takes a few weeks before swapping gets bad enough to impact performance).

A few examples:

0) system A: 56GB RAM, running postgresql-8.4.8 with the following parameters:
maintenance_work_mem = 96MB
effective_cache_size = 40GB
work_mem = 256MB
wal_buffers = 16MB
shared_buffers = 13GB
max_connections = 300

1) system B: 120GB RAM, running postgresql-9.0.4 with the following parameters:
maintenance_work_mem = 1GB
effective_cache_size = 88GB
work_mem = 576MB
wal_buffers = 4MB
shared_buffers = 28GB
max_connections = 200

thanks
On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: > I have several Linux-x68_64 based dedicated PostgreSQL servers where > I'm experiencing significant swap usage growth over time. It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 (usually in /etc/sysctl.conf) and put that into effect.
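For example, one minimal way to apply that suggestion (assuming root access and the stock sysctl tooling):

# echo 'vm.swappiness = 0' >> /etc/sysctl.conf
# sysctl -w vm.swappiness=0    # take effect now, without waiting for a reboot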
On Mon, Aug 29, 2011 at 1:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: > On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> I'm experiencing significant swap usage growth over time. > > It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 > (usually in /etc/sysctl.conf) and put that into effect. I understand that the kernel determines what is swapped out, however postgres is what is using nearly all the RAM, and then overflowing into swap. I guess I should have noted that this isn't a case of a significant amount of RAM not being used, and swapping occurring anyway. Most of the RAM is already consumed when the heavy swapping is happening. So, I'd be surprised if setting vm.swappiness=0 will make a significant difference, however I can certainly try.
On Mon, Aug 29, 2011 at 2:57 PM, Lonni J Friedman <netllama@gmail.com> wrote: > On Mon, Aug 29, 2011 at 1:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: >> On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >>> I have several Linux-x68_64 based dedicated PostgreSQL servers where >>> I'm experiencing significant swap usage growth over time. >> >> It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 >> (usually in /etc/sysctl.conf) and put that into effect. > > I understand that the kernel determines what is swapped out, however > postgres is what is using nearly all the RAM, and then overflowing > into swap. I guess I should have noted that this isn't a case of a > significant amount of RAM not being used, and swapping occurring > anyway. Most of the RAM is already consumed when the heavy swapping > is happening. So, I'd be surprised if setting vm.swappiness=0 will > make a significant difference, however I can certainly try. You haven't shown us how you determined this, it would be nice to see some copy and paste of things like top, free, or whatever. How much free AND cache is left over when the machine starts to run out of memory etc. Your settings for shared_memory are HUGE. I run a machine witih 128G of ram and my shared_memory is 8G and that's quite large. No testing anyone has done has shown anything over 10G being useful.
On Mon, Aug 29, 2011 at 3:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: > On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> I'm experiencing significant swap usage growth over time. > > It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 > (usually in /etc/sysctl.conf) and put that into effect. that won't help and, in almost all cases, is a bad idea. merlin
On Mon, Aug 29, 2011 at 3:36 PM, Lonni J Friedman <netllama@gmail.com> wrote: > I have several Linux-x68_64 based dedicated PostgreSQL servers where > I'm experiencing significant swap usage growth over time. All of them > have fairly substantial amounts of RAM (not including swap), yet the > amount of swap that postgres is using ramps up over time and > eventually hurts performance badly. In every case, simply restarting > postgresql frees up all the swap in use (until it ramps up again > later). I'm assuming that I have at least one postgresql.conf > parameter set wrong, but I'm not sure which. I read that > (max_connections * work_mem) should never exceed physical RAM, and if > that's accurate, then I suspect that's the root of my problem on > systemA (below). However, I'd like confirmation before I start > tweaking things, as one of these servers is in production, and I can't > easily tweak settings to experiment (plus this problem takes a few > weeks before swapping gets bad enough to impact performance). using any C code in the backend? this includes 3rd party libraries which link in C, including postgis, pljava, xml2, etc. Any features being used not included in the standard core distribution are interesting. How long do your database connections stay open? forever? If yes, is memory distributed semi-evenly across all postgres processes or only to particular ones? If no, do you see excessive consumption with the non user backends like the stats collector, etc? Anything else running on these boxes? Application server? Anything written in java? merlin
On Mon, Aug 29, 2011 at 2:24 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, Aug 29, 2011 at 2:57 PM, Lonni J Friedman <netllama@gmail.com> wrote: >> On Mon, Aug 29, 2011 at 1:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: >>> On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >>>> I have several Linux-x68_64 based dedicated PostgreSQL servers where >>>> I'm experiencing significant swap usage growth over time. >>> >>> It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 >>> (usually in /etc/sysctl.conf) and put that into effect. >> >> I understand that the kernel determines what is swapped out, however >> postgres is what is using nearly all the RAM, and then overflowing >> into swap. I guess I should have noted that this isn't a case of a >> significant amount of RAM not being used, and swapping occurring >> anyway. Most of the RAM is already consumed when the heavy swapping >> is happening. So, I'd be surprised if setting vm.swappiness=0 will >> make a significant difference, however I can certainly try. > > You haven't shown us how you determined this, it would be nice to see > some copy and paste of things like top, free, or whatever. How much > free AND cache is left over when the machine starts to run out of > memory etc. Sorry, I was looking at the output from 'free' (plus we have some generic monitoring tools which generate pretty graphs that also illustrate the problem). I restarted postgres this morning, so everything is in a good state right now: total used free shared buffers cached Mem: 56481 55486 995 0 15 53298 -/+ buffers/cache: 2172 54309 Swap: 1099 18 1081 total used free shared buffers cached Mem: 121177 111603 9573 0 0 101007 -/+ buffers/cache: 10596 110581 Swap: 1498 10 1488 Based on past results, it'll be about two weeks before a few hundred MB of swap is in use, and perf is noticeably poor. Although it will creep up over time, so even in a day or 2, it will be worse than right now. I could post the pretty graph somewhere (or send it to the list, if you'd prefer) if you want to see something right now (filesize is less than 40KB). > > Your settings for shared_memory are HUGE. I run a machine witih 128G > of ram and my shared_memory is 8G and that's quite large. No testing > anyone has done has shown anything over 10G being useful. Do you mean shared_buffers? thanks
On Mon, Aug 29, 2011 at 2:38 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Aug 29, 2011 at 3:36 PM, Lonni J Friedman <netllama@gmail.com> wrote: >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> I'm experiencing significant swap usage growth over time. All of them >> have fairly substantial amounts of RAM (not including swap), yet the >> amount of swap that postgres is using ramps up over time and >> eventually hurts performance badly. In every case, simply restarting >> postgresql frees up all the swap in use (until it ramps up again >> later). I'm assuming that I have at least one postgresql.conf >> parameter set wrong, but I'm not sure which. I read that >> (max_connections * work_mem) should never exceed physical RAM, and if >> that's accurate, then I suspect that's the root of my problem on >> systemA (below). However, I'd like confirmation before I start >> tweaking things, as one of these servers is in production, and I can't >> easily tweak settings to experiment (plus this problem takes a few >> weeks before swapping gets bad enough to impact performance). > > using any C code in the backend? this includes 3rd party libraries > which link in C, including postgis, pljava, xml2, etc. Any features > being used not included in the standard core distribution are > interesting. Nope, nothing like that. They're fairly generic setups, with nothing added that isn't part of the core distribution. > > How long do your database connections stay open? forever? If yes, is > memory distributed semi-evenly across all postgres processes or only > to particular ones? If no, do you see excessive consumption with the > non user backends like the stats collector, etc? Nope, nothing is forever, everything is a fairly brief connection (a few seconds, tops, with most under 1s). Although I do have pgbouncer sitting in front of systemA to serve as a connection pooler. > > Anything else running on these boxes? Application server? Anything > written in java? Nothing. They're all 100% dedicated to postgres, and don't run anything else beyond the basic OS level stuff (crond, etc).
On Mon, Aug 29, 2011 at 3:38 PM, Lonni J Friedman <netllama@gmail.com> wrote: > On Mon, Aug 29, 2011 at 2:24 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Mon, Aug 29, 2011 at 2:57 PM, Lonni J Friedman <netllama@gmail.com> wrote: >>> On Mon, Aug 29, 2011 at 1:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: >>>> On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >>>>> I have several Linux-x68_64 based dedicated PostgreSQL servers where >>>>> I'm experiencing significant swap usage growth over time. >>>> >>>> It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 >>>> (usually in /etc/sysctl.conf) and put that into effect. >>> >>> I understand that the kernel determines what is swapped out, however >>> postgres is what is using nearly all the RAM, and then overflowing >>> into swap. I guess I should have noted that this isn't a case of a >>> significant amount of RAM not being used, and swapping occurring >>> anyway. Most of the RAM is already consumed when the heavy swapping >>> is happening. So, I'd be surprised if setting vm.swappiness=0 will >>> make a significant difference, however I can certainly try. >> >> You haven't shown us how you determined this, it would be nice to see >> some copy and paste of things like top, free, or whatever. How much >> free AND cache is left over when the machine starts to run out of >> memory etc. > > Sorry, I was looking at the output from 'free' (plus we have some > generic monitoring tools which generate pretty graphs that also > illustrate the problem). I restarted postgres this morning, so > everything is in a good state right now: > total used free shared buffers cached > Mem: 56481 55486 995 0 15 53298 > -/+ buffers/cache: 2172 54309 > Swap: 1099 18 1081 > > > total used free shared buffers cached > Mem: 121177 111603 9573 0 0 101007 > -/+ buffers/cache: 10596 110581 > Swap: 1498 10 1488 > > Based on past results, it'll be about two weeks before a few hundred > MB of swap is in use, and perf is noticeably poor. That's all a few hundred Megs? That shouldn't make any real difference. Now a few dozen gigs that would make a difference. Use top or htop or some other method that shows you the VIRT RES and SHR memory usage of the processes. > Although it will > creep up over time, so even in a day or 2, it will be worse than right > now. > > I could post the pretty graph somewhere (or send it to the list, if > you'd prefer) if you want to see something right now (filesize is less > than 40KB). > >> >> Your settings for shared_memory are HUGE. I run a machine witih 128G >> of ram and my shared_memory is 8G and that's quite large. No testing >> anyone has done has shown anything over 10G being useful. > > Do you mean shared_buffers? Yeah
On Mon, Aug 29, 2011 at 2:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, Aug 29, 2011 at 3:38 PM, Lonni J Friedman <netllama@gmail.com> wrote: >> On Mon, Aug 29, 2011 at 2:24 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> On Mon, Aug 29, 2011 at 2:57 PM, Lonni J Friedman <netllama@gmail.com> wrote: >>>> On Mon, Aug 29, 2011 at 1:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: >>>>> On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >>>>>> I have several Linux-x68_64 based dedicated PostgreSQL servers where >>>>>> I'm experiencing significant swap usage growth over time. >>>>> >>>>> It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 >>>>> (usually in /etc/sysctl.conf) and put that into effect. >>>> >>>> I understand that the kernel determines what is swapped out, however >>>> postgres is what is using nearly all the RAM, and then overflowing >>>> into swap. I guess I should have noted that this isn't a case of a >>>> significant amount of RAM not being used, and swapping occurring >>>> anyway. Most of the RAM is already consumed when the heavy swapping >>>> is happening. So, I'd be surprised if setting vm.swappiness=0 will >>>> make a significant difference, however I can certainly try. >>> >>> You haven't shown us how you determined this, it would be nice to see >>> some copy and paste of things like top, free, or whatever. How much >>> free AND cache is left over when the machine starts to run out of >>> memory etc. >> >> Sorry, I was looking at the output from 'free' (plus we have some >> generic monitoring tools which generate pretty graphs that also >> illustrate the problem). I restarted postgres this morning, so >> everything is in a good state right now: >> total used free shared buffers cached >> Mem: 56481 55486 995 0 15 53298 >> -/+ buffers/cache: 2172 54309 >> Swap: 1099 18 1081 >> >> >> total used free shared buffers cached >> Mem: 121177 111603 9573 0 0 101007 >> -/+ buffers/cache: 10596 110581 >> Swap: 1498 10 1488 >> >> Based on past results, it'll be about two weeks before a few hundred >> MB of swap is in use, and perf is noticeably poor. > > That's all a few hundred Megs? That shouldn't make any real > difference. Now a few dozen gigs that would make a difference. Use > top or htop or some other method that shows you the VIRT RES and SHR > memory usage of the processes. Yes, a few hundred MB of swap, and its definitely making a huge difference. Upon restarting postgres, its all freed up, and then perf is good again. Also, this box only has 1GB of swap total, so its never going to get up a few dozen GB. Anyway, here's some of top output for systemA right now: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5582 postgres 20 0 13.5g 8.9g 8.9g R 97.7 16.2 2:51.61 postmaster 6554 postgres 20 0 13.5g 1.9g 1.9g D 63.8 3.4 1:50.50 postmaster 6052 postgres 20 0 13.5g 1.3g 1.2g S 22.6 2.3 0:26.33 postmaster 2751 postgres 20 0 13.5g 1.6g 1.6g S 21.6 2.8 0:52.32 postmaster 31221 postgres 20 0 13.5g 2.0g 2.0g S 10.0 3.6 1:19.05 postmaster 1721 postgres 20 0 13.5g 6.7g 6.7g S 3.0 12.2 2:19.21 postmaster 6050 postgres 20 0 13.5g 879m 867m S 8.3 1.6 0:06.89 postmaster I can certainly grab more in a few days once swap usage has started to creep up a bit. > >> Although it will >> creep up over time, so even in a day or 2, it will be worse than right >> now. >> >> I could post the pretty graph somewhere (or send it to the list, if >> you'd prefer) if you want to see something right now (filesize is less >> than 40KB). 
>> >>> >>> Your settings for shared_memory are HUGE. I run a machine witih 128G >>> of ram and my shared_memory is 8G and that's quite large. No testing >>> anyone has done has shown anything over 10G being useful. >> >> Do you mean shared_buffers? > > Yeah OK, I'll reduce it to 10GB and see if there's any noticeable change in performance. thanks
On Mon, Aug 29, 2011 at 4:45 PM, Lonni J Friedman <netllama@gmail.com> wrote: >> using any C code in the backend? this includes 3rd party libraries >> which link in C, including postgis, pljava, xml2, etc. Any features >> being used not included in the standard core distribution are >> interesting. > > Nope, nothing like that. They're fairly generic setups, with nothing > added that isn't part of the core distribution. > >> >> How long do your database connections stay open? forever? If yes, is >> memory distributed semi-evenly across all postgres processes or only >> to particular ones? If no, do you see excessive consumption with the >> non user backends like the stats collector, etc? > > Nope, nothing is forever, everything is a fairly brief connection (a > few seconds, tops, with most under 1s). Although I do have pgbouncer > sitting in front of systemA to serve as a connection pooler. hm. well iirc pgbouncer tries to dump server connections older than an hour or so. this plus your other statements makes me very suspicious that the problem is in postgres itself. since postgres process dies when the connection dies, long term memory accumulation is just not possible except in the processes that aren't serving client sessions (the very first thing you need to do is rule those processes out). pgbouncer itself could be the issue, but i doubt it. obviously, a full memory profile during your problem times is a critical piece of evidence (a 'top' sorted by memory usage should do the trick nicely). it's possible you've unhappily tripped a leak in the o/s -- is everything properly updated? running any funky hardware (like fiber san drivers)? anything else interesting or out of the ordinary to report? merlin
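As a rough way to capture that kind of snapshot (ps flags assume Linux procps; adjust to taste):

$ ps aux --sort=-rss | head -n 20    # resident-size-sorted list of the biggest processes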
On Mon, Aug 29, 2011 at 3:17 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Aug 29, 2011 at 4:45 PM, Lonni J Friedman <netllama@gmail.com> wrote: >>> using any C code in the backend? this includes 3rd party libraries >>> which link in C, including postgis, pljava, xml2, etc. Any features >>> being used not included in the standard core distribution are >>> interesting. >> >> Nope, nothing like that. They're fairly generic setups, with nothing >> added that isn't part of the core distribution. >> >>> >>> How long do your database connections stay open? forever? If yes, is >>> memory distributed semi-evenly across all postgres processes or only >>> to particular ones? If no, do you see excessive consumption with the >>> non user backends like the stats collector, etc? >> >> Nope, nothing is forever, everything is a fairly brief connection (a >> few seconds, tops, with most under 1s). Although I do have pgbouncer >> sitting in front of systemA to serve as a connection pooler. > > hm. well iirc pgbouncer tries to dump server connections older than > an hour or so. this plus your other statements makes me very > suspension the problem is in postgres itself. since postgres process > dies when the connection dies, long term memory accumulation is just > not possible except in the processes that aren't serving client > sessions (the very first thing you need to do is rule those processes > out). pgbouncer itself could be the issue, but i doubt it. > obviously, a full memory profile during your problem times is a > critical piece of evidence (a 'top' sorted by memory usage should to > the trick nicely). OK, I'll get that top output for everyone in a week or so once swap usage has grown noticeably above its current level. > > it's possible you've unhappily tripped a leak in the o/s -- is > everything properly updated? running any funky hardware (like fiber > san drivers)? anything else interesting or out of the ordinary to > report? No funky HW. Fairly standard 1U server with SATA disks, the OS is up to date.
On August 29, 2011 02:34:26 PM you wrote: > On Mon, Aug 29, 2011 at 3:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: > > On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: > >> I have several Linux-x68_64 based dedicated PostgreSQL servers where > >> I'm experiencing significant swap usage growth over time. > > > > It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 > > (usually in /etc/sysctl.conf) and put that into effect. > > that won't help and, in almost all cases, is a bad idea. Overly aggressive swapping with the default settings has frequently caused me performance issues. Using this prevents those problems.
Lonni J Friedman <netllama@gmail.com> writes: > I have several Linux-x68_64 based dedicated PostgreSQL servers where > I'm experiencing significant swap usage growth over time. All of them > have fairly substantial amounts of RAM (not including swap), yet the > amount of swap that postgres is using ramps up over time and > eventually hurts performance badly. In every case, simply restarting > postgresql frees up all the swap in use (until it ramps up again > later). If you're certain that it's restarting *postgres* that does it, and not restarting your application or pgbouncer or some other code, then it seems like you must have uncovered a memory leak someplace. We haven't got nearly enough info here to diagnose it though. First thing I'd want to know is which process(es) exactly are bloating. The top output you showed us is unhelpful for that since it just shows them all as "postmaster" --- you'll need to match up the problem PIDs with "ps auxww" output. Keep in mind also that top is pretty awful about distinguishing a process's actual memory use (private memory) from the portion of PG's shared memory that it happens to have touched. What you need to pay attention to is RES minus SHR, not either number alone. With shared buffers set as high as you've got it, you'll probably not be able to be sure that a process is bloating until it's eaten hundreds of megs of private space. Where we go from there will depend on what you find out ... but if possible, don't restart the server right away, or it'll probably be another couple weeks before you can do the next round of investigation. regards, tom lane
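One rough way to get that RES-minus-SHR figure for a single backend (a sketch; <pid> is a placeholder, and it assumes the usual 4 kB page size):

$ awk '{printf "%.1f MB private\n", ($2 - $3) * 4 / 1024}' /proc/<pid>/statm    # resident pages minus shared pages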
On Mon, Aug 29, 2011 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Lonni J Friedman <netllama@gmail.com> writes: >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> I'm experiencing significant swap usage growth over time. All of them >> have fairly substantial amounts of RAM (not including swap), yet the >> amount of swap that postgres is using ramps up over time and >> eventually hurts performance badly. In every case, simply restarting >> postgresql frees up all the swap in use (until it ramps up again >> later). > > If you're certain that it's restarting *postgres* that does it, and not > restarting your application or pgbouncer or some other code, then it > seems like you must have uncovered a memory leak someplace. We haven't > got nearly enough info here to diagnose it though. I'm 100% certain it's the postgres restart that frees up all the swap. > > First thing I'd want to know is which process(es) exactly are bloating. > The top output you showed us is unhelpful for that since it just shows > them all as "postmaster" --- you'll need to match up the problem PIDs > with "ps auxww" output. Keep in mind also that top is pretty awful > about distinguishing a process's actual memory use (private memory) > from the portion of PG's shared memory that it happens to have touched. > What you need to pay attention to is RES minus SHR, not either number > alone. With shared buffers set as high as you've got it, you'll > probably not be able to be sure that a process is bloating until it's > eaten hundreds of megs of private space. ok, I'll do my best to capture this data, and then reply back. > > Where we go from there will depend on what you find out ... but if > possible, don't restart the server right away, or it'll probably be > another couple weeks before you can do the next round of investigation. understood thanks!
You should monitor the PageTables value in /proc/meminfo. If the value is larger than 1G, I suggest enabling hugepages.

To monitor PageTables:
# cat /proc/meminfo | grep -i pagetables
On Mon, Aug 29, 2011 at 5:01 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: > On August 29, 2011 02:34:26 PM you wrote: >> On Mon, Aug 29, 2011 at 3:46 PM, Alan Hodgson <ahodgson@simkin.ca> wrote: >> > On August 29, 2011 01:36:07 PM Lonni J Friedman wrote: >> >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> >> I'm experiencing significant swap usage growth over time. >> > >> > It's the Linux kernel that does it, not PostgreSQL. Set vm.swappiness=0 >> > (usually in /etc/sysctl.conf) and put that into effect. >> >> that won't help and, in almost all cases, is a bad idea. > > Overly aggressive swapping with the default settings has frequently caused me > performance issues. Using this prevents those problems. On a machine with lots of memory, I've run into pathological behaviour with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts eating up CPU and swap io like mad, while doing essentially nothing. Setting swappiness to 0 delayed this behaviour but did not stop it. Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff -a" in /etc/rc.local and viola, problem solved.
On 08/29/2011 06:12 PM, Lonni J Friedman wrote: > > OK, I'll reduce it to 10GB and see if there's any noticable change in > performance. thanks > I've never heard a report of a Linux system using more than 8GB of shared_buffers usefully, and peak performance on systems I've tested has sometimes been far less than that even. (I have one server that's stuck at 512MB!) The only report of even 10GB helping came from a Solaris test. I doubt this has anything to do with your problem, just pointing this out as future guidance. Until there's a breakthrough in the PostgreSQL buffer cache code, there really is no reason to give more than 8GB of dedicated memory to the database on Linux via shared_buffers. You're better off letting the OS do caching with it instead. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
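A sketch of checking the current value and applying a smaller one (the data directory path is just an example; shared_buffers only changes after a full restart):

$ psql -c "SHOW shared_buffers;"
$ # edit postgresql.conf: shared_buffers = 8GB
$ pg_ctl -D /var/lib/pgsql/data restart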
* Scott Marlowe: > On a machine with lots of memory, I've run into pathological behaviour > with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts > eating up CPU and swap io like mad, while doing essentially nothing. > Setting swappiness to 0 delayed this behaviour but did not stop it. > Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff > -a" in /etc/rc.local and viola, problem solved. Was this NUMA machine? Some older kernels can only migrate pages between nodes through swap. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
On Tue, Aug 30, 2011 at 12:54 AM, Florian Weimer <fweimer@bfk.de> wrote: > * Scott Marlowe: > >> On a machine with lots of memory, I've run into pathological behaviour >> with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts >> eating up CPU and swap io like mad, while doing essentially nothing. >> Setting swappiness to 0 delayed this behaviour but did not stop it. >> Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff >> -a" in /etc/rc.local and viola, problem solved. > > Was this NUMA machine? Some older kernels can only migrate pages > between nodes through swap. 48 core AMD Magny Cours. I also had to turn off zone reclaim mode on it or it would slow to a crawl as the OS started to cache more and more data.
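For reference, a minimal way to check and turn off zone reclaim (sysctl name as in mainline kernels):

$ cat /proc/sys/vm/zone_reclaim_mode
# sysctl -w vm.zone_reclaim_mode=0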
On Tue, Aug 30, 2011 at 10:39 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Tue, Aug 30, 2011 at 12:54 AM, Florian Weimer <fweimer@bfk.de> wrote: >> * Scott Marlowe: >> >>> On a machine with lots of memory, I've run into pathological behaviour >>> with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts >>> eating up CPU and swap io like mad, while doing essentially nothing. >>> Setting swappiness to 0 delayed this behaviour but did not stop it. >>> Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff >>> -a" in /etc/rc.local and viola, problem solved. >> >> Was this NUMA machine? Some older kernels can only migrate pages >> between nodes through swap. > > 48 core AMD Magny Cours. I also had to turn off zone reclaim mode on > it or it would slow to a crawl as the OS started to cache more and > more data. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > I have experienced such behaviour on Ubuntu 64bit OS but not on Fedora 64bit OS. Try installing Fedora on a test machine and see if you run into the high memory consumption problem(s). Allan.
> On a machine with lots of memory, I've run into pathological behaviour with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts eating up CPU and swap io like mad, while doing essentially nothing. Setting swappiness to 0 delayed this behaviour but did not stop it. Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff -a" in /etc/rc.local and viola, problem solved.

I've tried running without swap and the problem is if you actually do run out of memory then the process killer can take out your postgresql.

Sim
> Yes, a few hundred MB of swap, and its definitely making a huge difference. Upon restarting postgres, its all freed up, and then perf is good again. Also, this box only has 1GB of swap total, so its never going to get up a few dozen GB. Anyway, here's some of top output for systemA right now:

Instead of restarting the database try swapoff -a && swapon -a and see if that helps performance. If it is that little swap in use, it might be something else clogging up the works.

Sim
On Tue, Aug 30, 2011 at 2:50 AM, Sim Zacks <sim@compulab.co.il> wrote: > > On a machine with lots of memory, I've run into pathological behaviour > with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts > eating up CPU and swap io like mad, while doing essentially nothing. > Setting swappiness to 0 delayed this behaviour but did not stop it. > Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff > -a" in /etc/rc.local and viola, problem solved. > > I've tried running without swap and the problem is if you actually do run > out of memory then the process killer can take out your postgresql. My postgres is configured to never use more than 10 or 20 gig at max load. That leaves about 80+Gig for caching the database by the kernel. work_mem is pretty small (8MB) given the maximum connections (1000) So 8Gig if everybody sorts at once. Even under very heavy load memory usage has never really gone anywhere near that high.
On Tue, Aug 30, 2011 at 2:50 AM, Sim Zacks <sim@compulab.co.il> wrote: > > Instead of restarting the database try swapoff -a && swapon -a and see if > that helps performance. If it is that little swap in use, it might be > something else clogging up the works. Check to see if kswapd is going crazy or not. If it is, that's the pathological behavior I was talking about earlier. It works real hard at doing nothing and the machine grinds to a halt. swapoff -a fixed it. And it was happening about every 2 weeks for me too.
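A quick way to check for that (a sketch; kswapd0 is the usual name on a single-memory-node box):

$ vmstat 1 5                              # si/so columns show real swap-in/swap-out traffic
$ ps -o pid,pcpu,etime,comm -C kswapd0    # is kswapd burning CPU?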
Hi, 2011-08-29 22:36 keltezéssel, Lonni J Friedman írta: > ... I read that > (max_connections * work_mem) should never exceed physical RAM, and if > that's accurate, then I suspect that's the root of my problem on > systemA (below). work_mem is process-local memory so (max_connections * work_mem) < (physical RAM - shared_buffers) Some queries may allocate multiples of work_mem, too. Also, the kernel uses some memory for internal accounting, caching and you need to account for the process binary in memory. > However, I'd like confirmation before I start > tweaking things, as one of these servers is in production, and I can't > easily tweak settings to experiment (plus this problem takes a few > weeks before swapping gets bad enough to impact performance). > > A few examples: > > 0) system A: 56GB RAM, running postgresql-8.4.8 with the following parameters: > maintenance_work_mem = 96MB > effective_cache_size = 40GB > work_mem = 256MB > wal_buffers = 16MB > shared_buffers = 13GB > max_connections = 300 RAM (56GB) - shared_buffers (13GB) = 43GB which is less than work_mem * max_connections = 300 * 0.25GB = 75GB The system would start swapping before 43GB/0.25GB = 172 clients. > 1) system B: 120GB RAM, running postgresql-9.0.4 with the following parameters: > maintenance_work_mem = 1GB > effective_cache_size = 88GB > work_mem = 576MB > wal_buffers = 4MB > shared_buffers = 28GB > max_connections = 200 Similarly: 120GB - 28GB = 92GB is less than work_mem * max_connections = 200 * 576MB = 112.5GB Also, if you run anything else on the machine then the system would start swapping much sooner than hitting max_connections number of clients. I would never set work_mem that high by default. 8 - 16MB is usually enough for the common case and you can set work_mem for special queries from the client and then reset it. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
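For example, instead of a huge global default, a client can raise work_mem just around one big query (a sketch; the database, table, and column names here are only placeholders):

$ psql -d nightly <<'SQL'
SET work_mem = '512MB';   -- only for this session
SELECT * FROM some_big_table ORDER BY some_column LIMIT 10;
RESET work_mem;
SQL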
In this case, all the connections using up maximum work_mem is the potential threat. As said by Zoltan, work_mem is very high, and shared_buffers as well.
Other considerations would be the following -
- Allocated kernel memory settings (like shmmax and shmget etc..)
- How much memory is used up by the system level processes (like root and non-pg users)
- It also depends on the database size and the amount of data being accessed across CPUs and memory.
- We need to ensure that unnecessary data is not being read into memory
(queries hitting non-vacuumed tables, slow-performing queries, unnecessary full table scans, etc.)
Regards,
Venkat
Lonni J Friedman wrote: > ok, I'll do my best to capture this data, and then reply back. If using linux, you should find interesting data on per-process swap and memory usage in /proc/${pid}/smaps Also consider the script here: http://northernmost.org/blog/find-out-what-is-using-your-swap/ to group the numbers per running process. FWIW I have found the swap column of the top utility to be useless, while the information in /proc/${pid}/smaps looks trustworthy. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
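Roughly what the linked script does, as a self-contained sketch (needs root to read other users' processes, and very old kernels don't expose per-mapping Swap: lines in smaps):

for p in /proc/[0-9]*; do
  s=$(awk '/^Swap:/ {sum += $2} END {print sum + 0}' "$p/smaps" 2>/dev/null)
  [ "${s:-0}" -gt 0 ] && printf '%8d kB  %s\n' "$s" "$(awk '/^Name:/ {print $2}' "$p/status")"
done | sort -rn | head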
On Tue, Aug 30, 2011 at 1:26 AM, Greg Smith <greg@2ndquadrant.com> wrote: > I doubt this has anything to do with your problem, just pointing this out as > future guidance. Until there's a breakthrough in the PostgreSQL buffer > cache code, there really is no reason to give more than 8GB of dedicated > memory to the database on Linux via shared_buffers. You're better off > letting the OS do caching with it instead. > Any differing advice for FreeBSD? I'm running with 5GB on a 24GB RAM server (about to be replaced with a server with 48GB RAM).
On Mon, Aug 29, 2011 at 6:54 PM, peixubin <peixubin@yahoo.com.cn> wrote: > You should monitor PageTables value in /proc/meminfo.if the value larger than 1G,I Suggest enable hugepages . > > To monitor PageTables: > # cat /proc/meminfo |grep -i pagetables $ cat /proc/meminfo |grep -i pagetables PageTables: 608732 kB Looks to be less than 1GB currently.
On Tue, Aug 30, 2011 at 3:00 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: > Hi, > > 2011-08-29 22:36 keltezéssel, Lonni J Friedman írta: >> ... I read that >> (max_connections * work_mem) should never exceed physical RAM, and if >> that's accurate, then I suspect that's the root of my problem on >> systemA (below). > > work_mem is process-local memory so > > (max_connections * work_mem) < (physical RAM - shared_buffers) > > Some queries may allocate multiples of work_mem, too. > > Also, the kernel uses some memory for internal accounting, caching > and you need to account for the process binary in memory. > >> However, I'd like confirmation before I start >> tweaking things, as one of these servers is in production, and I can't >> easily tweak settings to experiment (plus this problem takes a few >> weeks before swapping gets bad enough to impact performance). >> >> A few examples: >> >> 0) system A: 56GB RAM, running postgresql-8.4.8 with the following parameters: >> maintenance_work_mem = 96MB >> effective_cache_size = 40GB >> work_mem = 256MB >> wal_buffers = 16MB >> shared_buffers = 13GB >> max_connections = 300 > > RAM (56GB) - shared_buffers (13GB) = 43GB > > which is less than > > work_mem * max_connections = 300 * 0.25GB = 75GB > > The system would start swapping before 43GB/0.25GB = 172 clients. > >> 1) system B: 120GB RAM, running postgresql-9.0.4 with the following parameters: >> maintenance_work_mem = 1GB >> effective_cache_size = 88GB >> work_mem = 576MB >> wal_buffers = 4MB >> shared_buffers = 28GB >> max_connections = 200 > > Similarly: > > 120GB - 28GB = 92GB > > is less than > > work_mem * max_connections = 200 * 576MB = 112.5GB > > Also, if you run anything else on the machine then the system would start > swapping much sooner than hitting max_connections number of clients. > > I would never set work_mem that high by default. 8 - 16MB is usually > enough for the common case and you can set work_mem for special > queries from the client and then reset it. Thanks for your reply. I've reduced shared_buffers to 8GB everywhere, which should definitely help.
On Mon, Aug 29, 2011 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Lonni J Friedman <netllama@gmail.com> writes: >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> I'm experiencing significant swap usage growth over time. All of them >> have fairly substantial amounts of RAM (not including swap), yet the >> amount of swap that postgres is using ramps up over time and >> eventually hurts performance badly. In every case, simply restarting >> postgresql frees up all the swap in use (until it ramps up again >> later). > > If you're certain that it's restarting *postgres* that does it, and not > restarting your application or pgbouncer or some other code, then it > seems like you must have uncovered a memory leak someplace. We haven't > got nearly enough info here to diagnose it though. > > First thing I'd want to know is which process(es) exactly are bloating. > The top output you showed us is unhelpful for that since it just shows > them all as "postmaster" --- you'll need to match up the problem PIDs > with "ps auxww" output. Keep in mind also that top is pretty awful > about distinguishing a process's actual memory use (private memory) > from the portion of PG's shared memory that it happens to have touched. > What you need to pay attention to is RES minus SHR, not either number > alone. With shared buffers set as high as you've got it, you'll > probably not be able to be sure that a process is bloating until it's > eaten hundreds of megs of private space. In the past 18 hours, swap usage has nearly doubled on systemA: $ free -m total used free shared buffers cached Mem: 56481 56210 271 0 11 52470 -/+ buffers/cache: 3727 52753 Swap: 1099 35 1064 As a reminder, this is what it was yesterday afternoon (roughly 18 hours earlier): total used free shared buffers cached Mem: 56481 55486 995 0 15 53298 -/+ buffers/cache: 2172 54309 Swap: 1099 18 1081 Here's top sorted by memory usage: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2250 postgres 20 0 13.5g 10g 10g S 0.0 19.0 4:50.17 postmaster 20427 postgres 20 0 13.6g 8.6g 8.5g D 3.6 15.5 6:15.53 postmaster 1861 postgres 20 0 13.5g 2.2g 2.2g D 7.2 3.9 0:44.34 postmaster 32275 postgres 20 0 13.5g 2.0g 2.0g S 0.0 3.7 0:32.52 postmaster 2444 postgres 20 0 13.5g 2.0g 2.0g S 0.0 3.6 0:21.22 postmaster 21632 postgres 20 0 13.5g 2.0g 1.9g S 0.0 3.6 1:48.76 postmaster 1870 postgres 20 0 13.5g 1.7g 1.7g S 0.0 3.1 0:18.70 postmaster 2383 postgres 20 0 13.5g 1.7g 1.7g S 0.0 3.1 0:13.23 postmaster 32280 postgres 20 0 13.5g 1.6g 1.6g S 0.0 2.9 0:20.97 postmaster 2378 postgres 20 0 13.5g 1.3g 1.3g S 0.0 2.4 0:15.36 postmaster 2367 postgres 20 0 13.5g 1.3g 1.3g S 0.0 2.4 0:15.71 postmaster 2396 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.3 0:09.33 postmaster 31794 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.2 0:14.68 postmaster 1891 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.1 0:06.49 postmaster 2435 postgres 20 0 13.5g 1.2g 1.1g S 0.0 2.1 0:16.38 postmaster 2370 postgres 20 0 13.5g 1.1g 1.1g S 0.0 2.0 0:08.51 postmaster 2411 postgres 20 0 13.5g 1.0g 1.0g S 0.0 1.9 0:09.18 postmaster 31782 postgres 20 0 13.5g 981m 967m S 0.0 1.7 0:12.33 postmaster 32314 postgres 20 0 13.5g 909m 891m S 0.0 1.6 0:09.07 postmaster 2395 postgres 20 0 13.5g 890m 876m S 0.0 1.6 0:07.16 postmaster 2381 postgres 20 0 13.5g 829m 815m S 0.0 1.5 0:05.30 postmaster 2213 postgres 20 0 13.5g 829m 817m S 0.0 1.5 0:04.40 postmaster 2421 postgres 20 0 13.5g 803m 790m S 0.0 1.4 0:05.11 postmaster 1866 postgres 20 0 13.5g 797m 784m S 0.0 1.4 0:08.10 postmaster 2371 postgres 20 0 
13.5g 793m 781m S 0.0 1.4 0:05.88 postmaster 3460 postgres 20 0 13.5g 786m 773m S 0.0 1.4 0:06.17 postmaster 2418 postgres 20 0 13.5g 604m 594m S 0.0 1.1 0:01.69 postmaster 2425 postgres 20 0 13.5g 582m 570m S 0.0 1.0 0:02.67 postmaster 5863 postgres 20 0 13.5g 358m 347m D 0.0 0.6 0:01.90 postmaster 5865 postgres 20 0 13.5g 293m 283m S 0.0 0.5 0:01.70 postmaster 2243 postgres 20 0 13.5g 289m 288m S 0.0 0.5 0:07.01 postmaster 5862 postgres 20 0 13.5g 195m 189m S 0.0 0.3 0:00.48 postmaster 1890 postgres 20 0 13.5g 149m 145m S 0.0 0.3 0:06.02 postmaster 5852 postgres 20 0 13.5g 132m 124m S 0.0 0.2 0:01.04 postmaster 32316 postgres 20 0 13.5g 21m 19m S 0.0 0.0 0:00.60 postmaster 1876 postgres 20 0 13.5g 21m 19m S 0.0 0.0 0:00.16 postmaster 5866 postgres 20 0 13.5g 17m 15m S 0.0 0.0 0:00.01 postmaster 2251 postgres 20 0 13.5g 17m 16m S 0.0 0.0 0:57.19 postmaster 32328 postgres 20 0 13.5g 16m 13m S 0.0 0.0 0:00.20 postmaster 4893 postgres 20 0 13.5g 15m 13m S 0.0 0.0 0:00.15 postmaster 2426 postgres 20 0 13.5g 14m 12m S 0.0 0.0 0:00.15 postmaster 1878 postgres 20 0 13.5g 14m 12m S 0.0 0.0 0:00.12 postmaster 2408 postgres 20 0 13.5g 13m 10m S 0.0 0.0 0:00.06 postmaster 1896 postgres 20 0 13.5g 12m 10m S 0.0 0.0 0:00.13 postmaster 2427 postgres 20 0 13.5g 10m 8436 S 0.0 0.0 0:00.07 postmaster 2401 postgres 20 0 13.5g 8484 6412 S 0.0 0.0 0:00.03 postmaster 5864 postgres 20 0 13.5g 7740 5836 S 0.0 0.0 0:00.00 postmaster 5853 postgres 20 0 13.5g 7528 5652 S 0.0 0.0 0:00.00 postmaster I've attached the corresponding 'ps auxwww' output as well (since its rather wide, and it will likely get mangled inside the email body), but here it is too for those who don't want to deal with the attachment: ############## postgres 1861 2.1 4.1 14135092 2421300 ? Ds 08:37 0:45 postgres: lfriedman farm 127.0.0.1(34293) INSERT postgres 1866 0.3 1.4 14142616 816732 ? Ss 08:38 0:08 postgres: lfriedman nightly 127.0.0.1(56661) idle postgres 1870 0.9 3.1 14147660 1816576 ? Ss 08:38 0:18 postgres: lfriedman nightly 127.0.0.1(56665) idle postgres 1876 0.0 0.0 14134040 21892 ? Ss 08:38 0:00 postgres: lfriedman cbs 127.0.0.1(56670) idle postgres 1878 0.0 0.0 14134636 14880 ? Ss 08:38 0:00 postgres: lfriedman cbs 127.0.0.1(56672) idle postgres 1890 0.3 0.2 14134992 153224 ? Ss 08:40 0:06 postgres: lfriedman farm 127.0.0.1(56673) idle postgres 1891 0.3 2.1 14134852 1215668 ? Ss 08:40 0:06 postgres: lfriedman farm 127.0.0.1(56674) idle postgres 1896 0.0 0.0 14134612 13244 ? Ss 08:40 0:00 postgres: lfriedman cbs 127.0.0.1(56677) idle postgres 2213 0.2 1.4 14142356 849448 ? Ss 08:40 0:04 postgres: lfriedman nightly 127.0.0.1(56693) idle postgres 2243 0.0 0.5 14125816 296012 ? S Aug29 0:07 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data postgres 2248 0.0 0.0 159892 908 ? Ss Aug29 0:00 postgres: logger process postgres 2250 0.3 19.0 14133028 11013112 ? Ss Aug29 4:50 postgres: writer process postgres 2251 0.0 0.0 14132476 17532 ? Ss Aug29 0:57 postgres: wal writer process postgres 2252 0.0 0.0 14133884 2428 ? Ss Aug29 0:09 postgres: autovacuum launcher process postgres 2253 0.0 0.0 161272 1872 ? Ss Aug29 0:33 postgres: stats collector process postgres 2367 0.8 2.3 14147948 1385192 ? Ss 08:41 0:15 postgres: lfriedman nightly 127.0.0.1(56695) idle postgres 2370 0.4 1.9 14147088 1151312 ? Ss 08:41 0:08 postgres: lfriedman nightly 127.0.0.1(56698) idle postgres 2371 0.3 1.4 14142216 814360 ? Ds 08:41 0:05 postgres: lfriedman nightly 127.0.0.1(56699) UPDATE postgres 2378 0.8 2.3 14142980 1387028 ? 
Ss 08:41 0:15 postgres: lfriedman nightly 127.0.0.1(56706) idle postgres 2381 0.2 1.4 14144484 849852 ? Ss 08:41 0:05 postgres: lfriedman nightly 127.0.0.1(56709) idle postgres 2383 0.7 3.1 14147196 1800300 ? Ss 08:41 0:13 postgres: lfriedman nightly 127.0.0.1(56711) idle postgres 2395 0.3 1.5 14142816 911604 ? Ss 08:41 0:07 postgres: lfriedman nightly 127.0.0.1(56722) idle postgres 2396 0.5 2.2 14144544 1302852 ? Ss 08:41 0:09 postgres: lfriedman nightly 127.0.0.1(56723) idle postgres 2401 0.0 0.0 14134040 8984 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56728) idle postgres 2408 0.0 0.0 14134824 13940 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56735) idle postgres 2411 0.4 1.8 14147688 1087016 ? Ss 08:41 0:09 postgres: lfriedman nightly 127.0.0.1(56738) idle postgres 2418 0.0 1.0 14141992 619484 ? Ss 08:41 0:01 postgres: lfriedman nightly 127.0.0.1(56745) idle postgres 2421 0.2 1.4 14142428 822400 ? Ss 08:41 0:05 postgres: lfriedman nightly 127.0.0.1(56748) idle postgres 2425 0.1 1.0 14142500 596420 ? Ss 08:41 0:02 postgres: lfriedman nightly 127.0.0.1(56752) idle postgres 2426 0.0 0.0 14134720 15896 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56753) idle postgres 2427 0.0 0.0 14134040 10536 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56754) idle postgres 2435 0.8 2.0 14147132 1210368 ? Ss 08:41 0:16 postgres: lfriedman nightly 127.0.0.1(56761) idle postgres 2444 1.1 3.5 14136592 2063556 ? Ss 08:42 0:21 postgres: lfriedman farm 127.0.0.1(56769) idle postgres 3460 0.5 1.3 14142640 805412 ? Ss 08:51 0:06 postgres: lfriedman nightly 127.0.0.1(37675) idle postgres 4893 0.0 0.0 14134520 16068 ? Ss 09:02 0:00 postgres: lfriedman inventory 127.0.0.1(50384) idle postgres 5852 0.6 0.2 14139168 135420 ? Ss 09:09 0:01 postgres: lfriedman nightly 127.0.0.1(39228) idle postgres 5853 0.0 0.0 14133968 7528 ? Ss 09:09 0:00 postgres: lfriedman nightly 127.0.0.1(39229) idle postgres 5862 0.3 0.3 14137720 200300 ? Ss 09:10 0:00 postgres: lfriedman nightly 127.0.0.1(39230) idle postgres 5863 1.5 0.6 14144120 367536 ? Ss 09:10 0:01 postgres: lfriedman nightly 127.0.0.1(39231) idle postgres 5864 0.0 0.0 14133968 7740 ? Ss 09:10 0:00 postgres: lfriedman nightly 127.0.0.1(39232) idle postgres 5865 1.7 0.5 14144176 338516 ? Ss 09:10 0:02 postgres: lfriedman nightly 127.0.0.1(39233) idle postgres 5866 0.0 0.0 14135036 18388 ? Ss 09:10 0:00 postgres: lfriedman nightly 127.0.0.1(39234) idle postgres 20427 0.6 15.5 14232764 8991924 ? Ss Aug29 6:16 postgres: autovacuum worker process nightly postgres 21632 0.1 3.5 14146368 2055856 ? Ss Aug29 1:48 postgres: lfriedman nightly 127.0.0.1(33664) idle postgres 31782 0.3 1.7 14142696 1005052 ? Ss 08:18 0:12 postgres: lfriedman nightly 127.0.0.1(39318) idle postgres 31794 0.4 2.1 14149300 1261472 ? Ss 08:18 0:14 postgres: lfriedman nightly 127.0.0.1(39330) idle postgres 32275 1.0 3.6 14147716 2125608 ? Ss 08:21 0:32 postgres: lfriedman nightly 127.0.0.1(39356) idle postgres 32280 0.6 2.9 14150556 1699160 ? Ss 08:21 0:20 postgres: lfriedman nightly 127.0.0.1(39361) idle postgres 32314 0.3 1.6 14146864 930840 ? Ss 08:22 0:09 postgres: lfriedman nightly 127.0.0.1(39395) idle postgres 32328 0.0 0.0 14134636 16396 ? Ss 08:22 0:00 postgres: lfriedman cbs 127.0.0.1(39409) idle ############## I'm not sure if there's much to work with at this point, since is only been 18 hours. Please let me know if I can get additional data, or if its better to wait several more days before capturing this data again. thanks!
On Tue, Aug 30, 2011 at 11:17 AM, Lonni J Friedman <netllama@gmail.com> wrote: > On Mon, Aug 29, 2011 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Lonni J Friedman <netllama@gmail.com> writes: >>> I have several Linux-x68_64 based dedicated PostgreSQL servers where >>> I'm experiencing significant swap usage growth over time. All of them >>> have fairly substantial amounts of RAM (not including swap), yet the >>> amount of swap that postgres is using ramps up over time and >>> eventually hurts performance badly. In every case, simply restarting >>> postgresql frees up all the swap in use (until it ramps up again >>> later). >> >> If you're certain that it's restarting *postgres* that does it, and not >> restarting your application or pgbouncer or some other code, then it >> seems like you must have uncovered a memory leak someplace. We haven't >> got nearly enough info here to diagnose it though. >> >> First thing I'd want to know is which process(es) exactly are bloating. >> The top output you showed us is unhelpful for that since it just shows >> them all as "postmaster" --- you'll need to match up the problem PIDs >> with "ps auxww" output. Keep in mind also that top is pretty awful >> about distinguishing a process's actual memory use (private memory) >> from the portion of PG's shared memory that it happens to have touched. >> What you need to pay attention to is RES minus SHR, not either number >> alone. With shared buffers set as high as you've got it, you'll >> probably not be able to be sure that a process is bloating until it's >> eaten hundreds of megs of private space. > > In the past 18 hours, swap usage has nearly doubled on systemA: > $ free -m > total used free shared buffers cached > Mem: 56481 56210 271 0 11 52470 > -/+ buffers/cache: 3727 52753 > Swap: 1099 35 1064 > > As a reminder, this is what it was yesterday afternoon (roughly 18 > hours earlier): > total used free shared buffers cached > Mem: 56481 55486 995 0 15 53298 > -/+ buffers/cache: 2172 54309 > Swap: 1099 18 1081 This is totally uninteresting. All this means is that the o/s has found 35 mb (17 more) of memory sitting around doing nothing and swapped it out. That inconsequential amount of extra memory can now be used by something else, which is expected and desired behavior. Swap usage, in terms of the amount of memory in swap holding unused resident memory, is a *good* thing. Swap churn, that is memory constantly moving in and out of the swap volume due to the o/s managing different and competing demands is a bad thing. FWICT, your churn is zero -- the vast majority of your memory is allocated to caching of filesystem pages, which is totally normal. This is making me wonder if your original diagnosis of swap causing your surrounding performance issues is in fact correct. We need to see a memory and i/o profile (iostat/iowait) during your 'bad' times to be sure though. merlin
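When things are actually slow, something along these lines (sysstat/procps tools; just a sketch) would capture the memory and I/O profile being asked for here:

$ iostat -x 5 3    # extended per-device stats plus %iowait, three 5-second samples
$ vmstat 5 3       # si/so columns show swap traffic, not just swap residency
$ free -m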
Merlin Moncure <mmoncure@gmail.com> writes: > On Tue, Aug 30, 2011 at 11:17 AM, Lonni J Friedman <netllama@gmail.com> wrote: >> In the past 18 hours, swap usage has nearly doubled on systemA: >> $ free -m >> total used free shared buffers cached >> Mem: 56481 56210 271 0 11 52470 >> -/+ buffers/cache: 3727 52753 >> Swap: 1099 35 1064 >> >> As a reminder, this is what it was yesterday afternoon (roughly 18 >> hours earlier): >> total used free shared buffers cached >> Mem: 56481 55486 995 0 15 53298 >> -/+ buffers/cache: 2172 54309 >> Swap: 1099 18 1081 > This is totally uninteresting. Yeah. You're going to need a whole lot more than 17MB of bloat before it'll be possible to tell which process is at fault, given that the expected process sizes are up to 10GB. regards, tom lane
On 08/30/11 12:18 PM, Tom Lane wrote: >>> total used free shared buffers cached >>> >> Mem: 56481 55486 995 0 15 53298 >>> >> -/+ buffers/cache: 2172 54309 >>> >> Swap: 1099 18 1081 >> > This is totally uninteresting. > Yeah. You're going to need a whole lot more than 17MB of bloat before > it'll be possible to tell which process is at fault, given that the > expected process sizes are up to 10GB. Indeed, it's 100% normal for the Linux virtual memory manager to write 'dirty' pages to the swapfile during idle time. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
On Tue, Aug 30, 2011 at 2:55 PM, John R Pierce <pierce@hogranch.com> wrote: > On 08/30/11 12:18 PM, Tom Lane wrote: >>>> >>>> total used free shared buffers cached >>>> >> Mem: 56481 55486 995 0 15 >>>> >> 53298 >>>> >> -/+ buffers/cache: 2172 54309 >>>> >> Swap: 1099 18 1081 >>> >>> > This is totally uninteresting. >> >> Yeah. You're going to need a whole lot more than 17MB of bloat before >> it'll be possible to tell which process is at fault, given that the >> expected process sizes are up to 10GB. > > indeed, its 100% normal for the linux virtual memory manager to write > 'dirty' pages to the swapfile during idle time. Sure, but it shouldn't grow indefinitely?
On Tue, Aug 30, 2011 at 5:05 PM, Lonni J Friedman <netllama@gmail.com> wrote: > On Tue, Aug 30, 2011 at 2:55 PM, John R Pierce <pierce@hogranch.com> wrote: >> On 08/30/11 12:18 PM, Tom Lane wrote: >>>>> >>>>> total used free shared buffers cached >>>>> >> Mem: 56481 55486 995 0 15 >>>>> >> 53298 >>>>> >> -/+ buffers/cache: 2172 54309 >>>>> >> Swap: 1099 18 1081 >>>> >>>> > This is totally uninteresting. >>> >>> Yeah. You're going to need a whole lot more than 17MB of bloat before >>> it'll be possible to tell which process is at fault, given that the >>> expected process sizes are up to 10GB. >> >> indeed, its 100% normal for the linux virtual memory manager to write >> 'dirty' pages to the swapfile during idle time. > > Sure, but it shouldn't grow indefinitely? No, it should not (and it can't -- you've only reserved so much space). However, 35 MB of swap is incredibly low. On my Linux workstation VM I've got 176 MB used with nary an issue -- it fluctuates a lot depending on what happens to be going on at the time (a workstation includes a lot of high-memory, low-activity processes like gnome-panel which tend to swap out). Had I disabled swappiness, that would have drastically reduced the memory available to do stuff with. BTW, the symptoms of swap thrashing are a 100% full swap volume, extremely high iowait, little or no memory used for file caching, process(es) with high non-shared resident memory usage, and, depending on how serious the situation is and how Linux is configured, OOM errors and/or randomly killed processes. For systems not under extreme memory stress (including yours, at least at the moment), swap's main purpose is to make sure some process doesn't gobble up memory and sit on it. merlin
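If you do want to see which processes are actually holding the swapped-out pages, rather than guessing from top, kernels recent enough to report a per-mapping Swap: field in smaps let you total it per process. Again a sketch, not exact accounting:

# total swapped-out memory per process, largest first, in kB
for d in /proc/[0-9]*; do
    swp=$(awk '/^Swap:/ {s += $2} END {print s + 0}' "$d/smaps" 2>/dev/null)
    [ "${swp:-0}" -gt 0 ] && echo "${d##*/} ${swp} kB in swap"
done | sort -k2 -rn | head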
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Scott Marlowe > Sent: Tuesday, August 30, 2011 3:52 AM > To: Sim Zacks > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] heavy swapping, not sure why > > On Tue, Aug 30, 2011 at 2:50 AM, Sim Zacks <sim@compulab.co.il> wrote: > > > > On a machine with lots of memory, I've run into pathological behaviour > > with both the RHEL 5 and Ubuntu 10.04 kernels where the kswapd starts > > eating up CPU and swap io like mad, while doing essentially nothing. > > Setting swappiness to 0 delayed this behaviour but did not stop it. > > Given that I'm on a machine with 128G ram, I just put "/sbin/swapoff > > -a" in /etc/rc.local and viola, problem solved. > > > > I've tried running without swap and the problem is if you actually do run > > out of memory then the process killer can take out your postgresql. > > My postgres is configured to never use more than 10 or 20 gig at max > load. That leaves about 80+Gig for caching the database by the > kernel. work_mem is pretty small (8MB) given the maximum connections > (1000) So 8Gig if everybody sorts at once. Even under very heavy load > memory usage has never really gone anywhere near that high. Scott, 1000 max_connections? I thought that was several times more than recommended these days, even for 24 or 48 core machines. Or am I living in the past? (I admit that my most recent runs of pgbench showed best throughput at around 250 backends from a 2-CPU VM, which kind of surprised me for a synthetic load and all that.) Thanks. To the broader list, regarding troubles with kswapd: I am curious what others are seeing from /proc/zoneinfo for DMA pages (not DMA32 or Normal) -- basically whether it sits at 1 or not. Setting swappiness to 0 did not have any effect for us on the kswapd issues. Another thing I have not had time and resources to go work on... I'm interested in what kernel they are running and what storage drivers they might be using. FWIW (to the list), vm.swappiness at 0 didn't play well for us, with a PostgreSQL fork, until we had a swap partition the size of memory. We were recommended to make that setting change for the fork that we are using and didn't; we learn the hard way sometimes. Granted, what works well there isn't always applicable to Postgres. -Mark
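For anyone wanting to eyeball what Mark is describing, the DMA zone's figures are the first stanza in /proc/zoneinfo; something along these lines (GNU grep assumed) shows its free page count and watermarks for the first node:

grep -E -A 3 'zone +DMA$' /proc/zoneinfo
# "pages free" is the figure in question; the min/low lines underneath are
# the watermarks kswapd tries to keep it above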
On Tue, Aug 30, 2011 at 8:36 PM, mark <dvlhntr@gmail.com> wrote: > > Scott, > 1000 max connections ? I thought that was several times more than > recommended these days, even for 24 or 48 core machines. Or am I living in > the past ? (I admit that my most recent runs of pgbench showed that best > throughput at around 250 backends from a 2 cpu VM which kind of surprised me > for a synthetic load and all that) > It's definitely suboptimal, but we haven't had the time to test pgbouncer or pgpool in staging. Luckily, about 95% of those connections are idle at any given time, and no queries do anything that causes a thundering herd issue.
On Tue, Aug 30, 2011 at 8:36 PM, mark <dvlhntr@gmail.com> wrote: > To the broader list, regarding troubles with kswap. I am curious to what > others seeing from /proc/zoneinfo for DMA pages (not dma32 or normal) - > basically if it sits at 1 or not. Setting swappiness to 0 did not have any > affect for us on kswap issues. Another thing I have not had time and > resources to go work on... interested in what kernel they are running and > what storage drivers they might be using. Well, we had zone_reclaim_mode autoset to 1, and we had to turn it off to get decent performance with postgresql. The machine was a quad dodecacore Magny Cours, so 48 cores with 128G RAM. The RAID controller is an Areca 1680 with BBU, with 34 15kRPM 147G SAS Seagate 15k6 drives in two 16-drive external enclosures and 2 drives in the server. The only solution we could find for kswapd going crazy was to just turn off swap. I'm pretty sure I used a large swap file to test larger swaps, but all that did was put off the eventual kswapd storm. It took anywhere from one to two weeks, maybe more, and then one day you'd check and your servers were maxed out by kswapd.
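For reference, checking and turning off zone reclaim is a one-liner each; a sketch follows (the sysctl change takes effect immediately, the sysctl.conf line just makes it survive a reboot):

cat /proc/sys/vm/zone_reclaim_mode                      # 1 = zone reclaim is on
sysctl -w vm.zone_reclaim_mode=0                        # turn it off now
echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf     # persist across reboots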
On Tue, Aug 30, 2011 at 10:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Tue, Aug 30, 2011 at 8:36 PM, mark <dvlhntr@gmail.com> wrote: >> To the broader list, regarding troubles with kswap. I am curious to what >> others seeing from /proc/zoneinfo for DMA pages (not dma32 or normal) - >> basically if it sits at 1 or not. Setting swappiness to 0 did not have any >> affect for us on kswap issues. Another thing I have not had time and >> resources to go work on... interested in what kernel they are running and >> what storage drivers they might be using. > > Well, we had zone reclaim mode autoset to 1, and we had to turn it off > to get decent performance with postgresql. Machine was a quad > dodecacore Magny Cours, so 48 cores with 128G RAM. RAID controller is > an Areca 1680 with BBU, 34 15kRPM 147G SAS Seagate 15k6 drives in two > 16 drive external enclosures and 2 drives in the server. > > The only solution we could find for kswapd going crazy was to just > turn off swap. Pretty sure I used a large swap file to test larger > swaps, but all that did was put off the eventual kswapd storm. It took > anywhere from one to two weeks, maybe more, and then one day you check > and your servers maxed out by kswapd. Hm, that's an interesting counterpoint to what I've been saying. I've never seen that; I wonder what the underlying trigger was? I typically set shared_buffers fairly low (even to the default, raising only when I think it might help) -- I wonder if that plays in. As to setting 1000 connections: some applications rely on database session features (like advisory locks or listen/notify) and retooling the client is more trouble than it's worth. This is definitely on the upper bound of what's reasonable, though... these days I code with the assumption that pgbouncer is going to be put in even if I don't need it right away. merlin
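For anyone weighing the pgbouncer route, the configuration is small. The sketch below is hypothetical -- the database name, paths, and pool sizes are placeholders, not recommendations -- and note that pool_mode = transaction is exactly what breaks the session-level features (advisory locks, listen/notify) mentioned above, so clients relying on those would need session pooling instead.

# hypothetical minimal /etc/pgbouncer/pgbouncer.ini (written as root)
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
EOF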
On Tue, Aug 30, 2011 at 9:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > Well, we had zone reclaim mode autoset to 1, and we had to turn it off > to get decent performance with postgresql. Machine was a quad > dodecacore Magny Cours, so 48 cores with 128G RAM. RAID controller is > an Areca 1680 with BBU, 34 15kRPM 147G SAS Seagate 15k6 drives in two > 16 drive external enclosures and 2 drives in the server. Also, Ubuntu 10.04 64 bit with all updates last fall right after it came out.