Thread: oom_killer
Is there anyone who could help me understand why, all of a sudden, with no noticeable change in data, hardware, or OS, I'm seeing the postmaster getting killed by the oom_killer? The dmesg output shows that swap has not been touched (free and total are the same), so this system is not running out of total memory per se. I keep thinking it's something to do with lowmem vs. highmem or 32-bit vs. 64-bit, but again, nothing has changed, and I'm getting hit nightly on 2 different servers (we run slon, so I switched over and the same thing happened). I even disabled memory overcommit and still got nailed. Is there anyone familiar with this, or who could take a look at the dmesg output (off list) and decipher it for me? This is a Fedora 12 system, kernel 2.6.32.23-170. From what I've been reading, this appears to be yet another Fedora bug, but so far I have not found any concrete evidence on how to fix it. Fedora 12; 32GB memory, 8 processors; postgres 8.4.4, slony 1.20; 5GB of swap (never hit it!) Thanks Tory
Funny coincidence, I was just reading a blog post on Postgres and the OOM killer. http://gentooexperimental.org/~patrick/weblog/archives/2011-04.html#e2011-04-20T21_58_37.txt Hope this helps. 2011/4/21 Tory M Blue <tmblue@gmail.com>: > Is there anyone that could help me understand why all of a sudden with > no noticeable change in data, no change in hardware, no change in OS, > I'm seeing postmaster getting killed by oom_killer?
* Tory M Blue (tmblue@gmail.com) wrote: > Is there anyone that could help me understand why all of a sudden with > no noticeable change in data, no change in hardware, no change in OS, > I'm seeing postmaster getting killed by oom_killer? You would really be best off just turning off the OOM killer. Of course, you should probably also figure out which process is actually chewing through your memory to the point that the OOM killer is getting run. > The dmesg shows that swap has not been touched free and total are the > same, so this system is not running out of total memory per say. There's probably something else that's trying to grab all the memory and then use it, and PG ends up getting nailed because the kernel over-attributes memory to it. You should be looking for that other process. Thanks, Stephen
On Thu, Apr 21, 2011 at 2:48 PM, Stephen Frost <sfrost@snowman.net> wrote: > > There's probably something else that's trying to grab all the memory and > then tries to use it and PG ends up getting nailed because the kernel > over-attributes memory to it. You should be looking for that other > process.. Not only that, you should probably set up your OOM killer never to kill the postmaster. Ever.
On Thu, Apr 21, 2011 at 2:53 PM, Claudio Freire <klaussfreire@gmail.com> wrote: > Not only that, you probably should set up your oom killer not to kill > postmaster. Ever. > Here: http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html
On Thu, Apr 21, 2011 at 3:28 AM, Tory M Blue <tmblue@gmail.com> wrote: > Fedora 12 > 32gig memory, 8 proc > postgres 8.4.4, slony 1.20 > 5 gigs of swap (never hit it!) curious: are you using 32- or 64-bit postgres? what are your postgresql.conf memory settings? merlin
On Thu, Apr 21, 2011 at 7:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > curious: using 32/64 bit postgres? what are your postgresql.conf > memory settings?
32bit
32gb
PAE kernel
# - Checkpoints -
checkpoint_segments = 100
max_connections = 300
shared_buffers = 2500MB # min 128kB or max_connections*16kB
max_prepared_transactions = 0
work_mem = 100MB
maintenance_work_mem = 128MB
fsync = on
thanks
Tory
On Thu, Apr 21, 2011 at 5:53 AM, Claudio Freire <klaussfreire@gmail.com> wrote: > Not only that, you probably should set up your oom killer not to kill > postmaster. Ever. Yeah, did that last night, setting it to -17. And to the other user who said I should disable the oom_killer altogether: setting vm.overcommit_memory to 2 and the ratio to 0 doesn't disable it, and I don't know what else to do. Out of memory is out of memory, but if swap is not being touched, I can't tell you what the heck this Fedora team is doing/thinking. Tory
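For reference, the -17 value mentioned here is written to the postmaster's entry under /proc. The following is a minimal Python sketch, not taken from the thread, assuming root privileges and the standard Linux /proc interface: kernels from 2.6.36 on expose /proc/<pid>/oom_score_adj (where -1000 means never kill), while older kernels such as this 2.6.32 only have /proc/<pid>/oom_adj (where -17 disables OOM killing). The function name protect_from_oom and its proc_root parameter are hypothetical; proc_root exists only so the sketch can be exercised against a scratch directory.

```python
import os


def protect_from_oom(pid, proc_root="/proc"):
    """Ask the Linux OOM killer to leave process `pid` alone.

    Kernels from 2.6.36 on provide oom_score_adj (-1000 = never kill);
    older kernels such as 2.6.32 only provide oom_adj (-17 = OOM_DISABLE).
    Lowering either value normally requires root. Returns the file written.
    proc_root is a hypothetical hook so the function can be tried on a
    scratch directory instead of the real /proc.
    """
    base = os.path.join(proc_root, str(pid))
    new_style = os.path.join(base, "oom_score_adj")
    if os.path.exists(new_style):
        path, value = new_style, "-1000"
    else:
        path, value = os.path.join(base, "oom_adj"), "-17"
    with open(path, "w") as f:
        f.write(value + "\n")
    return path
```

On a live system the PID could be read from the first line of postmaster.pid in the data directory. The setting is inherited across fork(), so backends started afterwards are exempt as well, and the OOM killer will pick some other process instead.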
On Thu, Apr 21, 2011 at 5:50 PM, Tory M Blue <tmblue@gmail.com> wrote: > max_connections = 300 > shared_buffers = 2500MB # min 128kB or max_connections*16kB > max_prepared_transactions = 0 > work_mem = 100MB That's an unrealistic setting for a 32-bit system, which can only address 3GB of memory per process. Take away 2500MB for shared buffers, and that leaves you only about 500MB for data, some of which is code. There's no way PG can operate with 100MB work_mem like that. Either decrease shared_buffers, or get a 64-bit system.
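Claudio's arithmetic can be spelled out. This is a rough Python sketch, assuming the usual 3GB user / 1GB kernel split on 32-bit Linux and treating shared_buffers as the whole shared memory mapping. The real segment is somewhat larger, and code, libraries, and stack also need room, so the numbers are upper bounds. The function names are illustrative.

```python
def backend_headroom_mb(shared_buffers_mb, user_space_mb=3 * 1024):
    """Address space (MB) left in one 32-bit backend once shared memory is mapped.

    Assumes a 3 GB user-space limit; shared libraries, code and stack
    still have to fit in what is returned, so treat it as an upper bound.
    """
    return user_space_mb - shared_buffers_mb


def work_mem_slots(shared_buffers_mb, work_mem_mb, user_space_mb=3 * 1024):
    """How many full work_mem allocations could fit in that headroom, at most."""
    headroom = max(0, backend_headroom_mb(shared_buffers_mb, user_space_mb))
    return headroom // work_mem_mb


if __name__ == "__main__":
    for sb in (2500, 2048, 512):
        # prints: shared_buffers, headroom MB, 100MB work_mem slots
        print(sb, backend_headroom_mb(sb), work_mem_slots(sb, 100))
```

With shared_buffers = 2500MB this leaves about 572MB, room for at most five 100MB work_mem allocations before anything else is counted. At 2048MB the headroom is about 1GB, and at 512MB it is about 2.5GB.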
On Thu, Apr 21, 2011 at 8:57 AM, Claudio Freire <klaussfreire@gmail.com> wrote: > That's an unrealistic setting for a 32-bit system, which can only > address 3GB of memory per process. > > You take away 2500MB for shared buffers, that leaves you only 500M for > data, some of which is code. > > There's no way PG can operate with 100MB work_mem llike that. > > Either decrease shared_buffers, or get a 64-bit system. I don't mind the occasional slap of reality, but this configuration has run for 4+ years. It's possible that, as with many other components, each Fedora release is worse than the prior one. The OS changed 170 days ago from FC6 to F12, but the Postgres configuration has stayed the same, and saying there's "no way it can operate" is so black and white, especially when it has run and performed well with a decent-sized data set for over 4 years. This is not the first time I've posted configs to this list over the last few years, and not once has anyone pointed out this shortcoming or said it would never work. While I'm still a newb when it comes to Postgres performance tuning, I don't generally see things in black and white. And again, zero swap is being used, but the oom_killer is being called?? But if I remove
On Thu, Apr 21, 2011 at 6:15 PM, Tory M Blue <tmblue@gmail.com> wrote: > While I don't mind the occasional slap of reality. This configuration > has run for 4+ years. It's possible that as many other components each > fedora release is worse then the priors. I'd say you've been lucky. You must be running overnight report queries that didn't run before, and that require more sorting memory than usual. Or... I dunno... but something did change.
On Thu, Apr 21, 2011 at 11:15 AM, Tory M Blue <tmblue@gmail.com> wrote: > While I don't mind the occasional slap of reality. This configuration > has run for 4+ years. It's possible that as many other components each > fedora release is worse then the priors. How many of those 300 max connections do you generally use? If you've always used a handful, or you've used more but they weren't memory hungry, then you've been lucky. work_mem is how much memory PostgreSQL can allocate PER sort or hash type operation. Each connection can do that more than once. A complex query can do it dozens of times. Can you see how going from 20 to 200 connections, with increasing complexity, can result in memory usage going from a few megabytes to something like 200 connections * 100 megabytes per sort * 3 sorts = 60 gigabytes? > The Os has changed 170 days ago from fc6 to f12, but the postgres > configuration has been the same, and umm no way it can operate, is so > black and white, especially when it has ran performed well with a > decent sized data set for over 4 years. Just because you've been walking around with a gun pointing at your head without it going off does not mean walking around with a gun pointing at your head is a good idea.
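Scott's bound is a plain product because work_mem limits each sort or hash operation, not each connection, and PostgreSQL applies no global cap across backends. Here is a minimal Python sketch of that product. The three-operations-per-query figure is Scott's illustrative assumption, not a measurement, and the function name is illustrative.

```python
def worst_case_work_mem_mb(connections, work_mem_mb, ops_per_query):
    """Upper bound (MB) on sort/hash memory if every connection runs
    ops_per_query work_mem-sized operations at the same time."""
    return connections * work_mem_mb * ops_per_query


if __name__ == "__main__":
    print(worst_case_work_mem_mb(200, 100, 3))  # 60000 MB, Scott's ~60 GB
    print(worst_case_work_mem_mb(45, 100, 3))   # 13500 MB at the 45 connections Tory reports
    print(worst_case_work_mem_mb(300, 100, 3))  # 90000 MB at max_connections = 300
```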
On Thu, Apr 21, 2011 at 1:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > How many of those 300 max connections do you generally use? If you've > always used a handful, or you've used more but they weren't memory > hungry then you've been lucky. max of 45 > Just because you've been walking around with a gun pointing at your > head without it going off does not mean walking around with a gun > pointing at your head is a good idea. Yes, that is what I gathered. It's good information, and I'm always open to a smack if I learn something, which in this case I did. We were already working on moving to 64-bit, but again, the oom_killer popping up without the system even attempting to use swap is what has given me some pause. Thanks again Tory
On Thu, Apr 21, 2011 at 3:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > Just because you've been walking around with a gun pointing at your > head without it going off does not mean walking around with a gun > pointing at your head is a good idea. +1
On Thu, Apr 21, 2011 at 3:08 PM, Tory M Blue <tmblue@gmail.com> wrote: > We were already working on moving to 64bit, but again the oom_killer > popping up without the system even attempting to use swap is what has > caused me some pause. I think this might have been the 32-bit address space biting you. But that's just a guess. Or the OS was running out of something other than just plain memory, like file handles or something.
But I'm not that familiar with the OOM killer, as it's one of the things I tend to shut off when building a PG server. I also turn off swap and zone_reclaim_mode.
On Thu, Apr 21, 2011 at 3:08 PM, Tory M Blue <tmblue@gmail.com> wrote: > We were already working on moving to 64bit, but again the oom_killer > popping up without the system even attempting to use swap is what has > caused me some pause. Your shared_buffers is way, way too high... you have dangerously oversubscribed this system. I would consider dropping down to 256-512MB. Yeah, you have PAE, but that only helps so much.
Your server can only address so much memory, and you allocated a huge chunk of it right off the bat. Also, you might want to consider a connection pooler to keep your number of backends down, especially if you need to keep work_mem high. merlin
2011/4/21 Tory M Blue <tmblue@gmail.com>: > 32bit > 32gb > PAE kernel I didn't understand what values you set for the vm.overcommit parameters. Can you give them, along with the values in /proc/meminfo? The interesting ones are "Commit*". If you have strict rules (overcommit_memory=2), then your current kernel config may need some love: the CommitLimit is probably too low because you have a small swap partition. One way to fix that is to change vm.overcommit_ratio. With the default ratio it should be something like 21GB (0.5*32+5) of CommitLimit, and you probably want 32GB :) Maybe there were some minor changes in your install or application usage and you just hit the limit. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL: Expertise, Training and Support
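Cédric's 21GB figure comes from the strict-overcommit formula in the kernel's overcommit-accounting documentation. With vm.overcommit_memory = 2, CommitLimit is swap plus overcommit_ratio percent of RAM (huge pages are ignored here). The small Python sketch below works in GB and assumes round figures of 32GB RAM and 5GB swap. The kernel counts pages from a RAM total slightly below the nominal size, so real values come out a little lower. The function name is illustrative.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio=50):
    """CommitLimit under vm.overcommit_memory = 2 (strict accounting).

    CommitLimit = swap + RAM * overcommit_ratio / 100, huge pages ignored.
    overcommit_ratio defaults to 50, the kernel default.
    """
    return swap_gb + ram_gb * overcommit_ratio / 100


if __name__ == "__main__":
    print(commit_limit_gb(32, 5))     # 21.0, Cédric's 0.5*32+5
    print(commit_limit_gb(32, 5, 0))  # 5.0, i.e. just the swap
```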
On Fri, Apr 22, 2011 at 4:03 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > I didn't understand what value you set for vm.overcommit parameters. > Can you give it and the values in /proc/meminfo, the interesting one > are "Commit*" ? > > If you have strict rules(overcommit=2), then your current kernel > config may need some love : the commit_limit is probably too low > because you have a small swap partition. Thanks Cedric, the sysctl vm settings are:
# 04/17/2011 to keep overcommit memory in check
vm.overcommit_memory = 2
vm.overcommit_ratio = 0
CommitLimit: 4128760 kB
Committed_AS: 2380408 kB
Yeah, I do think my swap space is biting us (I'm just starting to grasp that my swap space has not grown with the continued addition of memory). I am just now starting to learn that swap does need to be properly sized whether it's being used or not. I figured the system would use the swap and then run out, but it sounds like it takes the swap size into consideration and just decides not to use it.
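The two Commit* lines posted here can be checked directly. Under strict overcommit, an allocation that would push Committed_AS past CommitLimit fails instead of being granted. The following minimal Python sketch parses /proc/meminfo-style text (values in kB) and reports the remaining headroom. It is fed the figures from this message rather than a live file, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """kB that can still be committed before strict accounting refuses requests."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


if __name__ == "__main__":
    sample = "CommitLimit:     4128760 kB\nCommitted_AS:    2380408 kB\n"
    info = parse_meminfo(sample)
    print(commit_headroom_kb(info))  # 1748352 kB, roughly 1.7 GB
```

A single snapshot like this says little about the nightly peak, when Committed_AS may be much higher.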
I appreciate the totally no postgres responses with this. Thanks Tory
Tory M Blue <tmblue@gmail.com> wrote: > I appreciate the totally no postgres responses with this. I didn't understand that. What do you mean? -Kevin
On Fri, Apr 22, 2011 at 9:34 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Tory M Blue <tmblue@gmail.com> wrote: > >> I appreciate the totally no postgres responses with this. > > I didn't understand that. What do you mean? > > -Kevin I meant that once we started talking about kernel commit limits, etc., it's not really Postgres-centric, but you folks are still assisting me with this. So thanks; some could say take it to the Linux kernel folks, even though this is killing Postgres. Tory
2011/4/22 Tory M Blue <tmblue@gmail.com>: > the sysctl vm's are > > # 04/17/2011 to keep overcommit memory in check > vm.overcommit_memory = 2 > vm.overcommit_ratio = 0 > > CommitLimit: 4128760 kB > Committed_AS: 2380408 kB Are you sure it is a PAE kernel? You look limited to 4GB.
I don't know offhand whether overcommit_ratio=0 has a special meaning; otherwise I would suggest updating it to something like 40% (the default), but 60% should still be safe (60% of 32GB + 5GB).
2011/4/22 Cédric Villemain <cedric.villemain.debian@gmail.com>: > Are you sure it is a PAE kernel ? You look limited to 4GB.
> I don't know atm if overcommit_ratio=0 has a special meaning, else I > would suggest to update it to something like 40% (the default), but default being 50 ... > 60% should still be safe (60% of 32GB + 5GB)
On Fri, Apr 22, 2011 at 9:46 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > 2011/4/22 Cédric Villemain <cedric.villemain.debian@gmail.com>: >> Are you sure it is a PAE kernel ? You look limited to 4GB. >> >> I don't know atm if overcommit_ratio=0 has a special meaning, else I >> would suggest to update it to something like 40% (the default), but > > default being 50 ... > >> 60% should still be safe (60% of 32GB + 5GB) 2.6.32.23-170.fc12.i686.PAE, so it says. Okay, so instead of dropping to 2/0 with the overcommit_memory and overcommit_ratio settings, I should look at matching the ratio more to what I actually have in swap? Sorry, it was a knee-jerk reaction after I got hit by the oom_killer twice. Tory
On Thu, Apr 21, 2011 at 1:28 AM, Tory M Blue <tmblue@gmail.com> wrote: > this is a Fedora 12 system, 2.6.32.23-170. I've been reading and > appears this is yet another fedora bug, but so far I have not found > any concrete evidence on how to fix it. If it's a "Fedora" bug, it's most likely related to the kernel, where the OOM killer lives, which really makes it more of a kernel bug than a Fedora bug, as Fedora kernels generally track upstream very closely. Given that the version of Fedora you're using is no longer supported, at a minimum you should be running F-13 (or preferably F-14, since F-13 will lose maintenance in approximately 2 months). If you have to stay on F-12, you might at least try building the latest 2.6.32 longterm kernel, which is up to version 2.6.32.39. All that said, have you tried tracking memory usage of the machine leading up to the OOM killer events? -Dave
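Tracking memory leading up to an OOM event can be as simple as logging a few /proc/meminfo fields on a timer. The following is a minimal Python sketch, assuming a Linux /proc/meminfo. The field selection and the function names are illustrative choices, not something prescribed in the thread: MemFree, Cached (the filesystem cache), SwapFree, Committed_AS, and CommitLimit.

```python
import time

FIELDS = ("MemFree", "Cached", "SwapFree", "Committed_AS", "CommitLimit")


def read_meminfo(path="/proc/meminfo"):
    """Return {field: value} from a /proc/meminfo-style file (values in kB)."""
    values = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                values[name.strip()] = int(parts[0])
    return values


def sample_memory(path="/proc/meminfo", interval=60.0, count=None, out=print):
    """Emit one timestamped line of FIELDS every `interval` seconds.

    Runs forever when count is None; missing fields are shown as '?'.
    """
    taken = 0
    while count is None or taken < count:
        info = read_meminfo(path)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        out(stamp + " " + " ".join(f"{k}={info.get(k, '?')}" for k in FIELDS))
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Started from a long-running script, for example with sample_memory(interval=30), the output can later be lined up against the timestamps of the OOM messages in dmesg.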
On Fri, Apr 22, 2011 at 11:15 AM, David Rees <drees76@gmail.com> wrote: > All that said - have you tried tracking memory usage of the machine > leading up to OOM killer events? Thanks David. I have, and in fact I do see spikes that would cause my system to run out of memory, but one thing I'm struggling with is that my system always runs at the limit; it's the nature of Linux to take all the memory and manage it. The larger hurdle is why no swap is ever used: it's there, but the system never uses it. Even the OOM killer output shows that I have the full 5GB of swap available, yet nothing is using it. I want want want to see swap being used! If I run a script that does a bunch of mallocs and holds them, I can see the system use up available memory and then lay the smack down on my swap before the OOM killer is invoked. So I'm starting to think that in the meantime, while I rebuild, I need to make sure I've got my Postgres/kernel params in a good place. My ratio of 0 still allows the oom_killer to run, but I've now excluded Postgres from being targeted by it. I should still set the overcommit ratio correctly for my 32GB RAM / 4-5GB swap systems, but I'm having a hard time wrapping my head around that setting. Tory
On Fri, Apr 22, 2011 at 9:45 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: >> CommitLimit: 4128760 kB >> Committed_AS: 2380408 kB > > Are you sure it is a PAE kernel ? You look limited to 4GB. I figured out that the CommitLimit is actually the size of swap (with overcommit_ratio = 0), so on one server it's 4GB and on the other it's 5GB. So I still need to figure out what my overcommit ratio should be with 32GB of RAM and 4 to 5GB of swap. Tory
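The question of what the ratio should be can be answered by inverting the strict-overcommit formula, CommitLimit = swap + RAM * overcommit_ratio / 100. Pick a target CommitLimit, solve for the ratio, and round up, because vm.overcommit_ratio is an integer percentage. This Python sketch works in GB using nominal sizes. The choice of target (for example, all of RAM) is an assumption to make per system, and the function name is illustrative.

```python
import math


def overcommit_ratio_for(target_gb, ram_gb, swap_gb):
    """Smallest integer vm.overcommit_ratio whose CommitLimit reaches target_gb.

    Inverts CommitLimit = swap + RAM * ratio / 100 (huge pages ignored).
    Returns 0 when swap alone already meets the target.
    """
    needed = (target_gb - swap_gb) / ram_gb * 100
    return max(0, math.ceil(needed))


if __name__ == "__main__":
    print(overcommit_ratio_for(32, 32, 5))  # 85: 5 + 32 * 0.85 = 32.2 GB
    print(overcommit_ratio_for(32, 32, 4))  # 88: 4 + 32 * 0.88 = 32.16 GB
    print(overcommit_ratio_for(21, 32, 5))  # 50: the kernel default
```

For comparison, a ratio of 80 works out to 4 + 25.6 = 29.6GB with 4GB of swap, or 30.6GB with 5GB.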
2011/4/22 Tory M Blue <tmblue@gmail.com>: > Figured that the Commitlimit is actually the size of swap, so on one > server it's 4gb and the other it's 5gb. > > So still need to figure out with 32gig of ram and 4 to 5gig swap, what > my overcommit ratio should be. At least the default value of 50, probably more; it's up to you to adjust. You should be OK with 50, given that it used to work well until now with 0 (so you'll have 21GB of committable memory).
On Fri, Apr 22, 2011 at 6:45 PM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > Are you sure it is a PAE kernel ? You look limited to 4GB. If my memory/knowledge serves me right, PAE doesn't remove that limit. PAE lets the machine use more than 4GB of physical memory, so many processes together can use more memory, but one process alone has to live within an addressable range, and that is still 4GB, mandated by the 32-bit address space when operating in linear addressing mode. But Linux kernels usually reserve 1GB for kernel stuff (buffers and that kind of stuff), so the addressable portion for processes is 3GB. Take away 2.5GB of shared buffers, and you only leave 0.5GB for general data and code. Really, lowering shared_buffers will probably be a solution. Moving to 64 bits would be a better one.
On Apr 22, 2011, at 2:22 PM, Tory M Blue <tmblue@gmail.com> wrote: > Thanks David and I have and in fact I do see spikes that would cause > my system to run out of memory, but one thing I'm struggling with is > my system always runs at the limit. It's the nature of linux to take > all the memory and manage it. One thing to watch is the size of the filesystem cache. Generally, as the system comes under memory pressure, you will see the cache shrink. Not sure what is happening on your system, but typically when it gets down to some minimal size, that's when the swapping starts. ...Robert
On Sat, Apr 23, 2011 at 12:24 PM, Robert Haas <robertmhaas@gmail.com> wrote: > One thing to watch is the size of the filesystem cache. Generally as the system comes under memory pressure you will see the cache shrink. Not sure what is happening on your system, but typically when it gets down to some minimal size, that's when the swapping starts. > > ...Robert Thanks everyone. I've tuned the system to overcommit_memory = 2 and overcommit_ratio = 80, which makes my commit values look like:
CommitLimit: 31694880 kB
Committed_AS: 2372084 kB
So with 32GB of system memory and 4GB cache, so far it's running okay: no OOMs in the last 2 days, and the DB is performing well again. I've also dropped shared_buffers to 2GB, which gives me 1GB for data etc. I'll test with a smaller 1.5GB if need be. I've already started the 64-bit process; I've got to test whether slon will replicate between a 32-bit and a 64-bit system if the postgres/slon versions are the same (slon being the key here). If this works, I will be able to do the migration to 64-bit that much more easily; if not, well, yeah, that changes the scheme a ton. Thanks for all the assistance with this, it's really appreciated. Tory