Thread: PostgreSQL 8.0.6 crash
PostgreSQL promptly uses all available memory for the query and subsequently crashes. I'm sure it can be corrected with a setting, but should it crash?

freedb=# create table ucode as select distinct ucode from cdtitles group by ucode having count(ucode)>1 ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

(Table layout)
freedb=# \d cdtitles
              Table "public.cdtitles"
  Column   |       Type        |             Modifiers
-----------+-------------------+-----------------------------------
 cdid      | integer           | default nextval('seq_cdid'::text)
 cdcode    | character varying |
 ucode     | character varying |
 title     | character varying |
 artist    | character varying |
 genre     | character varying |
 dgenre    | character varying |
 year      | integer           |
 processed | character varying |
 submit    | character varying |
 revision  | integer           |
 disclen   | integer           |
 tracks    | integer           |
 extkey    | integer           |
 offsets   | integer[]         |
Indexes:
    "cdtitles_ucode" btree (ucode) CLUSTER

(Size of table)
freedb=# select count(*) from cdtitles ;
  count
---------
 1927912
(1 row)

(Sample row)
freedb=# select * from cdtitles where cdid = 100001 limit 1;
  cdid  |  cdcode  |         ucode         |          title          |   artist   | genre | dgenre | year |                    processed                    |     submit      | revision | disclen | tracks | extkey | offsets
--------+----------+-----------------------+-------------------------+------------+-------+--------+------+-------------------------------------------------+-----------------+----------+---------+--------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------
 100001 | 320e9e14 | 0ea0-14-3399-0006ad6a | Var Slutar V�arna (CD1) | Streaplers | rock  |        |    0 | cddbd v1.5PL3 Copyright (c) Steve Scherf et al. | dBpowerAMP V4.0 |        0 |         |     19 |        | {21,150,13920,31180,43664,59907,73163,86629,98447,113684,130205,141682,154920,166783,179028,192689,205161,222164,234649,249692,264340,3744,0}
(1 row)
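One small observation on the query itself: because of the GROUP BY, the DISTINCT is redundant — grouping already yields one row per ucode. A minimal equivalent form, assuming the table above (the target table name here is only illustrative), would be:

    CREATE TABLE ucode_dups AS
        SELECT ucode
          FROM cdtitles
         GROUP BY ucode
        HAVING count(*) > 1;

This does not change the memory question discussed below, but it drops the extra Sort/Unique step that sits on top of the aggregate in the plan shown later in the thread.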
More info: the machine has 512M RAM and 512M swap. work_mem is set to:

work_mem = 1024

This shouldn't have crashed, should it?

> PostgreSQL promptly uses all available memory for the query and
> subsequently crashes.
>
> I'm sure it can be corrected with a setting, but should it crash?
>
> freedb=# create table ucode as select distinct ucode from cdtitles group
> by ucode having count(ucode)>1 ;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
"Mark Woodward" <pgsql@mohawksoft.com> writes: > PostgreSQL promptly uses all available memory for the query and > subsequently crashes. I'll bet a nickel this is on a Linux machine with OOM kill enabled. What does the postmaster log show --- or look in the kernel log to see if it mentions anything about an out-of-memory kill. > freedb=# create table ucode as select distinct ucode from cdtitles group > by ucode having count(ucode)>1 ; > server closed the connection unexpectedly What does EXPLAIN show as the plan for that? If it's a hash aggregate, try with "enable_hashagg" turned off. How many distinct ucode values are there in the table? regards, tom lane
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >> PostgreSQL promptly uses all available memory for the query and >> subsequently crashes. > > I'll bet a nickel this is on a Linux machine with OOM kill enabled. > What does the postmaster log show --- or look in the kernel log to > see if it mentions anything about an out-of-memory kill. That's a no brainer. Maybe I should have rephrased the condition, crash may be the wrong word, it was definitely killed by out of memory. Sorry. > >> freedb=# create table ucode as select distinct ucode from cdtitles group >> by ucode having count(ucode)>1 ; >> server closed the connection unexpectedly > > What does EXPLAIN show as the plan for that? If it's a hash aggregate, > try with "enable_hashagg" turned off. How many distinct ucode values > are there in the table? There are over 1.7M distinct rows, about 200K non-distinct that I want to somehow remove. It does have hash aggregate: freedb=# explain select distinct ucode from cdtitles group by ucode having count(ucode)>1 ; QUERY PLAN -------------------------------------------------------------------------------------Unique (cost=106536.32..106537.32 rows=200width=32) -> Sort (cost=106536.32..106536.82 rows=200 width=32) Sort Key: ucode -> HashAggregate (cost=106527.68..106528.68 rows=200 width=32) Filter: (count(ucode) > 1) -> SeqScan on cdtitles (cost=0.00..96888.12 rows=1927912 width=32) (6 rows) Well, shouldn't hash aggregate respect work memory limits?
"Mark Woodward" <pgsql@mohawksoft.com> writes: > -> HashAggregate (cost=106527.68..106528.68 rows=200 width=32) > Filter: (count(ucode) > 1) > -> Seq Scan on cdtitles (cost=0.00..96888.12 rows=1927912 > width=32) > Well, shouldn't hash aggregate respect work memory limits? If the planner thought there were 1.7M distinct values, it wouldn't have used hash aggregate ... but it only thinks there are 200, which IIRC is the default assumption. Have you ANALYZEd this table lately? Meanwhile, I'd strongly recommend turning off OOM kill. That's got to be the single worst design decision in the entire Linux kernel. regards, tom lane
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >> -> HashAggregate (cost=106527.68..106528.68 rows=200 >> width=32) >> Filter: (count(ucode) > 1) >> -> Seq Scan on cdtitles (cost=0.00..96888.12 >> rows=1927912 >> width=32) > >> Well, shouldn't hash aggregate respect work memory limits? > > If the planner thought there were 1.7M distinct values, it wouldn't have > used hash aggregate ... but it only thinks there are 200, which IIRC is > the default assumption. Have you ANALYZEd this table lately? I thought that I had, but I did CLUSTER at some point. Or maybe I didn't I'm, not sure. I have been working on a file reader/parser/importer program. I created and dropped the DB so many times it is hard to keep track. Still, I would say that is is extremly bad behavior for not having stats, wouldn't you think? > > Meanwhile, I'd strongly recommend turning off OOM kill. That's got to > be the single worst design decision in the entire Linux kernel. How is this any different than the FreeBSD having a default 512M process size limit? On FreeBSD, the process would have been killed earlier.
"Mark Woodward" <pgsql@mohawksoft.com> writes: > Still, I would say that is is extremly bad behavior for not having > stats, wouldn't you think? Think of it as a kernel bug. >> Meanwhile, I'd strongly recommend turning off OOM kill. That's got to >> be the single worst design decision in the entire Linux kernel. > How is this any different than the FreeBSD having a default 512M process > size limit? On FreeBSD, the process would have been killed earlier. No, the process would have been politely told it was out of memory, and would have told you the same. If the kernel's way of notifying a process that it's out of memory is SIGKILL, there is not a damn thing that we can do to operate robustly. regards, tom lane
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >> Still, I would say that is is extremly bad behavior for not having >> stats, wouldn't you think? > > Think of it as a kernel bug. While I respect your viewpoint that the Linux kernel should not kill an offending process if the system runs out of memory, I sort of disagree in that OOM is a disaster preventor. It should be viewed as a last ditch "him or me" choice the kernel needs to make and it should not get into that position in the first place. Regardless, it is troubling that failing to have current stats can cause the system, with a large data set, to exceed working memory limits. I think it is still a bug. While it may manifest itself as a pg crash on Linux because of a feature with which you have issue, the fact remains that PG is exeeding its working memory limit. Should failing to run ANALYZE cause this behavior? If so, how does this get clearly documented? If not, can it be prevented? > >>> Meanwhile, I'd strongly recommend turning off OOM kill. That's got to >>> be the single worst design decision in the entire Linux kernel. > >> How is this any different than the FreeBSD having a default 512M process >> size limit? On FreeBSD, the process would have been killed earlier. > > No, the process would have been politely told it was out of memory, and > would have told you the same. If the kernel's way of notifying a > process that it's out of memory is SIGKILL, there is not a damn thing > that we can do to operate robustly. Lets not waste time on a Linux discussion. Linux and FreeBSD have their strengths, and a debate on the dubious merits of either is a long and contentious debate. Both systems are fine, just with some subtle differences in design goals.
Tom Lane wrote:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> Still, I would say that is extremely bad behavior for not having
>> stats, wouldn't you think?
>
> Think of it as a kernel bug.
>
>>> Meanwhile, I'd strongly recommend turning off OOM kill.  That's got to
>>> be the single worst design decision in the entire Linux kernel.
>
>> How is this any different from FreeBSD having a default 512M process
>> size limit? On FreeBSD, the process would have been killed earlier.
>
> No, the process would have been politely told it was out of memory, and
> would have told you the same.  If the kernel's way of notifying a
> process that it's out of memory is SIGKILL, there is not a damn thing
> that we can do to operate robustly.

And we have docco on it:

http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN18105

which I assume is still current.

Back in October I mentioned the OOM killer to Andrew Morton - his reaction was a very pained look and a curt admonition: "turn it off".

cheers

andrew
"Mark Woodward" <pgsql@mohawksoft.com> writes: > I think it is still a bug. While it may manifest itself as a pg crash on > Linux because of a feature with which you have issue, the fact remains > that PG is exeeding its working memory limit. The problem is that *we have no way to know what that limit is* --- short of exceeding it and being summarily killed. (BTW, the kernel doesn't know what the limit is either.) There is simply not any way to operate robustly under the OOM-kill regime. While I'll certainly acknowledge that it'd be nice if hashagg had spill-to-disk capability, that wouldn't alter the fundamental fact that if you want reliable behavior you MUST turn off OOM kill. There is not anything we can do at the database level to work around that kernel-level misdesign. regards, tom lane
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >> I think it is still a bug. While it may manifest itself as a pg crash on >> Linux because of a feature with which you have issue, the fact remains >> that PG is exeeding its working memory limit. > > The problem is that *we have no way to know what that limit is* --- > short of exceeding it and being summarily killed. (BTW, the kernel > doesn't know what the limit is either.) There is simply not any way > to operate robustly under the OOM-kill regime. No, you misunderstand what I said, the "working memory" as defined in postgresql.conf. I don't care about the OS debate. > > While I'll certainly acknowledge that it'd be nice if hashagg had > spill-to-disk capability, that wouldn't alter the fundamental fact that > if you want reliable behavior you MUST turn off OOM kill. There is not > anything we can do at the database level to work around that kernel-level > misdesign. Again, regardless of OS used, hashagg will exceed "working memory" as defined in postgresql.conf. At issue is would a lack of ANALYZE justify this behavior? If so, it should be documented.
"Mark Woodward" <pgsql@mohawksoft.com> writes: > Again, regardless of OS used, hashagg will exceed "working memory" as > defined in postgresql.conf. So? If you've got OOM kill enabled, it can zap a process whether it's strictly adhered to work_mem or not. The OOM killer is entirely capable of choosing a victim process whose memory footprint hasn't changed materially since it started (eg, the postmaster). regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
> > Again, regardless of OS used, hashagg will exceed "working memory" as
> > defined in postgresql.conf.
>
> So?  If you've got OOM kill enabled, it can zap a process whether it's
> strictly adhered to work_mem or not.  The OOM killer is entirely capable
> of choosing a victim process whose memory footprint hasn't changed
> materially since it started (eg, the postmaster).

Unless I've missed something here, disabling the OOM killer doesn't really solve the problem here. What solves the problem is running ANALYZE, but it's still certainly the case that it would make some sense for the Postmaster, upon realizing that it's gone well beyond its work_mem boundary, to ideally switch plans to one which isn't going to exceed its work_mem limit. Less ideally, it could give up and issue an error to the user instead of running the box out of memory.

I appreciate that this is probably not very easy to implement, but I do believe the current situation could be improved in this regard.

	Thanks,

		Stephen
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >> Again, regardless of OS used, hashagg will exceed "working memory" as >> defined in postgresql.conf. > > So? If you've got OOM kill enabled, it can zap a process whether it's > strictly adhered to work_mem or not. The OOM killer is entirely capable > of choosing a victim process whose memory footprint hasn't changed > materially since it started (eg, the postmaster). Sorry, I must strongly disagree here. The postgresql.conf "working mem" is a VERY IMPORTANT setting, it is intended to limit the consumption of memory by the postgresql process. Often times PostgreSQL will work along side other application servers on the same system, infact, may be a sub-part of application servers on the same system. (This is, in fact, how it is used on one of my site servers.) Clearly, if the server will use 1000 times this number (Set for 1024K, but exceeds 1G) this is broken, and it may cause other systems to fail or perform very poorly. If it is not something that can be fixed, it should be clearly documented.
On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:
> Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
> a VERY IMPORTANT setting; it is intended to limit the consumption of
> memory by the postgresql process. Often, PostgreSQL will work alongside

Actually, no, it's not designed for that at all.

> other application servers on the same system; in fact, it may be a
> sub-part of application servers on the same system. (This is, in fact, how
> it is used on one of my site servers.)
>
> Clearly, if the server will use 1000 times this number (set for 1024K, but
> exceeds 1G), this is broken, and it may cause other systems to fail or
> perform very poorly.
>
> If it is not something that can be fixed, it should be clearly documented.

work_mem (integer)

    Specifies the amount of memory to be used by internal sort operations
    and hash tables before switching to temporary disk files. The value is
    specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that
    for a complex query, several sort or hash operations might be running in
    parallel; each one will be allowed to use as much memory as this value
    specifies before it starts to put data into temporary files. Also,
    several running sessions could be doing such operations concurrently.
    So the total memory used could be many times the value of work_mem; it
    is necessary to keep this fact in mind when choosing the value. Sort
    operations are used for ORDER BY, DISTINCT, and merge joins. Hash
    tables are used in hash joins, hash-based aggregation, and hash-based
    processing of IN subqueries.

So it says right there that it's very easy to exceed work_mem by a very large amount. Granted, this is a very painful problem to deal with and will hopefully be changed at some point, but it's pretty clear as to how this works.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
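To make the per-operation nature of that limit concrete: work_mem is a per-sort/per-hash budget, and it can be set per session rather than only in postgresql.conf. A small illustration, with values only as examples:

    -- in the 8.x releases the value is an integer number of kilobytes
    SET work_mem = 32768;   -- roughly 32 MB per sort or hash operation
    SHOW work_mem;
    -- a query with two sorts and a hash join could legitimately use ~3x this
    RESET work_mem;

None of this constrains a hash aggregate whose group-count estimate was wrong to begin with, which is the complaint in this thread.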
On Thu, Feb 09, 2006 at 11:42:57AM -0500, Mark Woodward wrote: > > "Mark Woodward" <pgsql@mohawksoft.com> writes: > >> Still, I would say that is is extremly bad behavior for not having > >> stats, wouldn't you think? > > > > Think of it as a kernel bug. > > While I respect your viewpoint that the Linux kernel should not kill an > offending process if the system runs out of memory, I sort of disagree in > that OOM is a disaster preventor. It should be viewed as a last ditch "him > or me" choice the kernel needs to make and it should not get into that > position in the first place. I've had processes run away on a FreeBSD box before, to the extent of running entirely out of swap and memory. Instead of random processes just dying for no apparent reason, I instead started getting a bunch of out-of-memory errors. No disaster, I just fixed the problem and life went on. Well, ok, the box did become rather unresponsive when my fix for the problem meant that all the sudden there were about 950 perl processes trying to run at the same time. I wish I'd captured top showing 900+ runnable processes. But after a few minutes the processes started completing and exiting and everything was soon back to normal. I rather doubt Linux would survive that... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
* Jim C. Nasby (jnasby@pervasive.com) wrote: > On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote: > > If it is not something that can be fixed, it should be clearly documented. > > work_mem (integer) > > Specifies the amount of memory to be used by internal sort > operations and hash tables before switching to temporary disk files. > The value is specified in kilobytes, and defaults to 1024 kilobytes > (1 MB). Note that for a complex query, several sort or hash > operations might be running in parallel; each one will be allowed to > use as much memory as this value specifies before it starts to put > data into temporary files. Also, several running sessions could be > doing such operations concurrently. So the total memory used could > be many times the value of work_mem; it is necessary to keep this > fact in mind when choosing the value. Sort operations are used for > ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash > joins, hash-based aggregation, and hash-based processing of IN > subqueries. > > So it says right there that it's very easy to exceed work_mem by a very > large amount. Granted, this is a very painful problem to deal with and > will hopefully be changed at some point, but it's pretty clear as to how > this works. It also says that when it goes over, it'll spill to disk. Additionally, we're talking about one hash here, not multiple ones. It seems at least misleading as, if I understand correctly, Postgres isn't actually actively checking to see if the amount of memory used by an in-progress hash creation has gone over the limit but rather it guesses at how much memory will be used during the planning stage to decide if a hash plan is possible or not. That guess can certainly be wrong but there's nothing in place to handle the situation where the guess is wrong... Thanks, Stephen
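One way to see the mismatch Stephen describes, assuming the planner is working from column statistics at all, is to compare the stored estimate with reality:

    -- the planner's idea of how many distinct values exist (filled in by ANALYZE)
    SELECT n_distinct
      FROM pg_stats
     WHERE tablename = 'cdtitles' AND attname = 'ucode';

    -- the actual number
    SELECT count(DISTINCT ucode) FROM cdtitles;

If the table has never been analyzed, the first query returns no row and the planner falls back to its default assumption of 200 groups, which is exactly what the EXPLAIN output earlier in the thread shows.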
> On Thu, Feb 09, 2006 at 02:03:41PM -0500, Mark Woodward wrote:
>> Sorry, I must strongly disagree here. The postgresql.conf "working mem" is
>> a VERY IMPORTANT setting; it is intended to limit the consumption of
>> memory by the postgresql process. Often, PostgreSQL will work alongside
>
> Actually, no, it's not designed for that at all.

I guess that's a matter of opinion.

>> Clearly, if the server will use 1000 times this number (set for 1024K, but
>> exceeds 1G), this is broken, and it may cause other systems to fail or
>> perform very poorly.
>>
>> If it is not something that can be fixed, it should be clearly documented.
>
> work_mem (integer)
>
>     Specifies the amount of memory to be used by internal sort operations
>     and hash tables before switching to temporary disk files. The value is
>     specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that
>     for a complex query, several sort or hash operations might be running in
>     parallel; each one will be allowed to use as much memory as this value
>     specifies before it starts to put data into temporary files. Also,
>     several running sessions could be doing such operations concurrently.
>     So the total memory used could be many times the value of work_mem; it
>     is necessary to keep this fact in mind when choosing the value. Sort
>     operations are used for ORDER BY, DISTINCT, and merge joins. Hash
>     tables are used in hash joins, hash-based aggregation, and hash-based
>     processing of IN subqueries.
>
> So it says right there that it's very easy to exceed work_mem by a very
> large amount. Granted, this is a very painful problem to deal with and
> will hopefully be changed at some point, but it's pretty clear as to how
> this works.

Well, if you read that paragraph carefully, I'll admit that I was a little too literal in my statement, applying it to the "process" and not to specific functions within the process, but the documentation says:

"each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files."

According to the documentation, the behavior of hashagg is broken. It did not use up to this amount and then start to use temporary files; it used 1000 times this limit and was killed by the OS.

I think it should be documented, as the behavior is unpredictable.
Stephen Frost <sfrost@snowman.net> writes: > Unless I've missed something here, disabling the OOM killer doesn't > really solve the problem here. Well in a way it does. Postgres would get an out-of-memory error from malloc which it would handle properly and the world would be happy. Except not quite, since I think an out of memory error still means that backend exits instead of just that query failing. That means if you have an application running such as apache then all subsequent transactions on that connection fail too, instead of just the transaction that misbehaved. And as the other poster mentioned, having Postgres use up every available byte of memory isn't really very friendly to anything else running on the box. It doesn't seem like a bad idea to have a max_memory parameter that if a backend ever exceeded it would immediately abort the current transaction. That would let an application continue operating normally after getting an error. -- greg
Greg Stark <gsstark@mit.edu> writes: > Except not quite, since I think an out of memory error still means that > backend exits instead of just that query failing. Not at all! PG will recover from this perfectly well ... if it's given the opportunity, rather than being SIGKILLed. > It doesn't seem like a bad idea to have a max_memory parameter that if a > backend ever exceeded it would immediately abort the current > transaction. See ulimit (or local equivalent). regards, tom lane
Greg Stark wrote: > Well in a way it does. Postgres would get an out-of-memory error from malloc > which it would handle properly and the world would be happy. > > Except not quite, since I think an out of memory error still means that > backend exits instead of just that query failing. Not at all -- the transaction is aborted, but the backend can continue working perfectly fine. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > It doesn't seem like a bad idea to have a max_memory parameter that if a
> > backend ever exceeded it would immediately abort the current
> > transaction.
>
> See ulimit (or local equivalent).

As much as setting ulimit in shell scripts is fun, I have to admit that I really don't see it happening very much. Having Postgres set a ulimit for itself may not be a bad idea and would perhaps provide "least surprise" for new users. Perhaps shared_buffers + 10*work_mem + maintenance_work_mem + max_stack_depth?

Then errors from running out of memory could provide a 'HINT: Memory consumption went well over allowed work_mem, perhaps you need to run ANALYZE or raise work_mem?'.

	Just some thoughts,

		Stephen
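If one wanted to try Stephen's back-of-the-envelope formula on a running server, the inputs can be read out of pg_settings; this is only a sketch, since the units differ between settings:

    SELECT name, setting
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'work_mem',
                    'maintenance_work_mem', 'max_stack_depth');

In the 8.x releases shared_buffers is reported as a count of 8 kB buffers while the other three are in kilobytes, so the numbers have to be converted to a common unit before being summed by hand.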
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > Greg Stark <gsstark@mit.edu> writes: > > > It doesn't seem like a bad idea to have a max_memory parameter that if a > > > backend ever exceeded it would immediately abort the current > > > transaction. > > > > See ulimit (or local equivalent). > > As much as setting ulimit in shell scripts is fun, I have to admit that > I really don't see it happening very much. For one thing it requires admin access to the startup scripts to arrange this. And it's always cluster-wide. Having a GUC parameter would mean it could be set per-session. Even if the GUC parameter were just implemented by calling setrlimit it might be useful. -- greg
> Stephen Frost <sfrost@snowman.net> writes: > >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> > Greg Stark <gsstark@mit.edu> writes: >> > > It doesn't seem like a bad idea to have a max_memory parameter that >> if a >> > > backend ever exceeded it would immediately abort the current >> > > transaction. >> > >> > See ulimit (or local equivalent). >> >> As much as setting ulimit in shell scripts is fun, I have to admit that >> I really don't see it happening very much. > > For one thing it requires admin access to the startup scripts to arrange > this. > And it's always cluster-wide. > > Having a GUC parameter would mean it could be set per-session. Even if the > GUC > parameter were just implemented by calling setrlimit it might be useful. > I don't think it needs a new GUC parameter, just having hashagg respect work_mem would fix the problem.
On Thu, Feb 09, 2006 at 02:35:34PM -0500, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > Except not quite, since I think an out of memory error still means that
> > backend exits instead of just that query failing.
>
> Not at all!  PG will recover from this perfectly well ... if it's given
> the opportunity, rather than being SIGKILLed.

FWIW, the problem comes mainly from the situation where some process accesses a piece of memory that has been swapped out, but there is no memory available to swap the page in. Or writes to a page marked copy-on-write. What do you do? There is no way to return -ENOMEM from a normal memory access, and PostgreSQL wouldn't handle that anyway.

When people talk about disabling the OOM killer, it doesn't stop the SIGKILL behaviour, it just causes the kernel to return -ENOMEM for malloc() much, much earlier... (ie when you still actually have memory available).

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
On Feb 9, 2006, at 11:22 AM, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>>> Again, regardless of OS used, hashagg will exceed "working memory" as
>>> defined in postgresql.conf.
>>
>> So?  If you've got OOM kill enabled, it can zap a process whether it's
>> strictly adhered to work_mem or not.  The OOM killer is entirely capable
>> of choosing a victim process whose memory footprint hasn't changed
>> materially since it started (eg, the postmaster).
>
> Unless I've missed something here, disabling the OOM killer doesn't
> really solve the problem here.  What solves the problem is running
> ANALYZE, but it's still certainly the case that it would make some sense
> for the Postmaster, upon realizing that it's gone well beyond its
> work_mem boundary, to ideally switch plans to one which isn't going to
> exceed its work_mem limit.  Less ideally, it could give up and issue an
> error to the user instead of running the box out of memory.

So is the work_mem parameter that is set not strictly enforced? Is it more like a suggestion as to what it SHOULD use and not a hard limit on how much memory each process is ALLOWED to use?

If his work_mem is set to 1 mb and that process is using 500 mb for tasks that are supposed to stay in work_mem, then doesn't that mean that the limit is not really a hard limit but rather a suggestion?

Rick
Martijn van Oosterhout <kleptog@svana.org> writes: > When people talk about disabling the OOM killer, it doesn't stop the > SIGKILL behaviour, Yes it does, because the situation will never arise. > it just causes the kernel to return -ENOMEM for > malloc() much much earlier... (ie when you still actually have memory > available). Given the current price of disk, there is no sane reason not to have enough swap space configured to make this not-a-problem. The OOM kill mechanism was a reasonable solution for running systems that were not expected to be too reliable anyway on small hardware, but if you're trying to run a 24/7 server you're simply incompetent if you don't disable it. regards, tom lane
> Martijn van Oosterhout <kleptog@svana.org> writes: >> When people talk about disabling the OOM killer, it doesn't stop the >> SIGKILL behaviour, > > Yes it does, because the situation will never arise. > >> it just causes the kernel to return -ENOMEM for >> malloc() much much earlier... (ie when you still actually have memory >> available). > > Given the current price of disk, there is no sane reason not to have > enough swap space configured to make this not-a-problem. The OOM kill > mechanism was a reasonable solution for running systems that were not > expected to be too reliable anyway on small hardware, but if you're > trying to run a 24/7 server you're simply incompetent if you don't > disable it. And people say I have STRONG opinions. Don't hold back Tom, let us know what you really think.
On Friday 10 February 2006 00:53, Mark Woodward wrote: > > Martijn van Oosterhout <kleptog@svana.org> writes: > >> When people talk about disabling the OOM killer, it doesn't stop the > >> SIGKILL behaviour, > > > > Yes it does, because the situation will never arise. > > > >> it just causes the kernel to return -ENOMEM for > >> malloc() much much earlier... (ie when you still actually have memory > >> available). > > > > Given the current price of disk, there is no sane reason not to have > > enough swap space configured to make this not-a-problem. The OOM kill > > mechanism was a reasonable solution for running systems that were not > > expected to be too reliable anyway on small hardware, but if you're > > trying to run a 24/7 server you're simply incompetent if you don't > > disable it. > > And people say I have STRONG opinions. Don't hold back Tom, let us know > what you really think. Read http://linux-mm.org/OverCommitAccounting or file://usr/src/linux/Documentation/vm/overcommit-accounting It is a good idea to have enough swap space. If not, set vm.overcommit_memory=2
On Feb 9, 2006, at 12:49 PM, Mark Woodward wrote:
> Well, if you read that paragraph carefully, I'll admit that I was a little
> too literal in my statement, applying it to the "process" and not to
> specific functions within the process, but the documentation says:
>
> "each one will be allowed to use as much memory as this value specifies
> before it starts to put data into temporary files."
>
> According to the documentation, the behavior of hashagg is broken. It did
> not use up to this amount and then start to use temporary files; it used
> 1000 times this limit and was killed by the OS.
>
> I think it should be documented, as the behavior is unpredictable.

It seems to me that the solution for THIS INCIDENT is to run an analyze. That should fix the problem at hand. I have nothing to say about the OOM issue except that hopefully the analyze will prevent him from running out of memory at all.

However, if hashagg truly does not obey the limit that is supposed to be imposed by work_mem, then it really ought to be documented. Is there a misunderstanding here and it really does obey it? Or is hashagg an exception but the other work_mem-associated operations work fine? Or is it possible for them all to go out of bounds?

Even if you've got 100 terabytes of swap space, it seems like if your system is very heavy on reads then you would really want that single backend to start using up your disk space and leave your memory alone, so that most of your data can stay cached and be largely unaffected by the problem of one backend. If your bottleneck is writing to the disk, then it doesn't really seem to matter. You just need to make sure that a huge out-of-control hashagg never occurs. If your disks get saturated with writes because of the hashagg of one backend, then all other processes that need to write a lot of info to disk are going to come to a grinding halt, queries are not going to complete quickly and will build up, and you will have a huge mess on your hands that will essentially prevent postgres from being able to do its job even if it doesn't actually die.

In this situation disk bandwidth is a scarce commodity, and whether you let the OS handle it all with virtual memory or you let postgres swap everything out to disk for that one operation, you are still using disk to make up for a lack of RAM. At some point you've either got to stock up on enough RAM to run your queries properly or alter how your queries run to use less RAM. Having a process go out of control in resource usage is going to cause big problems one way or another.
Rick Gigger <rick@alpinenetworking.com> writes: > However if hashagg truly does not obey the limit that is supposed to > be imposed by work_mem then it really ought to be documented. Is > there a misunderstanding here and it really does obey it? Or is > hashagg an exception but the other work_mem associated operations > work fine? Or is it possible for them all to go out of bounds? hashagg is the exception. It should be fixed, not documented, but no one's got round to that. One point to consider is that if the planner's estimate is as far off as exhibited in the OP's example, a hashagg that does spill to disk is likely to take so long that he'll be back here complaining that the query never terminates ;-). In most practical situations, I think exceeding work_mem is really the best solution, as long as it's not by more than 10x or 100x. It's when the estimate is off by many orders of magnitude that you've got a problem. Running out of memory is not necessarily the worst response ... as long as the system doesn't kill the process in response to that. regards, tom lane
> Rick Gigger <rick@alpinenetworking.com> writes:
>> However if hashagg truly does not obey the limit that is supposed to
>> be imposed by work_mem then it really ought to be documented.  Is
>> there a misunderstanding here and it really does obey it?  Or is
>> hashagg an exception but the other work_mem associated operations
>> work fine?  Or is it possible for them all to go out of bounds?
>
> hashagg is the exception.  It should be fixed, not documented, but no
> one's got round to that.

Well, it is clearly a pathological condition. Fixed? Sure, but someone should document it so that others don't stumble across it.

> One point to consider is that if the planner's estimate is as far off
> as exhibited in the OP's example, a hashagg that does spill to disk
> is likely to take so long that he'll be back here complaining that
> the query never terminates ;-).

That's not fair, now is it? This isn't about the OP (me), it is about PostgreSQL behaving badly.

> In most practical situations, I think
> exceeding work_mem is really the best solution, as long as it's not
> by more than 10x or 100x.  It's when the estimate is off by many
> orders of magnitude that you've got a problem.  Running out of memory
> is not necessarily the worst response ... as long as the system doesn't
> kill the process in response to that.

I don't agree with you here. Many PostgreSQL installations use PostgreSQL as part of a larger whole. Adjusting "work_mem" should give the admin some control over the memory footprint of the system. It is documented as the limit a specific function path will use before spilling to disk.

I set up a lot of systems and I write a lot of software that uses PostgreSQL. Periodically I run across features/problems/limitations of PostgreSQL and post them. This was/is an example of where the behavior of PostgreSQL is clearly unacceptable. OK, yes, this problem goes away with an ANALYZE, but it isn't clear how anyone could have known this, and unexpected behavior is bad in any product.

As for your statement, "he'll be back here complaining that the query never terminates," that's not true. A long query typically gets examined with explain (or in Oracle, explain plan) and evaluated from there. When the process exhibits runaway memory use, that's a problem.
On Thu, Feb 09, 2006 at 03:13:22PM -0500, Greg Stark wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > > Greg Stark <gsstark@mit.edu> writes:
> > > > It doesn't seem like a bad idea to have a max_memory parameter that if a
> > > > backend ever exceeded it would immediately abort the current
> > > > transaction.
> > >
> > > See ulimit (or local equivalent).
> >
> > As much as setting ulimit in shell scripts is fun, I have to admit that
> > I really don't see it happening very much.
>
> For one thing it requires admin access to the startup scripts to arrange this.
> And it's always cluster-wide.
>
> Having a GUC parameter would mean it could be set per-session. Even if the GUC
> parameter were just implemented by calling setrlimit it might be useful.

Trying to tune work_mem is extremely difficult in PostgreSQL, because you are constantly running the risk of sending the server into a swap storm. Having a settable per-backend memory limit would allow a lot more flexibility in setting work_mem, because you could then ensure that you wouldn't push the server into serious swapping. Even better would be a means to set a cluster-wide memory limit, but of course that's substantially more work.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Thu, Feb 09, 2006 at 05:04:38PM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > When people talk about disabling the OOM killer, it doesn't stop the > > SIGKILL behaviour, > > Yes it does, because the situation will never arise. > > > it just causes the kernel to return -ENOMEM for > > malloc() much much earlier... (ie when you still actually have memory > > available). > > Given the current price of disk, there is no sane reason not to have > enough swap space configured to make this not-a-problem. The OOM kill > mechanism was a reasonable solution for running systems that were not > expected to be too reliable anyway on small hardware, but if you're > trying to run a 24/7 server you're simply incompetent if you don't > disable it. BTW, I was shocked when I found out that FreeBSD actually has an OOM killer itself. Yet I've never heard of anyone having problems with it. Granted, the FreeBSD OOM could be better designed to pick the right process to kill, but I'd bet that the real reason you never hear about it is because FreeBSD admins are clued enough to a) setup a reasonable amount of swap and b) do a better job of monitoring memory usage so that you don't start swapping in the first place. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote: > > In most practical situations, I think > > exceeding work_mem is really the best solution, as long as it's not > > by more than 10x or 100x. It's when the estimate is off by many > > orders of magnitude that you've got a problem. Running out of memory > > is not necessarily the worst response ... as long as the system doesn't > > kill the process in response to that. > > I don't agree with you here. Many PostgreSQL installations use PostgreSQL > as part of a larger whole. Adjusting "work_mem" should give the admin some > control over the memory footprint of the system. It is documented as the > limit a specific function path will use before spilling to disk. And even when PostgreSQL has the server all to itself, having a hashagg spill to disk is *way* better than pushing the machine into a swap storm. At least if you spill the hashagg you only have one backend running at a snail's pace; a swap storm means next to nothing gets done. > This was/is an example of where the behavior of PostgreSQL is clearly > unacceptable. OK, yes, this problem goes away with an ANALYZE, but it > isn't clear how anyone could have known this, and unexpected behavior is > bad in any product. Care to submit a documentation patch before releases are bundled (I think on Sunday?) At least then people would be aware that work_mem is just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll have time before the release. :( -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Feb 10, 2006 at 10:01:18AM -0600, Jim C. Nasby wrote:
> BTW, I was shocked when I found out that FreeBSD actually has an OOM
> killer itself. Yet I've never heard of anyone having problems with it.
> Granted, the FreeBSD OOM could be better designed to pick the right
> process to kill, but I'd bet that the real reason you never hear about
> it is because FreeBSD admins are clued enough to a) setup a reasonable
> amount of swap and b) do a better job of monitoring memory usage so that
> you don't start swapping in the first place.

Hmm, I do wonder what FreeBSD's overcommit policy is. For example, on my computer right now the total allocated VM is approximately 3 times the actual memory in the computer, and about twice if you include swap. Under a strict overcommit policy my computer wouldn't complete the boot sequence, whereas currently it runs without using any swap.

Disabling overcommit has a serious cost in that most of your VM will never be used. Are people really suggesting that I can't run a few daemons, X and a web browser on FreeBSD without allocating 3 times my physical memory in swap?

However, my real question is: while trying to find info about FreeBSD's overcommit policy, I just get a lot of people complaining about FreeBSD killing random processes. Does anyone know a site that describes how it works? I understand Linux's overcommit policy just fine.

Disclaimer: The Linux OOM killer has never killed the wrong process for me, so I don't have any bad experiences with overcommit.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
> On Fri, Feb 10, 2006 at 09:57:12AM -0500, Mark Woodward wrote:
>> > In most practical situations, I think
>> > exceeding work_mem is really the best solution, as long as it's not
>> > by more than 10x or 100x.  It's when the estimate is off by many
>> > orders of magnitude that you've got a problem.  Running out of memory
>> > is not necessarily the worst response ... as long as the system doesn't
>> > kill the process in response to that.
>>
>> I don't agree with you here. Many PostgreSQL installations use PostgreSQL
>> as part of a larger whole. Adjusting "work_mem" should give the admin some
>> control over the memory footprint of the system. It is documented as the
>> limit a specific function path will use before spilling to disk.
>
> And even when PostgreSQL has the server all to itself, having a hashagg
> spill to disk is *way* better than pushing the machine into a swap
> storm. At least if you spill the hashagg you only have one backend
> running at a snail's pace; a swap storm means next to nothing gets done.
>
>> This was/is an example of where the behavior of PostgreSQL is clearly
>> unacceptable. OK, yes, this problem goes away with an ANALYZE, but it
>> isn't clear how anyone could have known this, and unexpected behavior is
>> bad in any product.
>
> Care to submit a documentation patch before releases are bundled (I
> think on Sunday?) At least then people would be aware that work_mem is
> just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll
> have time before the release. :(

I would be glad to. What's the process?
Martijn van Oosterhout <kleptog@svana.org> writes: > Disclaimer: The Linux OOM killer has never killed the wrong process for > me, so I don't have any bad experiences with overcommit. You haven't tried real hard. What I've seen recently when I do something that makes a PG backend go overboard is that the kernel zaps both the misbehaving backend and the bgwriter process. No idea what it's got against the bgwriter, but the behavior's been pretty consistent under recent Fedora 4 kernels ... (This is on a development machine, not a server, so I'm not particularly worried by leaving the default overcommit policy in place. I wouldn't do that on a server --- but it's not worth my time to change it on a devel machine.) regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > Disabling overcommit has a serious cost in that most of your VM will > never be used. Are people really suggesting that I can't run a few > daemons, X and a web-browser on FreeBSD without allocating 3 times my > physical memory in swap? There's a possibility you're misreading your process info there. The X server maps large areas of graphics memory in its address space which shows up as virtual memory used in ps on some OSes. I'm not sure if BSD is included there. That's not memory that can ever be swapped out and it doesn't take up any memory from the non-video memory. That said, it's true that some applications allocate much more memory than needed. There's a bit of a feedback loop here. Because application writers know that OSes overcommit they don't worry about avoiding unnecessary allocations. Sun's original Java memory management system just started off with allocating 20MB before it ran a single bytecode for example. That's why merely allocating tons of swap doesn't necessarily protect you. It's still possible for a process (or several processes if you allocate more swap than you have address space) to mmap gigabytes of memory without touching it and then start touching those pages. Hopefully the OOM killer targets the offender but there's no real way for it to guarantee it. -- greg
Greg Stark <gsstark@mit.edu> writes: > That's why merely allocating tons of swap doesn't necessarily protect you. > It's still possible for a process (or several processes if you allocate more > swap than you have address space) to mmap gigabytes of memory without touching > it and then start touching those pages. So? If the swap exists to back that memory, there's no problem. It might be slow, but it will not fail. regards, tom lane
On Fri, Feb 10, 2006 at 12:16:04PM -0500, Mark Woodward wrote: > > And even when PostgreSQL has the server all to itself, having a hashagg > > spill to disk is *way* better than pushing the machine into a swap > > storm. At least if you spill the hashagg you only have one backend > > running at a snail's pace; a swap storm means next to nothing gets done. > > > >> This was/is an example of where the behavior of PostgreSQL is clearly > >> unacceptable. OK, yes, this problem goes away with an ANALYZE, but it > >> isn't clear how anyone could have known this, and unexpected behavior is > >> bad in any product. > > > > Care to submit a documentation patch before releases are bundled (I > > think on Sunday?) At least then people would be aware that work_mem is > > just a suggestion to hash_aggs. I'd do a patch myself but I doubt I'll > > have time before the release. :( > > I would be glad too. What's the process? Well, find the appropriate file in doc/src/sgml, make a copy, edit the file, generate a diff with diff -u, and email that diff/patch to pgsql-patches. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > That's why merely allocating tons of swap doesn't necessarily protect you. > > It's still possible for a process (or several processes if you allocate more > > swap than you have address space) to mmap gigabytes of memory without touching > > it and then start touching those pages. > > So? If the swap exists to back that memory, there's no problem. It > might be slow, but it will not fail. Sure, but there's no way to know how much swap you need. No matter how much swap you allocate these processes can allocate more pages of untouched RAM and then blow up. Of course realistically allocating 4G of swap is enough to deal with something like Postgres where you're not being maliciously attacked. One process on ia32 can't accidentally allocate more than 4G of ram. I was just trying to clarify the situation since someone made some comment about it having to do with memory being swapped out and then finding nowhere to swap in when needed. That's not exactly what's happening. -- greg
Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Greg Stark <gsstark@mit.edu> writes:
> > > That's why merely allocating tons of swap doesn't necessarily protect you.
> > > It's still possible for a process (or several processes if you allocate more
> > > swap than you have address space) to mmap gigabytes of memory without touching
> > > it and then start touching those pages.
> >
> > So?  If the swap exists to back that memory, there's no problem.  It
> > might be slow, but it will not fail.
>
> Sure, but there's no way to know how much swap you need. No matter how much
> swap you allocate these processes can allocate more pages of untouched RAM and
> then blow up.
>
> Of course realistically allocating 4G of swap is enough to deal with something
> like Postgres where you're not being maliciously attacked. One process on ia32
> can't accidentally allocate more than 4G of ram.
>
> I was just trying to clarify the situation since someone made some comment
> about it having to do with memory being swapped out and then finding nowhere
> to swap in when needed. That's not exactly what's happening.

I guess the fundamental issue is whether Linux requires all mmap()'ed file contents to be in memory, or whether it pushes data to disk and unmaps it as it runs low on memory. I don't know the answer to that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Greg Stark <gsstark@mit.edu> writes: > I was just trying to clarify the situation since someone made some comment > about it having to do with memory being swapped out and then finding nowhere > to swap in when needed. That's not exactly what's happening. No. I believe the case that is actually hard for the kernel to predict comes from copy-on-write: when a process forks, you could potentially need twice its current memory image, but in reality you probably won't ever need that much since many of the shared pages won't ever be written by either process. However, a non-overcommitting kernel must assume that worst case, and hence fail the fork() if it doesn't have enough swap space to cover both processes. If it does not, then the crunch comes when one process does touch a shared page. If there is no available swap space at that time, kill -9 is the only recourse, because there is no way in the Unix API to fail a write to valid memory. The reason for having a lot more swap space than you really need is just to cover the potential demand from copy-on-write of pages that are currently shared. But given the price of disk these days, it's pretty cheap insurance. regards, tom lane