Thread: Pre-allocation of shared memory ...

Pre-allocation of shared memory ...

From
Hans-Jürgen Schönig
Date:
There is a problem which occurs from time to time and which is a bit 
nasty in business environments.
When the shared memory is eaten up by some application such as Apache, 
PostgreSQL will refuse to do what it should do because there is no 
memory around. To many people this looks like a problem related to 
stability. Also, it influences the availability of the database itself.

I was thinking of a solution which might help to get around this problem:
If we had a flag to tell PostgreSQL that XXX Megs of shared memory 
should be preallocated by PostgreSQL, the database would then be sure 
that there is always enough memory around. The problem is that 
PostgreSQL would have to care more about memory consumption.

Of course, the best solution is to put PostgreSQL on a separate machine 
but many people don't do it so we have to live with memory leaks caused 
by other software (we have just seen a nasty one in mod_perl).

Does it make sense?
Regards,
    Hans


-- 
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706; +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
We already pre-allocate all shared memory and resources on postmaster
start.

---------------------------------------------------------------------------

Hans-Jürgen Schönig wrote:
> There is a problem which occurs from time to time and which is a bit 
> nasty in business environments.
> When the shared memory is eaten up by some application such as Apache 
> PostgreSQL will refuse to do what it should do because there is no 
> memory around. To many people this looks like a problem related to 
> stability. Also, it influences the availability of the database itself.
> 
> I was thinking of a solution which might help to get around this problem:
> If we had a flag to tell PostgreSQL that XXX Megs of shared memory 
> should be preallocated by PostgreSQL, the database would then be sure 
> that there is always enough memory around. The problem is that 
> PostgreSQL would have to care more about memory consumption.
> 
> Of course, the best solution is to put PostgreSQL on a separate machine 
> but many people don't do it so we have to live with memory leaks caused 
> by other software (we have just seen a nasty one in mod_perl).
> 
> Does it make sense?
> 
>     Regards,
> 
>         Hans
> 
> 
> -- 
> Cybertec Geschwinde u Schoenig
> Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
> Tel: +43/2952/30706; +43/664/233 90 75
> www.cybertec.at, www.postgresql.at, kernel.cybertec.at
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/docs/faqs/FAQ.html
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Pre-allocation of shared memory ...

From
Hans-Jürgen Schönig
Date:
Bruce Momjian wrote:
> We already pre-allocate all shared memory and resources on postmaster
> start.


I guess we allocate memory when a backend starts, don't we?
Or do we allocate when the instance starts?

I have two explanations for the following behaviour:

a. a bug
b. not enough shared memory


WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

connection to server was lost

The problem is that this only happens with mod_perl and Apache on the 
same machine, so I thought it had to do with a known memory leak in 
mod_perl/Apache. It happens after about two weeks (it seems to occur 
regularly).

> Are you suggesting pre-acquiring resources like oracle does? Like you start a 
> database instance, 350MB memory is gone types?
> 
> One thing I love about postgresql is that it does not do any such silly thing. 
> I agree in the case you suggest, it makes sense.
> 
> If at all postgresql goes that way, I would like to see it configurable. I 
> would rather remove an app. from a machine rather than letting it stamp on 
> other apps feet.


Shridhar: yes, if memory is preallocated it has to be configurable 
(default = off).







-- 
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706; +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
Hans-Jürgen Schönig <hs@cybertec.at> writes:
> I have two explanations for the following behaviour:
> a. a bug
> b. not enough shared memory

> WARNING:  Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend
>     died abnormally and possibly corrupted shared memory.

Is this a Linux machine?  If so, the true explanation is probably (c):
the kernel is kill 9'ing randomly-chosen database processes whenever
it starts to feel low on memory.  I would suggest checking the
postmaster log to determine the signal number the failed backends are
dying with.  The client-side message does not give nearly enough info
to debug such problems.
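[As a sketch of that log check (the path and the exact log line here are placeholders; use wherever your postmaster's stderr actually goes):]

```shell
#!/bin/sh
# Look for backends killed by the kernel. A backend dying with
# signal 9 (SIGKILL) points at the OS, not at PostgreSQL itself.
# LOG is a placeholder path; substitute your real postmaster log.
LOG="${1:-/tmp/postmaster.log}"

# Demo input so this sketch is self-contained; in real use the
# postmaster writes this file and you would skip this step.
cat > "$LOG" <<'EOF'
LOG:  server process (pid 1234) was terminated by signal 9
EOF

# The signal number in the match is what distinguishes a kernel
# kill (9) from an ordinary backend crash (e.g. 11).
grep 'terminated by signal' "$LOG"
```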

There is also possibility (d): you have some bad RAM that is located in
an address range that doesn't get used until the machine is under full
load.  But if the backends are dying with signal 9 then I'll take the
kernel-kill theory.

AFAIK the only good way around this problem is to use another OS with a
more rational design for handling low-memory situations.  No other Unix
does anything remotely as brain-dead as what Linux does.  Or bug your
favorite Linux kernel hacker to fix the kernel.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Hans-Jürgen Schönig <hs@cybertec.at> writes:
> > I have two explanations for the following behaviour:
> > a. a bug
> > b. not enough shared memory
> 
> > WARNING:  Message from PostgreSQL backend:
> >     The Postmaster has informed me that some other backend
> >     died abnormally and possibly corrupted shared memory.
> 
> Is this a Linux machine?  If so, the true explanation is probably (c):
> the kernel is kill 9'ing randomly-chosen database processes whenever
> it starts to feel low on memory.  I would suggest checking the
> postmaster log to determine the signal number the failed backends are
> dying with.  The client-side message does not give nearly enough info
> to debug such problems.
> 
> There is also possibility (d): you have some bad RAM that is located in
> an address range that doesn't get used until the machine is under full
> load.  But if the backends are dying with signal 9 then I'll take the
> kernel-kill theory.
> 
> AFAIK the only good way around this problem is to use another OS with a
> more rational design for handling low-memory situations.  No other Unix
> does anything remotely as brain-dead as what Linux does.  Or bug your
> favorite Linux kernel hacker to fix the kernel.

Is there no sysctl way to disable such kills?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Pre-allocation of shared memory ...

From
Doug McNaught
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Tom Lane wrote:
> > AFAIK the only good way around this problem is to use another OS with a
> > more rational design for handling low-memory situations.  No other Unix
> > does anything remotely as brain-dead as what Linux does.  Or bug your
> > favorite Linux kernel hacker to fix the kernel.
> 
> Is there no sysctl way to disable such kills?

The -ac kernel patches from Alan Cox have a sysctl to control memory
overcommit--you can set it to track memory usage and fail allocations
when memory runs out, rather than the random kill behavior.  I'm not
sure whether those have made it into the stock kernel yet, but the
vendor kernels (such as Red Hat's) might have it too.

-Doug


Re: Pre-allocation of shared memory ...

From
Alvaro Herrera
Date:
On Wed, Jun 11, 2003 at 07:35:20PM -0400, Doug McNaught wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > Is there no sysctl way to disable such kills?
> 
> The -ac kernel patches from Alan Cox have a sysctl to control memory
> overcommit--you can set it to track memory usage and fail allocations
> when memory runs out, rather than the random kill behavior.  I'm not
> sure whether those have made it into the stock kernel yet, but the
> vendor kernels (such as Red Hat's) might have it too.

Yeah, I see it in the Mandrake kernel.  But it's not in stock 2.4.19, so
you can't assume everybody has it.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"What do the years matter?  What really matters is finding that,
when all is said and done, the best age of life is being alive"  (Mafalda)


Re: Pre-allocation of shared memory ...

From
Hans-Jürgen Schönig
Date:
> Yeah, I see it in the Mandrake kernel.  But it's not in stock 2.4.19, so
> you can't assume everybody has it.

We had this problem on a recent version of good old Slackware.
I think we also had it on RedHat 8 or so.

Doing this kind of killing is definitely a bad habit. I thought it 
had to do with something else, so my proposal for pre-allocation seems 
to be pretty obsolete ;).

Thanks a lot.
Hans


-- 
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706; +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
On this machine (RH9, kernel 2.4.20-18.9) the docs say (in
/usr/src/linux-2.4/Documentation/vm/overcommit-accounting ):

-----------------
The Linux kernel supports four overcommit handling modes

0       -       Heuristic overcommit handling. Obvious overcommits of
                address space are refused. Used for a typical system. It
                ensures a seriously wild allocation fails while allowing
                overcommit to reduce swap usage

1       -       No overcommit handling. Appropriate for some scientific
                applications

2       -       (NEW) strict overcommit. The total address space commit
                for the system is not permitted to exceed swap + half ram.
                In almost all situations this means a process will not be
                killed while accessing pages but only by malloc failures
                that are reported back by the kernel mmap/brk code.

3       -       (NEW) paranoid overcommit The total address space commit
                for the system is not permitted to exceed swap. The machine
                will never kill a process accessing pages it has mapped
                except due to a bug (ie report it!)
----------------------

So maybe
 sysctl -w vm.overcommit_memory=3

is what's needed? I guess you might pay a performance hit for doing that,
though.

andrew

> > Yeah, I see it in the Mandrake kernel.  But it's not in stock 2.4.19,
> > so you can't assume everybody has it.
> >
>
> We had this problem on a recent version of good old Slackware.
> I think we also had it on RedHat 8 or so.
>
> Doing this kind of killing is definitely a bad habit. I thought it
> had to do with something else, so my proposal for pre-allocation
> seems to be pretty obsolete ;).
>
> Thanks a lot.
>
>     Hans





Re: Pre-allocation of shared memory ...

From
Jon Lapham
Date:
Tom Lane wrote:
> Is this a Linux machine?  If so, the true explanation is probably (c):
> the kernel is kill 9'ing randomly-chosen database processes whenever
> it starts to feel low on memory.  I would suggest checking the
> postmaster log to determine the signal number the failed backends are
> dying with.  The client-side message does not give nearly enough info
> to debug such problems.
> 
> AFAIK the only good way around this problem is to use another OS with a
> more rational design for handling low-memory situations.  No other Unix
> does anything remotely as brain-dead as what Linux does.  Or bug your
> favorite Linux kernel hacker to fix the kernel.

Tom-

Just curious.  What would a rationally designed OS do in an out of 
memory situation?

It seems like from the discussions I've read about the subject there 
really is no rational solution to this irrational problem.

Some solutions such as "suspend process, write image to file" and 
"increase swap space" assume available disk space, which is obviously 
not guaranteed to be available.

-- 
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
 Jon Lapham  <lapham@extracta.com.br>          Rio de Janeiro, Brasil
 Work: Extracta Moléculas Naturais SA     http://www.extracta.com.br/
 Web: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------




Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
Jon Lapham <lapham@extracta.com.br> writes:
> Just curious.  What would a rationally designed OS do in an out of 
> memory situation?

Fail malloc() requests.

The sysctl docs that Andrew Dunstan just provided give some insight into
the problem: the default behavior of Linux is to promise more virtual
memory than it can actually deliver.  That is, it allows malloc to
succeed even when it's not going to be able to actually provide the
address space when push comes to shove.  When called to stand and
deliver, the kernel has no way to report failure (other than perhaps a
software-induced SIGSEGV, which would hardly be an improvement).  So it
kills the process instead.  Unfortunately, the process that happens to
be in the line of fire at this point could be any process, not only the
one that made unreasonable memory demands.

This is perhaps an okay behavior for desktop systems being run by
people who are accustomed to Microsoft-like reliability.  But to make it
the default is brain-dead, and to make it the only available behavior
(as seems to have been true until very recently) defies belief.  The
setting now called "paranoid overcommit" is IMHO the *only* acceptable
one for any sort of server system.  With anything else, you risk having
critical userspace daemons killed through no fault of their own.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Jon Lapham
Date:
Tom Lane wrote:
> [snip]
> The
> setting now called "paranoid overcommit" is IMHO the *only* acceptable
> one for any sort of server system.  With anything else, you risk having
> critical userspace daemons killed through no fault of their own.

Wow.  Thanks for the info.  I found the documentation you are referring 
to in Documentation/vm/overcommit-accounting (on a stock RH9 machine).

It seems that the overcommit policy is set via the sysctl 
`vm.overcommit_memory'.  So...

[root@bilbo src]# sysctl -a | grep -i overcommit
vm.overcommit_memory = 0

...the default seems to be "Heuristic overcommit handling".  It seems 
that what we want is "vm.overcommit_memory = 3" for paranoid overcommit.

Thanks for getting to the bottom of this Tom.  It *is* insane that the 
default isn't "paranoid overcommit".

-- 
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
 Jon Lapham  <lapham@extracta.com.br>          Rio de Janeiro, Brasil
 Work: Extracta Moléculas Naturais SA     http://www.extracta.com.br/
 Web: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------




Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
What really kills [:-)] me is that they allocate memory assuming I will
not be using it all, then terminate the executable in an unrecoverable
way when I go to use the memory.

And, they make a judgement on users who don't want this by calling them
"paranoid".

I will add something to the docs about this.

---------------------------------------------------------------------------

Tom Lane wrote:
> Jon Lapham <lapham@extracta.com.br> writes:
> > Just curious.  What would a rationally designed OS do in an out of 
> > memory situation?
> 
> Fail malloc() requests.
> 
> The sysctl docs that Andrew Dunstan just provided give some insight into
> the problem: the default behavior of Linux is to promise more virtual
> memory than it can actually deliver.  That is, it allows malloc to
> succeed even when it's not going to be able to actually provide the
> address space when push comes to shove.  When called to stand and
> deliver, the kernel has no way to report failure (other than perhaps a
> software-induced SIGSEGV, which would hardly be an improvement).  So it
> kills the process instead.  Unfortunately, the process that happens to
> be in the line of fire at this point could be any process, not only the
> one that made unreasonable memory demands.
> 
> This is perhaps an okay behavior for desktop systems being run by
> people who are accustomed to Microsoft-like reliability.  But to make it
> the default is brain-dead, and to make it the only available behavior
> (as seems to have been true until very recently) defies belief.  The
> setting now called "paranoid overcommit" is IMHO the *only* acceptable
> one for any sort of server system.  With anything else, you risk having
> critical userspace daemons killed through no fault of their own.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What really kills [:-)] me is that they allocate memory assuming I will
> not be using it all, then terminate the executable in an unrecoverable
> way when I go to use the memory.

To be fair, I'm probably misstating things by referring to malloc().
The big problem probably comes from fork() with copy-on-write --- the
kernel has no good way to estimate how much of the shared address space
will eventually become private modified copies, but it can be forgiven
for wanting to make less than the worst-case assumption.

Still, if you are wanting to run a reliable server, I think worst-case
assumption is exactly what you want.  Swap space is cheap, and there's
no reason you shouldn't have enough swap to support the worst-case
situation.  If the swap area goes largely unused, that's fine.

The policy they're calling "paranoid overcommit" (don't allocate more
virtual memory than you have swap) is as far as I know the standard on
all Unixen other than Linux; certainly it's the traditional behavior.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
OK, doc patch attached and applied.  Improvements?

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > What really kills [:-)] me is that they allocate memory assuming I will
> > not be using it all, then terminate the executable in an unrecoverable
> > way when I go to use the memory.
>
> To be fair, I'm probably misstating things by referring to malloc().
> The big problem probably comes from fork() with copy-on-write --- the
> kernel has no good way to estimate how much of the shared address space
> will eventually become private modified copies, but it can be forgiven
> for wanting to make less than the worst-case assumption.
>
> Still, if you are wanting to run a reliable server, I think worst-case
> assumption is exactly what you want.  Swap space is cheap, and there's
> no reason you shouldn't have enough swap to support the worst-case
> situation.  If the swap area goes largely unused, that's fine.
>
> The policy they're calling "paranoid overcommit" (don't allocate more
> virtual memory than you have swap) is as far as I know the standard on
> all Unixen other than Linux; certainly it's the traditional behavior.
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql-server/doc/src/sgml/runtime.sgml,v
retrieving revision 1.184
diff -c -c -r1.184 runtime.sgml
*** doc/src/sgml/runtime.sgml    11 Jun 2003 22:13:21 -0000    1.184
--- doc/src/sgml/runtime.sgml    12 Jun 2003 15:29:45 -0000
***************
*** 2780,2785 ****
--- 2780,2795 ----
          <filename>/usr/src/linux/include/asm-<replaceable>xxx</>/shmpara
          m.h</> and <filename>/usr/src/linux/include/linux/sem.h</>.
         </para>
+
+        <para>
+         Linux has poor default memory overcommit behavior.  Rather than
+         failing if it can not reserve enough memory, it returns success,
+         but later fails when the memory can't be mapped and terminates
+         the application.  To prevent unpredictable process termination, use:
+ <programlisting>
+ sysctl -w vm.overcommit_memory=3
+ </programlisting>
+        </para>
        </listitem>
       </varlistentry>


Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> OK, doc patch attached and applied.  Improvements?

I think it would be worth spending another sentence to tell people
exactly what the symptom looks like, ie, backends dying with signal 9.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
I have added the following sentence to the docs too:
       Note, you will need enough swap space to cover all your memory needs.

I still wish Linux would just fail the fork/malloc when memory is low,
rather than requiring swap for everything _or_ overcommitting.  I wonder
if making a unified buffer cache just made that too hard to do.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> On this machine (RH9, kernel 2.4.20-18.9) the docs say (in
> /usr/src/linux-2.4/Documentation/vm/overcommit-accounting ):
> 
> -----------------
> The Linux kernel supports four overcommit handling modes
> 
> 0       -       Heuristic overcommit handling. Obvious overcommits of
>                 address space are refused. Used for a typical system. It
>                 ensures a seriously wild allocation fails while allowing
>                 overcommit to reduce swap usage
> 
> 1       -       No overcommit handling. Appropriate for some scientific
>                 applications
> 
> 2       -       (NEW) strict overcommit. The total address space commit
>                 for the system is not permitted to exceed swap + half ram.
>                 In almost all situations this means a process will not be
>                 killed while accessing pages but only by malloc failures
>                 that are reported back by the kernel mmap/brk code.
> 
> 3       -       (NEW) paranoid overcommit The total address space commit
>                 for the system is not permitted to exceed swap. The machine
>                 will never kill a process accessing pages it has mapped
>                 except due to a bug (ie report it!)
> ----------------------
> 
> So maybe
> 
>   sysctl -w vm.overcommit_memory=3
> 
> is what's needed? I guess you might pay a performance hit for doing that,
> though.
> 
> andrew
> 
> > > Yeah, I see it in the Mandrake kernel.  But it's not in stock 2.4.19,
> > > so you can't assume everybody has it.
> > >
> >
> > We had this problem on a recent version of good old Slackware.
> > I think we also had it on RedHat 8 or so.
> >
> > Doing this kind of killing is definitely a bad habit. I thought it
> > had to do with something else, so my proposal for pre-allocation
> > seems to be pretty obsolete ;).
> >
> > Thanks a lot.
> >
> >     Hans
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
OK, new text is:
      <para>
       Linux has poor default memory overcommit behavior.  Rather than
       failing if it can not reserve enough memory, it returns success,
       but later fails when the memory can't be mapped and terminates
       the application with <literal>kill -9</>.  To prevent unpredictable
       process termination, use:
<programlisting>
sysctl -w vm.overcommit_memory=3
</programlisting>
       Note, you will need enough swap space to cover all your memory needs.
      </para>
     </listitem>
    </varlistentry>

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > OK, doc patch attached and applied.  Improvements?
> 
> I think it would be worth spending another sentence to tell people
> exactly what the symptom looks like, ie, backends dying with signal 9.
> 
>             regards, tom lane
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
A couple of points:

. It is probably a good idea to do this via /etc/sysctl.conf, which will
be read earlyish by the init scripts (on RH9 it is in the network startup
file, for some reason).

. The setting is not available on all kernel versions AFAIK. The admin needs
to check the docs. I have no idea when this went into the kernel, and no
time to spend finding out. Even if we knew, it might have gone into vendor
kernels at other odd times  - there are often times when the vendors are in
advance of the officially released kernels.
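[A sketch of the persistent form of the setting Andrew describes, assuming a kernel that has the overcommit sysctl at all:]

```
# /etc/sysctl.conf -- applied at boot by the init scripts
# 3 = "paranoid overcommit" on 2.4 -ac era kernels; check your own
# Documentation/vm/overcommit-accounting before relying on the value
vm.overcommit_memory = 3
```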

Andrew


Bruce wrote:
>
> OK, new text is:
>
>       <para>
>        Linux has poor default memory overcommit behavior.  Rather than
>        failing if it can not reserve enough memory, it returns success,
>        but later fails when the memory can't be mapped and terminates
>        the application with <literal>kill -9</>.  To prevent
>        unpredictable process termination, use:
> <programlisting>
> sysctl -w vm.overcommit_memory=3
> </programlisting>
>        Note, you will need enough swap space to cover all your memory
>        needs.
>       </para>
>      </listitem>
>     </varlistentry>
>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
>> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> > OK, doc patch attached and applied.  Improvements?
>>
>> I think it would be worth spending another sentence to tell people
>> exactly what the symptom looks like, ie, backends dying with signal 9.
>>
>>             regards, tom lane
>>
>
> --
>  Bruce Momjian                        |  http://candle.pha.pa.us
>  pgman@candle.pha.pa.us               |  (610) 359-1001
>  +  If your life is a hard drive,     |  13 Roberts Road
>  +  Christ can be your backup.        |  Newtown Square, Pennsylvania
>  19073
>
> ---------------------------(end of
> broadcast)--------------------------- TIP 2: you can get off all lists
> at once with the unregister command
>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)





Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Well, let's see what feedback we get.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> A couple of points:
> 
> . It is probably a good idea to do this via /etc/sysctl.conf, which will
> be read earlyish by the init scripts (on RH9 it is in the network startup
> file, for some reason).
> 
> . The setting is not available on all kernel versions AFAIK. The admin needs
> to check the docs. I have no idea when this went into the kernel, and no
> time to spend finding out. Even if we knew, it might have gone into vendor
> kernels at other odd times  - there are often times when the vendors are in
> advance of the officially released kernels.
> 
> Andrew
> 
> 
> Bruce wrote:
> >
> > OK, new text is:
> >
> >       <para>
> >        Linux has poor default memory overcommit behavior.  Rather than
> >        failing if it can not reserve enough memory, it returns success,
> >        but later fails when the memory can't be mapped and terminates
> >        the application with <literal>kill -9</>.  To prevent
> >        unpredictable process termination, use:
> > <programlisting>
> > sysctl -w vm.overcommit_memory=3
> > </programlisting>
> >        Note, you will need enough swap space to cover all your memory
> >        needs.
> >       </para>
> >      </listitem>
> >     </varlistentry>
> >
> > ---------------------------------------------------------------------------
> >
> > Tom Lane wrote:
> >> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> > OK, doc patch attached and applied.  Improvements?
> >>
> >> I think it would be worth spending another sentence to tell people
> >> exactly what the symptom looks like, ie, backends dying with signal 9.
> >>
> >>             regards, tom lane
> >>
> >
> > --
> >  Bruce Momjian                        |  http://candle.pha.pa.us
> >  pgman@candle.pha.pa.us               |  (610) 359-1001
> >  +  If your life is a hard drive,     |  13 Roberts Road
> >  +  Christ can be your backup.        |  Newtown Square, Pennsylvania
> >  19073
> >
> > ---------------------------(end of
> > broadcast)--------------------------- TIP 2: you can get off all lists
> > at once with the unregister command
> >    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Pre-allocation of shared memory ...

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> The policy they're calling "paranoid overcommit" (don't allocate more
> virtual memory than you have swap) is as far as I know the standard on
> all Unixen other than Linux; certainly it's the traditional behavior.

Uhm, it's traditional for Unixen without extensive shared memory usage like
SunOS 4. But it's not nearly as standard as you say. 

In fact Linux wasn't the first major Unix to behave this way at all. As far as
I know, that honour belongs to AIX. Not coincidentally, one of the first
Unixen to have shared libraries. Hence the AIX invention of SIGDANGER which
told a process its death was imminent.

On AIX the heuristic was to kill the largest process in order to clear up the
most memory -- which had a nasty habit of picking the X server to kill, which
of course, well, it cleared up lots of memory... I think they "fixed" that by
changing the heuristic to kill the *second* biggest process.

I think you'll find this overcommit issue affects many if not most Unixen.
There's a bit of a vicious circle here, a lot of software now have the habit
of starting off by mallocing huge chunks of memory that they never need
because "well the machine has virtual memory so it doesn't cost anything".

-- 
greg



Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I think you'll find this overcommit issue affects many if not most Unixen.

I'm unconvinced, because I've only ever heard of the problem affecting
Postgres on Linux.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > I think you'll find this overcommit issue affects many if not most Unixen.
> 
> I'm unconvinced, because I've only ever heard of the problem affecting
> Postgres on Linux.

What I don't understand is why they just don't start failing on
fork/malloc rather than killing things.



Re: Pre-allocation of shared memory ...

From
"Jeroen T. Vermeulen"
Date:
On Thu, Jun 12, 2003 at 08:08:28PM -0400, Bruce Momjian wrote:
> > 
> > I'm unconvinced, because I've only ever heard of the problem affecting
> > Postgres on Linux.
> 
> What I don't understand is why they just don't start failing on
> fork/malloc rather than killing things.

I may be way off the mark here, falling into the middle of this as I am,
but it may be because the kernel overcommits the memory (which is sort of
logical in a way given the way fork() works).  That may mean that malloc()
thinks it gets more memory and returns a pointer, but the kernel hasn't
actually committed that address space yet and waits to see if it's ever
going to be needed.

Given the right allocation proportions, this may mean that in the end the
kernel has no way to handle a shortage gracefully by causing fork() or
allocations to fail.  I would assume it then goes through its alternatives
like scaling back its file cache--which it'd probably start to do before
a lot of swapping was needed, so not much to scrape out of that barrel.

After that, where do you go?  Try to find a reasonable process to shoot
in the head.  From what I heard, although I haven't kept current, a lot
of work went into selecting a "reasonable" process, so there will be some
determinism.  And if you have occasion to find out in the first place,
"some determinism" usually means "suspiciously bad luck."


Jeroen



Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
I'm not saying you're wrong, but I also think it's true that typical Linux
usage patterns are rather different from those of other *nixen. Linux
started out being able to do a lot with a little, and is still often used
that way - with more functions crammed into boxes with less resources. When
I last worked in a data centre (a few years ago now, for one of the world's
largest companies) they had hundreds of AIX and HP-UX boxes, each well
resourced and each dedicated to exactly one function. I rarely see Linux
being used that way, and I often see it configured with lowish memory and
not nearly enough swap.

In any case, it seems to me we need to have someone check that setting the
vm.overcommit_memory to paranoid will actually stop the postmaster being
killed. I'd love to help but I'm up to my ears in stuff right now. If we
know that we can save the philosophical stuff for another day :-)
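For anyone who wants to run that experiment, the knob is reachable through sysctl. This is only a sketch: the exact mode values depend on the kernel and on whether it carries the strict-accounting ("paranoid") overcommit patch, so check the overcommit documentation shipped with your kernel source before relying on it.

```shell
# Assumption: a kernel that supports strict overcommit accounting.
# In that mode the kernel refuses fork()/malloc() requests that cannot
# be backed by real storage, instead of overcommitting and killing
# processes later.
sysctl -w vm.overcommit_memory=2

# Equivalent via /proc:
echo 2 > /proc/sys/vm/overcommit_memory
```

With that set, an out-of-memory condition should surface as a plain ENOMEM from the failing call rather than a signal 9 to the postmaster.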

cheers

andrew

----- Original Message ----- 
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Greg Stark" <gsstark@mit.edu>
Cc: <pgsql-hackers@postgresql.org>
Sent: Thursday, June 12, 2003 6:19 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


> Greg Stark <gsstark@mit.edu> writes:
> > I think you'll find this overcommit issue affects many if not most
Unixen.
>
> I'm unconvinced, because I've only ever heard of the problem affecting
> Postgres on Linux.
>
> regards, tom lane



Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> Given the right allocation proportions, this may mean that in the end the
> kernel has no way to handle a shortage gracefully by causing fork() or
> allocations to fail.

Sure it does.  All you need is a conservative allocation policy: fork()
fails if it cannot reserve enough swap space to guarantee that the new
process could write over its entire address space.  Copy-on-write is
an optimization that reduces physical RAM usage, not virtual address
space or swap-space requirements.
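The accounting Tom describes can be sketched in a few lines. This is an illustration of the policy only, not kernel code; the class, names, and numbers are invented for the example.

```python
# Toy model of "paranoid overcommit": every allocation and every fork()
# must reserve real backing store (RAM + swap) up front, or fail cleanly.

class CommitAccounting:
    def __init__(self, ram_pages, swap_pages):
        # The commit limit is what could actually be written back,
        # not an optimistic guess about copy-on-write sharing.
        self.limit = ram_pages + swap_pages
        self.committed = 0

    def reserve(self, pages):
        """Reserve backing store; return False (ENOMEM) instead of overcommitting."""
        if self.committed + pages > self.limit:
            return False
        self.committed += pages
        return True

    def fork(self, parent_writable_pages):
        # Copy-on-write may share pages for now, but the child could
        # dirty them all, so the full writable space must be reserved.
        return self.reserve(parent_writable_pages)
```

With 512 pages of RAM and 512 of swap, the first fork of a 700-page process succeeds; the second fails up front with a clean error, and nothing has to be killed later.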

Given that swap space is cheap, and that killing random processes is
obviously bad, it's not apparent to me why people think this is not
a good approach --- at least for high-reliability servers.  And Linux
would definitely like to think of itself as a server-grade OS.

> After that, where do you go?  Try to find a reasonable process to shoot
> in the head.  From what I heard, although I haven't kept current, a lot
> of work went into selecting a "reasonable" process, so there will be some
> determinism.

Considering the frequency with which we hear of database backends
getting shot in the head, I'd say those heuristics need lots of work
yet.  I'll take a non-heuristic solution for any system I have to
administer, thanks.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Alvaro Herrera
Date:
On Thu, Jun 12, 2003 at 09:18:33PM -0400, Tom Lane wrote:

> Given that swap space is cheap, and that killing random processes is
> obviously bad, it's not apparent to me why people think this is not
> a good approach --- at least for high-reliability servers.  And Linux
> would definitely like to think of itself as a server-grade OS.

Well, it was a toy OS when conceived, that's for sure.  But it's getting
better.

> Considering the frequency with which we hear of database backends
> getting shot in the head, I'd say those heuristics need lots of work
> yet.

Previous versions were said to attempt to kill init.  You have to admit
there has been some progress.

But then there's the problem of people running database servers on
misconfigured machines.  They should know better than not setting enough
swap space, IMHO anyway.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Y una voz del caos me hablo y me dijo
"Sonrie y se feliz, podria ser peor".
Y sonrei. Y fui feliz.
Y fue peor.


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> "Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> > Given the right allocation proportions, this may mean that in the end the
> > kernel has no way to handle a shortage gracefully by causing fork() or
> > allocations to fail.
> 
> Sure it does.  All you need is a conservative allocation policy: fork()
> fails if it cannot reserve enough swap space to guarantee that the new
> process could write over its entire address space.  Copy-on-write is
> an optimization that reduces physical RAM usage, not virtual address
> space or swap-space requirements.
> 
> Given that swap space is cheap, and that killing random processes is
> obviously bad, it's not apparent to me why people think this is not
> a good approach --- at least for high-reliability servers.  And Linux
> would definitely like to think of itself as a server-grade OS.

BSD used to require full swap behind all RAM.  I am not sure if that was
changed in BSD 4.4 or in later BSD/OS releases, but it is no longer
true.  I think now it can use RAM or swap as reserved backing store for
fork page modifications.  However, when the system runs out of swap, it
hangs!

> > After that, where do you go?  Try to find a reasonable process to shoot
> > in the head.  From what I heard, although I haven't kept current, a lot
> > of work went into selecting a "reasonable" process, so there will be some
> > determinism.
> 
> Considering the frequency with which we hear of database backends
> getting shot in the head, I'd say those heuristics need lots of work
> yet.  I'll take a non-heuristic solution for any system I have to
> administer, thanks.

You have to love that swap + 1/2 ram option --- when you need four
possible options, there is something wrong with your approach.  :-)



Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> You have to love that swap + 1/2 ram option --- when you need four
> possible options, there is something wrong with your approach.  :-)

I'm still wondering what the "no overcommit handling" option does,
exactly.
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Greg Stark
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> On Thu, Jun 12, 2003 at 09:18:33PM -0400, Tom Lane wrote:
> 
> > Given that swap space is cheap, and that killing random processes is
> > obviously bad, it's not apparent to me why people think this is not
> > a good approach --- at least for high-reliability servers.  And Linux
> > would definitely like to think of itself as a server-grade OS.

Consider the case of huge processes trying to fork/exec to run ls. It might
seem kind of strange to be getting "Out of memory" errors from your java or
database engine when there are hundreds of megs free on the machine...

I suspect this was less of an issue in the days before copy on write because
vfork was more widely used/implemented. I'm not sure linux even implements
vfork other than just as a wrapper around fork. Even BSD ditched it a while
back though I think I saw that NetBSD reimplemented it since then.

> But then there's the problem of people running database servers on
> misconfigured machines.  They should know better than not setting enough
> swap space, IMHO anyway.

Well, I've seen DBAs say "Since I don't want the database swapping anyways,
I'll make really sure it doesn't swap by just not giving it any swap space --
that's why we bought so much RAM in the first place". It's not obvious that
you need swap to back memory the machine doesn't even report as being in
use...

-- 
greg



Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > You have to love that swap + 1/2 ram option --- when you need four
> > possible options, there is something wrong with your approach.  :-)
> 
> I'm still wondering what the "no overcommit handling" option does,
> exactly.

I assume it does no kills, and allows you to commit until you run out of
swap and hang.  This might be the BSD 4.4 behavior, actually.

It is bad to hang the system, but if it reports swap failure, at least
the admin knows why it failed, rather than killing random processes.



Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Greg Stark wrote:
> I suspect this was less of an issue in the days before copy on write because
> vfork was more widely used/implemented. I'm not sure linux even implements
> vfork other than just as a wrapper around fork. Even BSD ditched it a while
> back though I think I saw that NetBSD reimplemented it since then.
> 
> > But then there's the problem of people running database servers on
> > misconfigured machines.  They should know better than not setting enough
> > swap space, IMHO anyway.
> 
> Well, I've seen DBAs say "Since I don't want the database swapping anyways,
> I'll make really sure it doesn't swap by just not giving it any swap space --
> that's why we bought so much RAM in the first place". It's not obvious that
> you need swap to back memory the machine doesn't even report as being in
> use...

I see no reason RAM can't be used as backing store for possible
copy-on-write use.



Re: Pre-allocation of shared memory ...

From
Greg Stark
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> I see no reason RAM can't be used as backing store for possible
> copy-on-write use.

Depends on the scenario. For a database like postgres it would work fairly
well since that RAM is still available for filesystem buffers. For Oracle it
would suck because it's not available for Oracle to allocate to use for its
own buffers. And for a web server with an architecture like Apache it would
suck because it would mean being restricted to a much lower number of
processes than the machine could really handle.

> > I'm still wondering what the "no overcommit handling" option does,
> > exactly.
> 
> I assume it does no kills, and allows you to commit until you run out of
> swap and hang.  This might be the BSD 4.4 behavior, actually.

I think it just makes fork/mmap/sbrk return an error if you run out of swap.
That makes the error appear most likely as malloc() returning null which most
applications don't handle anyways and the user sees the same behaviour:
programs crashing randomly.

Of course that's not what high availability server software does but since
most users' big memory consumers these days seem to be their window manager
and its 3d animated window decorations...

-- 
greg



Re: Pre-allocation of shared memory ...

From
"Ron Mayer"
Date:
Jeroen T. Vermeulen wrote:
>
>After that, where do you go?  Try to find a reasonable process to shoot
>in the head.  From what I heard, although I haven't kept current, a lot
>of work went into selecting a "reasonable" process, so there will be some
>determinism.

FWIW, you can browse the logic linux uses to choose 
which process to kill here: http://lxr.linux.no/source/mm/oom_kill.c

If I read that right, this calculates "points" for each process, where:

  points = vm_size_of_process
           / sqrt(cpu_time_it_ran)
           / sqrt(sqrt(clock_time_it_had))
           * 2 if the process was niced
           / 4 if the process ran as root
           / 4 if the process had hardware access
and whichever process has the most points dies.

I'm guessing any database backend (postgres, oracle)
that wasn't part of a long-lived connection seems like 
an especially attractive target to this algorithm.  
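Transcribed into Python for readability: a hedged paraphrase of that oom_kill.c scoring, not the kernel's actual code (the parameter names and the +1 guards against division by zero are added here).

```python
import math

def badness(vm_size, cpu_time, run_time,
            niced=False, is_root=False, has_hw_access=False):
    """Rough paraphrase of the 2.4-era oom_kill.c point scoring."""
    points = float(vm_size)
    points /= math.sqrt(cpu_time + 1)             # heavy CPU users are spared a bit
    points /= math.sqrt(math.sqrt(run_time + 1))  # long-lived processes too
    if niced:
        points *= 2        # niced processes look more expendable
    if is_root:
        points /= 4        # root-owned processes are protected
    if has_hw_access:
        points /= 4        # as are processes with raw hardware access
    return points
```

A large, freshly started process scores far higher than an equally large one that has accumulated CPU and clock time, which is exactly why a new backend makes such an attractive target.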

(Though hopefully it's all moot now that Andrew / Tom found/recommended the
paranoid overcommit option, which sure seems like the most sane thing for a
server to me)
  Ron

PS: Oracle DBAs suffer from the same pain.  http://www.cs.helsinki.fi/linux/linux-kernel/2001-12/0098.html
http://www.ussg.iu.edu/hypermail/linux/kernel/0103.3/0094.html




Re: Pre-allocation of shared memory ...

From
Alvaro Herrera
Date:
On Thu, Jun 12, 2003 at 07:22:14PM -0700, Ron Mayer wrote:

> FWIW, you can browse the logic linux uses to choose 
> which process to kill here:
>   http://lxr.linux.no/source/mm/oom_kill.c

Hey, this LXR thing is cool.  It'd be nice to have one of those for
Postgres.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La naturaleza, tan fragil, tan expuesta a la muerte... y tan viva"


Re: Pre-allocation of shared memory ...

From
"Shridhar Daithankar"
Date:
On 12 Jun 2003 at 11:31, Bruce Momjian wrote:

> 
> OK, doc patch attached and applied.  Improvements?

Can we point people to /usr/src/linux/doc...place where they can find more
documentation, and whether their kernel supports it or not?

Bye
 Shridhar

--
Zall's Laws:
    (1) Any time you get a mouthful of hot soup, the next thing you do will be wrong.
    (2) How long a minute is, depends on which side of the bathroom door you're on.



Re: Pre-allocation of shared memory ...

From
"Jeroen T. Vermeulen"
Date:
On Thu, Jun 12, 2003 at 07:22:14PM -0700, Ron Mayer wrote:
> I'm guessing any database backend (postgres, oracle)
> that wasn't part of a long-lived connection seems like 
> an especially attractive target to this algorithm.  

Yeah, IIRC it tries to pick daemons that can be restarted, or will be
restarted automatically, but may need a lot less memory after that.


Jeroen



Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Shridhar Daithankar wrote:
> On 12 Jun 2003 at 11:31, Bruce Momjian wrote:
> 
> > 
> > OK, doc patch attached and applied.  Improvements?
> 
> Can we point people to /usr/src/linux/doc...place where they can find more 
> documentation  and if their kernel supports it or not.

Yes, we could, but the name of the parameter seems enough.  They
certainly can look that up.



Re: Pre-allocation of shared memory ...

From
Patrick Welche
Date:
On Thu, Jun 12, 2003 at 10:10:02PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > You have to love that swap + 1/2 ram option --- when you need four
> > > possible options, there is something wrong with your approach.  :-)
> > 
> > I'm still wondering what the "no overcommit handling" option does,
> > exactly.
> 
> I assume it does no kills, and allows you to commit until you run out of
> swap and hang.  This might be the BSD 4.4 behavior, actually.

? I thought the idea of no overcommit was that your malloc fails ENOMEM
if there isn't enough memory free for your whole request, rather than
gambling that other processes aren't actually using all of theirs right now
and have pages swapped out. I don't see where the hang comes in..

> It is bad to hang the system, but if it reports swap failure, at least
> the admin knows why it failed, rather than killing random processes.

Yes!

Patrick


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Patrick Welche wrote:
> On Thu, Jun 12, 2003 at 10:10:02PM -0400, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > You have to love that swap + 1/2 ram option --- when you need four
> > > > possible options, there is something wrong with your approach.  :-)
> > > 
> > > I'm still wondering what the "no overcommit handling" option does,
> > > exactly.
> > 
> > I assume it does no kills, and allows you to commit until you run out of
> > swap and hang.  This might be the BSD 4.4 behavior, actually.
> 
> ? I thought the idea of no overcommit was that your malloc fails ENOMEM
> if there isn't enough memory free for your whole request, rather than
> gambling that other processes aren't actually using all of theirs right now
> and have pages swapped out. I don't see where the hang comes in..

I think there are two important memory cases:

malloc() - should fail right away if it can't reserve the requested
memory;  assuming applications request memory they don't use just seems
dumb --- fix the bad apps.

fork() - this is the tricky one because you don't know at fork time who
is going to be sharing the data pages as read-only or doing an exec to
overlay a new process, and who is going to be modifying them and need a
private copy.

I think only the fork case is tricky.



Re: Pre-allocation of shared memory ...

From
"Jeroen T. Vermeulen"
Date:
On Fri, Jun 13, 2003 at 09:25:49AM -0400, Bruce Momjian wrote:
> 
> malloc() - should fail right away if it can't reserve the requested
> memory;  assuming applications request memory they don't use just seems
> dumb --- fix the bad apps.
> 
> fork() - this is the tricky one because you don't know at fork time who
> is going to be sharing the data pages as read-only or doing an exec to
> overlay a new process, and who is going to be modifying them and need a
> private copy.
> 
> I think only the fork case is tricky.

But how do you tell that a malloc() can't get enough memory, once you've
had to overcommit on fork()s?  If a really large program did a regular
fork()/exec() and there wasn't enough free virtual memory to support
the full fork() "just in case the program isn't going to exec()," then
*any* malloc() occurring between the two calls would have to fail.  That
may be better than random killing in theory, but the practical effect
would be close to that.

There are other complications as well, I'm sure.  If this were easy, we
probably wouldn't be discussing this problem now.


Jeroen



Re: Pre-allocation of shared memory ...

From
Josh Berkus
Date:
Tom, et al,

> > Given that swap space is cheap, and that killing random processes is
> > obviously bad, it's not apparent to me why people think this is not
> > a good approach --- at least for high-reliability servers.  And Linux
> > would definitely like to think of itself as a server-grade OS.

Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for 
example), include adequate swap space in their "suggested" disk formatting.  
Some versions of some distributions do not create a swap partition at all; 
others allocate only 130mb to this partition regardless of actual RAM.

So regardless of what they *should* be doing, there's thousands of Linux users 
out there with too little or no swap on disk ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Tom, et al,
> 
> > > Given that swap space is cheap, and that killing random processes is
> > > obviously bad, it's not apparent to me why people think this is not
> > > a good approach --- at least for high-reliability servers.  And Linux
> > > would definitely like to think of itself as a server-grade OS.
> 
> Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for 
> example), include adequate swap space in their "suggested" disk formatting.  
> Some versions of some distributions do not create a swap partition at all; 
> others allocate only 130mb to this partition regardless of actual RAM.
> 
> So regardless of what they *should* be doing, there's thousands of Linux users 
> out there with too little or no swap on disk ...

Yes, I have seen that on BSD's too.  I am unsure if we need actual swap
backing store, or just sufficient RAM to allow fork expansion for dirty
pages.




Re: Pre-allocation of shared memory ...

From
Lamar Owen
Date:
On Friday 13 June 2003 11:55, Josh Berkus wrote:
> Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> example), include adequate swap space in their "suggested" disk formatting.
> Some versions of some distributions do not create a swap partition at all;
> others allocate only 130mb to this partition regardless of actual RAM.

Incidentally, Red Hat as of about 7.0 began insisting on swap space at least 
as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant 
it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In 
that case, you create a swap file on one of your other partitions that the 
kernel can use.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
Lamar Owen wrote:
> On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > example), include adequate swap space in their "suggested" disk formatting.
> > Some versions of some distributions do not create a swap partition at all;
> > others allocate only 130mb to this partition regardless of actual RAM.
> 
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least 
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant 
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In 
> that case, you create a swap file on one of your other partitions that the 
> kernel can use.

Oh, that's interesting. I know the newer BSD releases got rid of the
large swap requirement, on the understanding that you usually aren't
going to be using it anyway.

What old BSD releases used to do was to allocate swap space as backing
_all_ RAM, even when it wasn't going to need it, while later releases
allocated swap only when it was needed, so it was only for cases
_exceeding_ RAM, so your virtual memory was now RAM _plus_ swap.

Of course, if you exceed swap, your system hangs.



Re: Pre-allocation of shared memory ...

From
"Nigel J. Andrews"
Date:
On Fri, 13 Jun 2003, Lamar Owen wrote:

> On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > example), include adequate swap space in their "suggested" disk formatting.
> > Some versions of some distributions do not create a swap partition at all;
> > others allocate only 130mb to this partition regardless of actual RAM.
> 
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least 
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant 
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In 
> that case, you create a swap file on one of your other partitions that the 
> kernel can use.

I'm not sure I agree with this. To a large extent, in these days of cheap
memory, swap space is there to give you time to notice the excessive use of
it and repair the system, since you'd normally be running everything in RAM.

Using the old measure of twice physical memory for swap is excessive on a
decent system imo. I certainly would not allocate 1GB of swap! Well, okay, I
might if I've got a 16GB machine with the potential for an excessive
but transitory workload, or say 4-8GB machine with a few very large memory
usage processes that can be started as part of the normal work load.

In short, imo these days swap is there to prevent valid processes dying for
lack of system memory and not to provide normal workspace for them.

Having said all that, I haven't read the start of this thread so I've probably
missed the reason for the complaint about lack of swap space, like a problem on
a small memory system.


-- 
Nigel J. Andrews



Re: Pre-allocation of shared memory ...

From
Bruce Momjian
Date:
I will say I do use swap sometimes when I am editing a huge image or
something --- there are peak times when it is required.

---------------------------------------------------------------------------

Nigel J. Andrews wrote:
> On Fri, 13 Jun 2003, Lamar Owen wrote:
> 
> > On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > > example), include adequate swap space in their "suggested" disk formatting.
> > > Some versions of some distributions do not create a swap partition at all;
> > > others allocate only 130mb to this partition regardless of actual RAM.
> > 
> > Incidentally, Red Hat as of about 7.0 began insisting on swap space at least 
> > as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant 
> > it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In 
> > that case, you create a swap file on one of your other partitions that the 
> > kernel can use.
> 
> I'm not sure I agree with this. To a large extent these days of cheap memory
> swap space is there to give you time to notice the excessive use of it and
> repair the system, since you'd normally be running everything in RAM.
> 
> Using the old measure of twice physical memory for swap is excessive on a
> decent system imo. I certainly would not allocate 1GB of swap! Well, okay, I
> might if I've got a 16GB machine with the potential for an excessive
> but transitory workload, or say 4-8GB machine with a few very large memory
> usage processes that can be started as part of the normal work load.
> 
> In short, imo these days swap is there to prevent valid processes dying for
> lack of system memory and not to provide normal workspace for them.
> 
> Having said all that, I haven't read the start of this thread so I've probably
> missed the reason for the complaint about lack of swap space, like a problem on
> a small memory system.
> 
> 
> -- 
> Nigel J. Andrews
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: most folks find a random_page_cost between 1 or 2 is ideal
> 



Re: Pre-allocation of shared memory ...

From
"Jeroen T. Vermeulen"
Date:
On Fri, Jun 13, 2003 at 12:32:24PM -0400, Lamar Owen wrote:
> 
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least 
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant 
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In 
> that case, you create a swap file on one of your other partitions that the 
> kernel can use.

RedHat's position may be influenced by the fact that, AFAIR, they use
the Rik van Riel virtual memory system which is inclusive--i.e., you need
at least as much swap as you have physical memory before you really have
any virtual memory at all.  This was fixed by the competing Andrea
Arcangeli system, which became standard for the Linux kernel around
2.4.10 or so.


Jeroen



Re: Pre-allocation of shared memory ...

From
Lamar Owen
Date:
On Friday 13 June 2003 12:46, Nigel J. Andrews wrote:
> On Fri, 13 Jun 2003, Lamar Owen wrote:
> > Incidentally, Red Hat as of about 7.0 began insisting on swap space at
> > least as large as twice RAM size.  In my case on my 512MB RAM notebook,
> > that meant it wanted 1GB swap.  If you upgrade your RAM you could get
> > into trouble.  In that case, you create a swap file on one of your other
> > partitions that the kernel can use.

> I'm not sure I agree with this. To a large extent these days of cheap
> memory swap space is there to give you time to notice the excessive use of
> it and repair the system, since you'd normally be running everything in
> RAM.

It is or was a Linux kernel problem.  The 2.2 kernel required double swap 
space, even though it wasn't well documented.  Early 2.4 kernels also 
required double swap space, and it was better documented.  Current Red Hat 
2.4 kernels, I'm not sure which VM system is in use.  The old VM certainly 
DID require double physical memory swap space.

From a message I wrote in January of 2002:
"On Tuesday 22 January 2002 03:48 pm, Jim Wilcoxson wrote:
> I should have said, we're running this way on 2.2.19, not 2.4   -J

> > Is this Linux requirement documented anywhere?  We're running 256MB
> > of swap on 1GB machines and have not had any problems.  But we don't
> > swap much either.

2.2 actually needs 2x swap, but the problems are worse with 2.4.  2.2 won't
die a horrible screaming death -- but 2.4 WILL DIE if you run out of swap in
the wrong way. As to documentation, I can't tell you how I found out about
it, as I'm under NDA from that source.

However, it is public information:  see http://lwn.net/2001/0607/kernel.php3
for some pointers.  Also see
http://www.geocrawler.com/archives/3/84/2001/5/0/5867356/
http://www.tuxedo.org/~esr/writings/ultimate-linux-box/configuration.html
and
http://www.ultraviolet.org/mail-archives/linux-kernel.2001/28831.html

And note that Red Hat Linux 7.1 and 7.2 will complain vociferously if you
create a swap partition smaller than 2x RAM during installation (anaconda).
What it doesn't do is complain when you upgrade RAM but don't upgrade your
swap."

Now, as to whether this is _still_ a requirement or not, I don't know.  Search 
the lkml (Linux Kernel Mailing List) for it.
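For anyone who wants to check an existing box against the old 2x rule, a rough sketch (the `check_swap` helper is mine, not part of anaconda or any kernel; it only reads the standard /proc/meminfo fields):

```shell
# Hypothetical helper: warn when swap is smaller than twice RAM,
# the sizing the old Linux VM effectively required.
check_swap() {
    awk '/^MemTotal:/  { mem  = $2 }
         /^SwapTotal:/ { swap = $2 }
         END {
             if (swap < 2 * mem)
                 printf "WARNING: swap %d kB < 2x RAM (%d kB)\n", swap, mem
             else
                 print "swap OK"
         }' "$1"
}

check_swap /proc/meminfo
```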

However, understand that the Red Hat kernel is closer to an Alan Cox kernel 
than to a Linus kernel.  At least that was true up to 2.4.18; the Red Hat 
2.4.20 is very different, with NPTL and its ilk thrown in.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: Pre-allocation of shared memory ...

From
Lamar Owen
Date:
On Friday 13 June 2003 15:29, Lamar Owen wrote:
> It is or was a Linux kernel problem.  The 2.2 kernel required double swap
> space, even though it wasn't well documented.  Early 2.4 kernels also
> required double swap space, and it was better documented.  Current Red Hat
> 2.4 kernels, I'm not sure which VM system is in use.  The old VM certainly
> DID require double physical memory swap space.

After consulting with some kernel gurus, you can upgrade to a straight Alan 
Cox (-ac) kernel and turn off overcommits to cause it to fail the allocation 
instead of blowing processes out at random when the overcommit bites.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
The trouble with this advice is that if I am an SA wanting to run a DBMS
server, I will want to run a kernel supplied by a vendor, not an arbitrary
kernel released by a developer, even one as respected as Alan Cox.

andrew

----- Original Message ----- 
From: "Lamar Owen" <lamar.owen@wgcr.org>
To: "Nigel J. Andrews" <nandrews@investsystems.co.uk>
Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-hackers@postgresql.org>
Sent: Saturday, June 14, 2003 11:52 AM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


> On Friday 13 June 2003 15:29, Lamar Owen wrote:
> > It is or was a Linux kernel problem.  The 2.2 kernel required double swap
> > space, even though it wasn't well documented.  Early 2.4 kernels also
> > required double swap space, and it was better documented.  Current Red Hat
> > 2.4 kernels, I'm not sure which VM system is in use.  The old VM certainly
> > DID require double physical memory swap space.
>
> After consulting with some kernel gurus, you can upgrade to a straight Alan
> Cox (-ac) kernel and turn off overcommits to cause it to fail the allocation
> instead of blowing processes out at random when the overcommit bites.



Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
http://lwn.net/Articles/4628/ has this possibly useful info:

---------------
So what is strict VM overcommit?  We introduce new overcommit policies
that attempt to never succeed an allocation that can not be fulfilled by
the backing store and consequently never OOM.  This is achieved through
strict accounting of the committed address space and a policy to
allow/refuse allocations based on that accounting.

In the strictest of modes, it should be impossible to allocate more
memory than available and impossible to OOM.  All memory failures should
be pushed down to the allocation routines -- malloc, mmap, etc.
--------------
But see also the discussion from July last year:
http://www.ussg.iu.edu/hypermail/linux/kernel/0207.2/index.html

A quick investigation of 2.4 releases on kernel.org appears to show this still
hasn't made it into mainline kernels. Apparently Alan did this work originally
because RH had customers using Oracle who were running into OOM ... Surprise!

I don't keep copies of old kernel sources around on my Linux machine, so I
don't know when it went into the RH kernel series - that at least would be
nice to know.

andrew

----- Original Message ----- 
From: "Andrew Dunstan" <andrew@dunslane.net>
To: <pgsql-hackers@postgresql.org>
Sent: Saturday, June 14, 2003 12:30 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


> The trouble with this advice is that if I am an SA wanting to run a DBMS
> server, I will want to run a kernel supplied by a vendor, not an arbitrary
> kernel released by a developer, even one as respected as Alan Cox.
>
> andrew
>
> ----- Original Message ----- 
> From: "Lamar Owen" <lamar.owen@wgcr.org>
> To: "Nigel J. Andrews" <nandrews@investsystems.co.uk>
> Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-hackers@postgresql.org>
> Sent: Saturday, June 14, 2003 11:52 AM
> Subject: Re: [HACKERS] Pre-allocation of shared memory ...
>
>
> > On Friday 13 June 2003 15:29, Lamar Owen wrote:
> > > It is or was a Linux kernel problem.  The 2.2 kernel required double swap
> > > space, even though it wasn't well documented.  Early 2.4 kernels also
> > > required double swap space, and it was better documented.  Current Red Hat
> > > 2.4 kernels, I'm not sure which VM system is in use.  The old VM certainly
> > > DID require double physical memory swap space.
> >
> > After consulting with some kernel gurus, you can upgrade to a straight Alan
> > Cox (-ac) kernel and turn off overcommits to cause it to fail the allocation
> > instead of blowing processes out at random when the overcommit bites.
>
>



Re: Pre-allocation of shared memory ...

From
Matthew Kirkwood
Date:
On Sat, 14 Jun 2003, Andrew Dunstan wrote:

> The trouble with this advice is that if I am an SA wanting to run a
> DBMS server, I will want to run a kernel supplied by a vendor, not an
> arbitrary kernel released by a developer, even one as respected as
> Alan Cox.

Like, say, Red Hat:

$ ls -l /proc/sys/vm/overcommit_memory
-rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
$ uname -a
Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux

(This is a Rawhide kernel, but I think that control has been
in stock RH kernels for some time now.)

Matthew.



Re: Pre-allocation of shared memory ...

From
Kurt Roeckx
Date:
On Sat, Jun 14, 2003 at 08:32:40PM +0100, Matthew Kirkwood wrote:
> On Sat, 14 Jun 2003, Andrew Dunstan wrote:
> 
> > The trouble with this advice is that if I am an SA wanting to run a
> > DBMS server, I will want to run a kernel supplied by a vendor, not an
> > arbitrary kernel released by a developer, even one as respected as
> > Alan Cox.
> 
> Like, say, Red Hat:
> 
> $ ls -l /proc/sys/vm/overcommit_memory
> -rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
> $ uname -a
> Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux


I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.


Kurt



Re: Pre-allocation of shared memory ...

From
Matthew Kirkwood
Date:
On Sat, 14 Jun 2003, Kurt Roeckx wrote:

> > $ ls -l /proc/sys/vm/overcommit_memory
> > -rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
> > $ uname -a
> > Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux
>
> I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.

This might also be interesting:

http://www.cs.helsinki.fi/linux/linux-kernel/2002-33/0826.html

I couldn't say how much of it is in the stock RH kernels,
or how successful the heuristic is.

Matthew.



Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
Yes, but it's only a binary flag. Non-zero says "cheerfully overcommit" and
0 says "try not to overcommit"  but there isn't a value that says "make sure
not to overcommit".

Have a look in mm/mmap.c in the plain 2.4.21 sources for evidence. There's
nothing like the Alan Cox patch.

IOW, simply the presence of /proc/sys/vm/overcommit_memory with a value set
to 0 doesn't guarantee you won't get an OOM kill, AFAICS.
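As a sanity check rather than a guarantee, the flag can at least be inspected from userspace; note that the strict 2/3 modes only mean anything on kernels carrying the Alan Cox accounting patch, and writing the flag needs root:

```shell
# Read the current overcommit policy. On stock 2.4 kernels only 0
# (heuristic) and non-zero (always overcommit) are distinguished --
# strict accounting exists only in patched kernels.
mode=$(cat /proc/sys/vm/overcommit_memory)
case "$mode" in
    0)   echo "heuristic overcommit (OOM kill still possible)" ;;
    1)   echo "always overcommit" ;;
    2|3) echo "strict accounting (patched kernels only)" ;;
esac
# Enabling strict mode on a patched kernel (as root):
#   echo 2 > /proc/sys/vm/overcommit_memory
```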

I *know* the latest RH kernel docs *say* they have paranoid mode that
supposedly guarantees against OOM - it was me that pointed that out
originally :-). I just checked on the latest sources (today it's RH8, kernel
2.4.20-18.8) to be doubly sure, and can't see the patches. (That would be
really bad of RH, btw, if I'm correct - saying in your docs you support
something that you don't)

The proof, if any is needed, that the mainline kernel still does not have
this, is that it is still in Alan's patch set against 2.4.21, at
http://www.kernel.org/pub/linux/kernel/people/alan/linux-2.4/2.4.21/patch-2.4.21-ac1.gz

Summary: don't take shortcuts looking for this - Read the Source, Luke. It's
important not to give people false expectations. For now, I'm leaning in
Tom's direction of advising people to avoid Linux for mission-critical
situations that could run into an OOM.

cheers

andrew

----- Original Message ----- 
From: "Kurt Roeckx" <Q@ping.be>
To: "Matthew Kirkwood" <matthew@hairy.beasts.org>
Cc: "Andrew Dunstan" <andrew@dunslane.net>; <pgsql-hackers@postgresql.org>
Sent: Saturday, June 14, 2003 3:44 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


> On Sat, Jun 14, 2003 at 08:32:40PM +0100, Matthew Kirkwood wrote:
> > On Sat, 14 Jun 2003, Andrew Dunstan wrote:
> >
> > > The trouble with this advice is that if I am an SA wanting to run a
> > > DBMS server, I will want to run a kernel supplied by a vendor, not an
> > > arbitrary kernel released by a developer, even one as respected as
> > > Alan Cox.
> >
> > Like, say, Red Hat:
> >
> > $ ls -l /proc/sys/vm/overcommit_memory
> > -rw-r--r--    1 root     root            0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
> > $ uname -a
> > Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux
>
>
> I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.
>
>
> Kurt
>
>



Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> I *know* the latest RH kernel docs *say* they have paranoid mode that
> supposedly guarantees against OOM - it was me that pointed that out
> originally :-). I just checked on the latest sources (today it's RH8, kernel
> 2.4.20-18.8) to be doubly sure, and can't see the patches.

I think you must be looking in the wrong place.  Red Hat's kernels have
included the mode 2/3 overcommit logic since RHL 7.3, according to
what I can find.  (Don't forget Alan Cox works for Red Hat ;-).)

But it is true that it's not in Linus' tree yet.  This may be because
there are still some loose ends.  The copy of the overcommit document
in my RHL 8.0 system lists some ToDo items down at the bottom:

To Do
-----
o       Account ptrace pages (this is hard)
o       Disable MAP_NORESERVE in mode 2/3
o       Account for shared anonymous mappings properly
        - right now we account them per instance

I have not installed RHL 9 yet --- is the ToDo list any shorter there?
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> I *know* the latest RH kernel docs *say* they have paranoid mode that
> supposedly guarantees against OOM - it was me that pointed that out
> originally :-). I just checked on the latest sources (today it's RH8, kernel
> 2.4.20-18.8) to be doubly sure, and can't see the patches. (That would be
> really bad of RH, btw, if I'm correct - saying in your docs you support
> something that you don't)

I tried a direct test on my RHL 8.0 box, and was able to prove that
indeed the overcommit 2/3 modes do something, though whether they work
exactly as documented is another question.

I wrote this silly little test program to get an approximate answer
about the largest amount a program could malloc:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    size_t  min = 1024;         /* assume this'd work */
    size_t  max = -1;           /* = max unsigned */
    size_t  sz;
    void   *ptr;

    while ((max - min) >= 1024ul)
    {
        sz = (((unsigned long long) max) + ((unsigned long long) min)) / 2;
        ptr = malloc(sz);
        if (ptr)
        {
            free(ptr);
//          printf("malloc(%lu) succeeded\n", sz);
            min = sz;
        }
        else
        {
//          printf("malloc(%lu) failed\n", sz);
            max = sz;
        }
    }
    printf("Max malloc is %lu Kb\n", min / 1024);
    return 0;
}

and got these results:

[root@rh1 tmp]# echo 0 > /proc/sys/vm/overcommit_memory
[root@rh1 tmp]# ./alloc
Max malloc is 1489075 Kb
[root@rh1 tmp]# echo 1 > /proc/sys/vm/overcommit_memory
[root@rh1 tmp]# ./alloc
Max malloc is 2063159 Kb
[root@rh1 tmp]# echo 2 > /proc/sys/vm/overcommit_memory
[root@rh1 tmp]# ./alloc
Max malloc is 1101639 Kb
[root@rh1 tmp]# echo 3 > /proc/sys/vm/overcommit_memory
[root@rh1 tmp]# ./alloc
Max malloc is 974179 Kb

So it's definitely doing something.  /proc/meminfo shows
       total:    used:    free:  shared: buffers:  cached:
Mem:  261042176 160456704 100585472        0 72015872 63344640
Swap: 1077501952 44974080 1032527872
MemTotal:       254924 kB
MemFree:         98228 kB
MemShared:           0 kB
Buffers:         70328 kB
Cached:          59244 kB
SwapCached:       2616 kB
Active:         102532 kB
Inact_dirty:     11644 kB
Inact_clean:     21840 kB
Inact_target:    27200 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       254924 kB
LowFree:         98228 kB
SwapTotal:     1052248 kB
SwapFree:      1008328 kB
Committed_AS:    77164 kB

It does appear that the limit in mode 3 is not too far from where
you'd expect (SwapTotal - Committed_AS), and mode 2 allows about
128M more, which is correct since there's 256 M of RAM.
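That back-of-the-envelope estimate can be reproduced mechanically; the `headroom_kb` helper below is just a restatement of the arithmetic above against the same /proc/meminfo field names, not anything from the kernel:

```shell
# Approximate the strict-mode allocation ceiling as
# SwapTotal - Committed_AS (mode 3; mode 2 adds roughly RAM/2 on top).
headroom_kb() {
    awk '/^SwapTotal:/    { swap = $2 }
         /^Committed_AS:/ { cas  = $2 }
         END { print swap - cas }' "$1"
}

headroom_kb /proc/meminfo
```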
        regards, tom lane


Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
I know he does -  *but* I think it has probably been wiped out by accident
somewhere along the line (like when they went to 2.4.20?)

Here's what's in RH sources - tell me after you look that I am looking in
the wrong place. (Or did RH get cute and decide to do this only for the AS
product?)

first, RH7.3/kernel 2.4.18-3 (patch present):

----------------
int vm_enough_memory(long pages, int charge)
{
        /* Stupid algorithm to decide if we have enough memory: while
         * simple, it hopefully works in most obvious cases.. Easy to
         * fool it, but this should catch most mistakes.
         *
         * 23/11/98 NJC: Somewhat less stupid version of algorithm,
         * which tries to do "TheRightThing".  Instead of using half of
         * (buffers+cache), use the minimum values.  Allow an extra 2%
         * of num_physpages for safety margin.
         *
         * 2002/02/26 Alan Cox: Added two new modes that do real accounting
         */
        unsigned long free, allowed;
        struct sysinfo i;

        if (charge)
                atomic_add(pages, &vm_committed_space);

        /* Sometimes we want to use more memory than we have. */
        if (sysctl_overcommit_memory == 1)
                return 1;
        if (sysctl_overcommit_memory == 0)
        {
                /* The page cache contains buffer pages these days.. */
                free = atomic_read(&page_cache_size);
                free += nr_free_pages();
                free += nr_swap_pages;

                /*
                 * This double-counts: the nrpages are both in the page-cache
                 * and in the swapper space. At the same time, this compensates
                 * for the swap-space over-allocation (ie "nr_swap_pages"
                 * being too small.
                 */
                free += swapper_space.nrpages;

                /*
                 * The code below doesn't account for free space in the inode
                 * and dentry slab cache, slab cache fragmentation, inodes and
                 * dentries which will become freeable under VM load, etc.
                 * Lets just hope all these (complex) factors balance out...
                 */
                free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
                free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;

                if (free > pages)
                        return 1;
                atomic_sub(pages, &vm_committed_space);
                return 0;
        }

        allowed = total_swap_pages;

        if (sysctl_overcommit_memory == 2)
        {
                /* FIXME - need to add arch hooks to get the bits we need
                   without the higher overhead crap */
                si_meminfo(&i);
                allowed += i.totalram >> 1;
        }
        if (atomic_read(&vm_committed_space) < allowed)
                return 1;
        if (charge)
                atomic_sub(pages, &vm_committed_space);
        return 0;
}
---------
and here's what's in RH9/2.4.20-18 (patch absent):
--------------
int vm_enough_memory(long pages)
{
        /* Stupid algorithm to decide if we have enough memory: while
         * simple, it hopefully works in most obvious cases.. Easy to
         * fool it, but this should catch most mistakes.
         */
        /* 23/11/98 NJC: Somewhat less stupid version of algorithm,
         * which tries to do "TheRightThing".  Instead of using half of
         * (buffers+cache), use the minimum values.  Allow an extra 2%
         * of num_physpages for safety margin.
         */
        unsigned long free;

        /* Sometimes we want to use more memory than we have. */
        if (sysctl_overcommit_memory)
                return 1;

        /* The page cache contains buffer pages these days.. */
        free = atomic_read(&page_cache_size);
        free += nr_free_pages();
        free += nr_swap_pages;

        /*
         * This double-counts: the nrpages are both in the page-cache
         * and in the swapper space. At the same time, this compensates
         * for the swap-space over-allocation (ie "nr_swap_pages" being
         * too small.
         */
        free += swapper_space.nrpages;

        /*
         * The code below doesn't account for free space in the inode
         * and dentry slab cache, slab cache fragmentation, inodes and
         * dentries which will become freeable under VM load, etc.
         * Lets just hope all these (complex) factors balance out...
         */
        free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
        free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;

        return free > pages;
}

----- Original Message ----- 
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Andrew Dunstan" <andrew@dunslane.net>
Cc: "Kurt Roeckx" <Q@ping.be>; "Matthew Kirkwood"
<matthew@hairy.beasts.org>; <pgsql-hackers@postgresql.org>
Sent: Saturday, June 14, 2003 5:16 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


> "Andrew Dunstan" <andrew@dunslane.net> writes:
> > I *know* the latest RH kernel docs *say* they have paranoid mode that
> > supposedly guarantees against OOM - it was me that pointed that out
> > originally :-). I just checked on the latest sources (today it's RH8, kernel
> > 2.4.20-18.8) to be doubly sure, and can't see the patches.
>
> I think you must be looking in the wrong place.  Red Hat's kernels have
> included the mode 2/3 overcommit logic since RHL 7.3, according to
> what I can find.  (Don't forget Alan Cox works for Red Hat ;-).)
>
> But it is true that it's not in Linus' tree yet.  This may be because
> there are still some loose ends.  The copy of the overcommit document
> in my RHL 8.0 system lists some ToDo items down at the bottom:
>
> To Do
> -----
> o       Account ptrace pages (this is hard)
> o       Disable MAP_NORESERVE in mode 2/3
> o       Account for shared anonymous mappings properly
>         - right now we account them per instance
>
> I have not installed RHL 9 yet --- is the ToDo list any shorter there?
>
> regards, tom lane



Re: Pre-allocation of shared memory ...

From
Lamar Owen
Date:
On Saturday 14 June 2003 16:38, Andrew Dunstan wrote:
> IOW, simply the presence of /proc/sys/vm/overcommit_memory with a value set
> to 0 doesn't guarantee you won't get an OOM kill, AFAICS.

Right.  You need the value to be 2 or 3.  Which means you need Alan's patch to 
do that.

> I *know* the latest RH kernel docs *say* they have paranoid mode that
> supposedly guarantees against OOM - it was me that pointed that out
> originally :-). I just checked on the latest sources (today it's RH8,
> kernel 2.4.20-18.8) to be doubly sure, and can't see the patches. (That
> would be really bad of RH, btw, if I'm correct - saying in your docs you
> support something that you don't)

But note these two lines in the docs with 2.4.20-13.9 (RHL9 errata):
* This describes the overcommit management facility in the latest kernel tree
  (FIXME: actually it also describes the stuff that isnt yet done)

Pay double attention to the line that says FIXME.  IOW, they've documented 
stuff that might not be done!

You can try Red Hat's enterprise kernel, but you'll have to build it from 
source.  RHEL AS is available online as source RPMs.

Also understand that the official Red Hat kernel is very close to an Alan Cox 
kernel.  Also, if you really want to get down and dirty testing the kernel, a 
test suite is available to help with that, known as Cerberus.  Configs are 
available specifically tuned to stress-test kernels.  I think Cerberus is on 
Source Forge.

So, make sure you have a kernel that allows overcommit-accounting mode 2 to 
prevent kills on OOM.  Theoretically mode 2 will prevent the possibility of 
OOM completely.

If I read things right, if you have double swap space mode 0 will not OOM 
nearly as quickly.
-- 
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: Pre-allocation of shared memory ...

From
"Shridhar Daithankar"
Date:
On 14 Jun 2003 at 16:38, Andrew Dunstan wrote:
> Summary: don't take shortcuts looking for this - Read the Source, Luke. It's
> important not to give people false expectations. For now, I'm leaning in
> Tom's direction of advising people to avoid Linux for mission-critical
> situations that could run into an OOM.

While I agree that vanilla Linux does not handle the situation gracefully 
enough, anybody running a mission-critical application should spec the machine 
and the demands on it carefully. For certain, Linux won't start doing an OOM 
kill just because it started going low on buffer memory. (At least I hope so.)

If one expects to throw an uncalculated amount of load at a mission-critical 
box, until it reaches swap for every malloc in a strcpy, there are things that 
need to be checked before deciding which kernel/OS to run.

And BTW, was that original comment about vanilla Linux or Linux in general? :-)


Bye
Shridhar

--
Adore, v.:    To venerate expectantly.        -- Ambrose Bierce, "The Devil's 
Dictionary"



Re: Pre-allocation of shared memory ...

From
"Andrew Dunstan"
Date:
Alan Cox has written to me thus:

> It got dropped for RH9 and some errata kernels because of clashes between
> the old stuff and the rmap vm and other weird RH patches

andrew

----- Original Message ----- 
From: "Andrew Dunstan" <andrew@dunslane.net>
To: "Tom Lane" <tgl@sss.pgh.pa.us>
Cc: "Kurt Roeckx" <Q@ping.be>; "Matthew Kirkwood"
<matthew@hairy.beasts.org>; <pgsql-hackers@postgresql.org>
Sent: Saturday, June 14, 2003 5:39 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...


> I know he does -  *but* I think it has probably been wiped out by accident
> somewhere along the line (like when they went to 2.4.20?)
>
> Here's what's in RH sources - tell me after you look that I am looking in
> the wrong place. (Or did RH get cute and decide to do this only for the AS
> product?)
>



Re: Pre-allocation of shared memory ...

From
"Jim C. Nasby"
Date:
On Thu, Jun 12, 2003 at 10:10:02PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> It is bad to hang the system, but if it reports swap failure, at least
> the admin knows why it failed, rather than killing random processes.
I wonder if it might be better to suspend whatever process is trying to
allocate/write to too much memory. At least then you have some chance of
keeping the system up (obviously you'd need to leave some amount free so
you could login to the box to fix things).
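A related stopgap, independent of kernel overcommit policy, is a per-process address-space cap via `ulimit -v`: the runaway process then sees its own allocations fail instead of taking down the box. A sketch, using GNU dd's single large buffer as a stand-in for a misbehaving job (the 200000 kB limit is an arbitrary example):

```shell
# Run a job under a ~200 MB address-space cap; an oversized allocation
# then fails with ENOMEM inside the job instead of exhausting the system.
(
    ulimit -v 200000                 # kB; arbitrary example limit
    dd if=/dev/zero of=/dev/null bs=500M count=1 2>/dev/null \
        && echo "allocation succeeded" \
        || echo "allocation failed"
)
```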
-- 
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Pre-allocation of shared memory ...

From
"Jim C. Nasby"
Date:
On Fri, Jun 13, 2003 at 12:41:28PM -0400, Bruce Momjian wrote:
> Of course, if you exceed swap, your system hangs.
Are you sure? I ran out of swap once or came damn close, due to a cron
job gone amuck. My clue was starting to see lots of memory allocation
errors. After I fixed what was blocking all the backed-up cron jobs, the
machine ground to a crawl (mmm... system load of 400+ on a dual
PII-375), and X did crash (though I think that's because I tried
switching to a different virtual console), but the machine stayed up and
eventually worked through everything.
-- 
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"