Re: URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general
From | Markus Wollny
Subject | Re: URGENT: Database keeps crashing - suspect damaged RAM
Date |
Msg-id | 2266D0630E43BB4290742247C8910575014CE342@dozer.computec.de
In response to | URGENT: Database keeps crashing - suspect damaged RAM ("Markus Wollny" <Markus.Wollny@computec.de>)
Responses | Re: URGENT: Database keeps crashing - suspect damaged RAM (Jeff Davis <list-pgsql-general@empires.org>)
 | Re: URGENT: Database keeps crashing - suspect damaged RAM ("scott.marlowe" <scott.marlowe@ihs.com>)
List | pgsql-general
Hi!

-----Original Message-----
From: Tom Lane
Sent: Tue 06.08.2002 18:59
To: Markus Wollny
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM

"Markus Wollny" <Markus.Wollny@computec.de> writes:
> So: Is it bad RAM? How can I make sure? What else could it be?

Have you tried running memtest86? I've never used that myself but some
folks on the list say it works well.

No, I haven't tried that yet, but I'm surely going to do so tomorrow.

> Here's a small excerpt from the logfile:
> 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory

Is it possible that you are running with inadequate swap space, a small
data segment limit (ulimit -d), or something else that would make the
kernel refuse to give memory to a backend process?

I shouldn't think so; the machine has 2 GB RAM (that was more than
sufficient for the same DB, applications and load on a different machine)
and 4 GB swap:

Disk geometry for /dev/sda: 0.000-51834.000 megabytes
Disk label type: msdos
Minor    Start        End        Type      Filesystem  Flags
1         0.031       15.688     primary   ext3        boot
2        15.688     4118.225     primary   linux-swap
3      4118.225    24599.531     primary   ext3
4     24599.531    51826.882     primary   ext3

Taking a closer look, I am a bit confused: I allocated 4 GB to the swap
partition, as you can see above, but free only reports 2 GB. That's
strange, but it cannot be the cause, I think, as the working machine has
got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM
available during load. As a matter of fact, right now free reports the
following on our fallback machine, which is running the very same
database and the very same application:

             total       used       free     shared    buffers     cached
Mem:       2061536    2053816       7720          0       4496    1825620
-/+ buffers/cache:      223700    1837836
Swap:      2097136     124800    1972336

Taking a look at the total disk usage of the database, I get 1.8 GB. When
I switched to the new machine, there were about 30-50 open connections;
max_connections is set to 512 on both machines. The crashes occurred
immediately after making the DB accessible to our application, so most of
the DB was definitely not yet in memory. And again - our fallback machine,
which has got no RAID and slower processors, can handle the very same DB
under the very same load with no such problems - I have never encountered
this "cannot allocate memory" error before.

> 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory

Still looks like inadequate memory --- but now I'm thinking that it's a
system-wide condition, ie, you just plain haven't got enough RAM for the
number of processes you're trying to start.

> 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> terminated by signal 9

Postgres never issues any kill -9 on itself, but I've heard that the
Linux kernel may start killing processes when it's desperately low on
memory. Other than the signal 9, everything I see in this trace is either
a cannot-allocate-memory failure or followup effects from one.

How many backends are you trying to start up, anyway? Might you have a
runaway client that keeps opening new backend connections?
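For readers following along, a minimal sketch of the checks discussed above
on a Linux box - whether the kernel actually activated the full swap
partition, what the per-process limits are, and what the kernel's overcommit
setting is. Exact paths and output format depend on kernel and distribution:

    # Is the whole 4 GB swap partition actually activated?
    $ swapon -s
    $ free

    # Per-process limits in the shell that starts the postmaster
    # (look at the data segment size reported by ulimit -d)
    $ ulimit -a

    # Kernel memory state and overcommit policy
    # (the overcommit file may not exist on every kernel)
    $ cat /proc/meminfo
    $ cat /proc/sys/vm/overcommit_memory

If swapon -s shows only about 2 GB for the swap partition, that would also
explain why free reports less swap than was allocated.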
It must be something else - the number of connections was not at all high
(<100), the server load wasn't more than 3.5 (on a 4-processor machine),
there was RAM available at the time, both physical and swap, and I haven't
got any surplus daemons running... I think I'll be able to confirm the
bad-RAM suspicion tomorrow using memtest86.

Thank you!

Regards,

Markus
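A rough way to double-check those numbers (not part of the original
exchange) - counting the running backends and looking for traces of the
kernel's out-of-memory killer; the pg_stat_activity query assumes
PostgreSQL 7.2 or later with the statistics collector enabled:

    # Count running PostgreSQL backend processes
    # (backend process titles normally look like "postgres: user db host ...")
    $ ps ax | grep -c '[p]ostgres:'

    # Or ask the server itself
    $ psql -d template1 -c "SELECT count(*) FROM pg_stat_activity;"

    # Current load average, and any kernel messages about killed processes
    # (the exact wording of the kernel's message varies by version)
    $ uptime
    $ dmesg | grep -i 'out of memory'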