Re: URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general

From	Jeff Davis
Subject	Re: URGENT: Database keeps crashing - suspect damaged RAM
Date	August 6, 2002 14:30:33
Msg-id	200208061129.18868.list-pgsql-general@empires.org
In response to	Re: URGENT: Database keeps crashing - suspect damaged RAM ("Markus Wollny" <Markus.Wollny@computec.de>)
List	pgsql-general

Virtual memory problems on Linux have certainly happened before; perhaps you're
running a kernel that had some major ones. Maybe upgrading to 2.4.19 would help?

Regards,
    Jeff Davis

On Tuesday 06 August 2002 11:02 am, Markus Wollny wrote:
> Hi!
>
>     -----Original Message-----
>     From: Tom Lane
>     Sent: Tue 06.08.2002 18:59
>     To: Markus Wollny
>     Cc: pgsql-general@postgresql.org
>     Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
>
>
>
>     "Markus Wollny" <Markus.Wollny@computec.de> writes:
>
>     > So: Is it bad RAM? How can I make sure? What else could it be?
>
>
>     Have you tried running memtest86?  I've never used that myself but
>     some folks on the list say it works well.
>
>
>
> No, I haven't tried that yet, but I'm surely going to do so tomorrow.
>
>
>     > Here's a small excerpt from the logfile:
>
>
>
>     > 2002-08-06 17:36:23 [17296]  DEBUG:  _mdfd_blind_getseg: couldn't open
>     > /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
>
>     Is it possible that you are running with inadequate swap space, a small
>     data segment limit (ulimit -d), or something else that would make the
>     kernel refuse to give memory to a backend process?
>
> I shouldn't think so; the machine has 2 GB RAM (that was more than
> sufficient for the same DB, applications and load on a different
> machine) and 4 GB swap:
> Disk geometry for /dev/sda: 0.000-51834.000 megabytes
> Disk label type: msdos
> Minor    Start       End     Type      Filesystem  Flags
> 1          0.031     15.688  primary   ext3        boot
> 2         15.688   4118.225  primary   linux-swap
> 3       4118.225  24599.531  primary   ext3
> 4      24599.531  51826.882  primary   ext3
>
> Taking a closer look I am a bit confused: I allocated 4 GB to the swap
> partition, as you can see above, but free only reports 2 GB? That's
> strange, but cannot be the cause, I think, as the working machine has
> got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM
> available during load. As a matter of fact, right now free reports:
>
>              total       used       free     shared    buffers     cached
> Mem:       2061536    2053816       7720          0       4496    1825620
> -/+ buffers/cache:     223700    1837836
> Swap:      2097136     124800    1972336
>
> on our fallback machine, which is running the very same database and the
> very same application. When taking a look at the total disk usage of
> the database, I get a total of 1.8 GB. When I switched to the new
> machine, there were about 30-50 open connections; max. connections is
> set to 512 on both machines. The crashes occurred immediately after
> making the DB accessible to our application, so most of the DB was
> definitely not yet in memory. And again - our fallback machine, which has
> got no RAID and slower processors, can handle the very same DB under the
> very same load with no such problems - I never ever encountered this
> "cannot allocate memory" error before.
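As an aid to reading the free output quoted above: the "-/+ buffers/cache" row appears to be derived from the "Mem:" row by treating buffers and page cache as reclaimable. The sketch below reproduces that arithmetic with the figures from the quoted output (in kB). The function name buffers_cache_row is made up for illustration and is not part of any tool.

```python
def buffers_cache_row(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) in kB with buffers and page cache counted as free.

    Hypothetical helper: it mirrors how the "-/+ buffers/cache" row of the
    quoted free output relates to the "Mem:" row.
    """
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


# Figures copied from the "Mem:" row quoted above (kB).
used, free = buffers_cache_row(
    used_kb=2053816, free_kb=7720, buffers_kb=4496, cached_kb=1825620
)
print(used, free)  # 223700 1837836, matching the "-/+ buffers/cache" row
```

Read this way, only about 220 MB of that machine's RAM is held by processes; most of the "used" figure is cache.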
>
>
>     > 2002-08-06 17:40:53 [16530]  DEBUG:  connection startup failed (fork
>     > failure): Cannot allocate memory
>     > 2002-08-06 17:52:50 [16530]  DEBUG:  connection startup failed (fork
>     > failure): Cannot allocate memory
>
>
>     Still looks like inadequate memory --- but now I'm thinking that it's a
>     system-wide condition, ie, you just plain haven't got enough RAM for the
>     number of processes you're trying to start.
>
>
>     > 2002-08-06 17:52:54 [16530]  DEBUG:  server process (pid 18237) was
>     > terminated by signal 9
>
>
>     Postgres never issues any kill -9 on itself, but I've heard that the
>     Linux kernel may start killing processes when it's desperately low on
>     memory.
>
>     Other than the signal 9, everything I see in this trace is either a
>     cannot-allocate-memory failure or followup effects from one.  How many
>     backends are you trying to start up, anyway?  Might you have a runaway
>     client that keeps opening new backend connections?
>
>
> Must be something else - the number of connections was not at all high
> (<100), the server load wasn't more than 3.5 (on a 4-processor machine),
> there was RAM available at the time, both physical and swap, and I haven't
> got any surplus daemons running... I think I'll be able to substantiate the
> bad-RAM suspicion tomorrow using memtest86.
>
> Thank you!
>
> Regards,
>
>      Markus
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
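
Regarding the question in the quoted thread about a small data segment limit (ulimit -d): one way to see the limits a process actually inherits is to query them from inside that process. The following is a minimal sketch using Python's standard resource module on a Unix-like system. The helper name describe_limit is invented for illustration, and the sketch says nothing about how the postmaster in this thread was started.

```python
import resource


def describe_limit(name, rlimit_id):
    """Format the soft and hard values of one resource limit.

    Hypothetical helper. getrlimit reports bytes for memory limits,
    whereas the shell's ulimit -d reports kilobytes.
    """
    soft, hard = resource.getrlimit(rlimit_id)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


# Limits that can make the kernel refuse memory to a process.
for label, rlimit_id in [
    ("data segment (ulimit -d)", resource.RLIMIT_DATA),
    ("address space (ulimit -v)", resource.RLIMIT_AS),
]:
    print(describe_limit(label, rlimit_id))
```

Running this under the same account and startup path as the server would show whether the "unlimited" setting seen in an interactive shell also applies there; the two can differ.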
