Re: URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general
From | Markus Wollny
Subject | Re: URGENT: Database keeps crashing - suspect damaged RAM
Date |
Msg-id | 2266D0630E43BB4290742247C8910575014CE342@dozer.computec.de
In response to | URGENT: Database keeps crashing - suspect damaged RAM ("Markus Wollny" <Markus.Wollny@computec.de>)
Responses | Re: URGENT: Database keeps crashing - suspect damaged RAM (Jeff Davis <list-pgsql-general@empires.org>)
 | Re: URGENT: Database keeps crashing - suspect damaged RAM ("scott.marlowe" <scott.marlowe@ihs.com>)
List | pgsql-general
Hi!

-----Original Message-----
From: Tom Lane
Sent: Tue 06.08.2002 18:59
To: Markus Wollny
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM

"Markus Wollny" <Markus.Wollny@computec.de> writes:
> So: Is it bad RAM? How can I make sure? What else could it be?

Have you tried running memtest86? I've never used that myself but some
folks on the list say it works well.

No, I haven't tried that yet, but I'm surely going to do so tomorrow.

> Here's a small excerpt from the logfile:
> 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory

Is it possible that you are running with inadequate swap space, a small
data segment limit (ulimit -d), or something else that would make the
kernel refuse to give memory to a backend process?

I shouldn't think so; the machine has 2 GB RAM (that was more than
sufficient for the same DB, applications and load on a different machine)
and 4 GB swap:

Disk geometry for /dev/sda: 0.000-51834.000 megabytes
Disk label type: msdos
Minor    Start        End        Type      Filesystem  Flags
1         0.031       15.688     primary   ext3        boot
2        15.688     4118.225     primary   linux-swap
3      4118.225    24599.531     primary   ext3
4     24599.531    51826.882     primary   ext3

Taking a closer look, I am a bit confused: I allocated 4 GB to the swap
partition, as you can see above, but free only reports 2 GB. That's
strange, but it cannot be the cause, I think, as the working machine has
got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM
available during load. As a matter of fact, right now free reports the
following on our fallback machine, which is running the very same
database and the very same application:

             total       used       free     shared    buffers     cached
Mem:       2061536    2053816       7720          0       4496    1825620
-/+ buffers/cache:      223700    1837836
Swap:      2097136     124800    1972336

Taking a look at the total disk usage of the database, I get 1.8 GB. When
I switched to the new machine, there were about 30-50 open connections;
max_connections is set to 512 on both machines. The crashes occurred
immediately after making the DB accessible to our application, so most of
the DB was definitely not yet in memory. And again - our fallback machine,
which has got no RAID and slower processors, can handle the very same DB
under the very same load with no such problems - I have never encountered
this "cannot allocate memory" error before.

> 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory
> 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork
> failure): Cannot allocate memory

Still looks like inadequate memory --- but now I'm thinking that it's a
system-wide condition, ie, you just plain haven't got enough RAM for the
number of processes you're trying to start.

> 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> terminated by signal 9

Postgres never issues any kill -9 on itself, but I've heard that the
Linux kernel may start killing processes when it's desperately low on
memory. Other than the signal 9, everything I see in this trace is either
a cannot-allocate-memory failure or followup effects from one.

How many backends are you trying to start up, anyway? Might you have a
runaway client that keeps opening new backend connections?
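For readers following along, a minimal sketch of the checks discussed above
on a Linux box - whether the kernel actually activated the full swap
partition, what the per-process limits are, and what the kernel's overcommit
setting is. Exact paths and output format depend on kernel and distribution:

    # Is the whole 4 GB swap partition actually activated?
    $ swapon -s
    $ free

    # Per-process limits in the shell that starts the postmaster
    # (look at the data segment size reported by ulimit -d)
    $ ulimit -a

    # Kernel memory state and overcommit policy
    # (the overcommit file may not exist on every kernel)
    $ cat /proc/meminfo
    $ cat /proc/sys/vm/overcommit_memory

If swapon -s shows only about 2 GB for the swap partition, that would also
explain why free reports less swap than was allocated.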
It must be something else - the number of connections was not at all high
(<100), the server load wasn't more than 3.5 (on a 4-processor machine),
there was RAM available at the time, both physical and swap, and I haven't
got any surplus daemons running... I think I'll be able to confirm the
bad-RAM suspicion tomorrow using memtest86.

Thank you!

Regards,

Markus
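A rough way to double-check those numbers (not part of the original
exchange) - counting the running backends and looking for traces of the
kernel's out-of-memory killer; the pg_stat_activity query assumes
PostgreSQL 7.2 or later with the statistics collector enabled:

    # Count running PostgreSQL backend processes
    # (backend process titles normally look like "postgres: user db host ...")
    $ ps ax | grep -c '[p]ostgres:'

    # Or ask the server itself
    $ psql -d template1 -c "SELECT count(*) FROM pg_stat_activity;"

    # Current load average, and any kernel messages about killed processes
    # (the exact wording of the kernel's message varies by version)
    $ uptime
    $ dmesg | grep -i 'out of memory'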