Re: URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general
From | Jeff Davis |
---|---|
Subject | Re: URGENT: Database keeps crashing - suspect damaged RAM |
Date | |
Msg-id | 200208061129.18868.list-pgsql-general@empires.org |
In response to | Re: URGENT: Database keeps crashing - suspect damaged RAM ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-general |
Virtual memory problems on Linux have certainly happened before; perhaps
you're running a kernel that had some major ones. Maybe if you upgraded to
2.4.19?

Regards,
Jeff Davis

On Tuesday 06 August 2002 11:02 am, Markus Wollny wrote:
> Hi!
>
> -----Original Message-----
> From: Tom Lane
> Sent: Tue 06.08.2002 18:59
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
>
> > "Markus Wollny" <Markus.Wollny@computec.de> writes:
> > > So: Is it bad RAM? How can I make sure? What else could it be?
> >
> > Have you tried running memtest86? I've never used that myself but
> > some folks on the list say it works well.
>
> No, I haven't tried that yet, but I'm surely going to do so tomorrow.
>
> > > Here's a small excerpt from the logfile:
> > >
> > > 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open
> > > /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
> >
> > Is it possible that you are running with inadequate swap space, a small
> > data segment limit (ulimit -d), or something else that would make the
> > kernel refuse to give memory to a backend process?
>
> I shouldn't think so; the machine has 2 GB RAM (that was more than
> sufficient for the same DB, applications and load on a different
> machine) and 4 GB swap:
>
> Disk geometry for /dev/sda: 0.000-51834.000 megabytes
> Disk label type: msdos
> Minor    Start       End     Type      Filesystem  Flags
> 1          0.031     15.688  primary   ext3        boot
> 2         15.688   4118.225  primary   linux-swap
> 3       4118.225  24599.531  primary   ext3
> 4      24599.531  51826.882  primary   ext3
>
> Taking a closer look I am a bit confused: I allocated 4 GB to the swap
> partition, as you can see above, but free only reports 2 GB? That's
> strange, but cannot be the cause, I think, as the working machine has
> got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM
> available during load. As a matter of fact, right now free reports:
>
>              total       used       free     shared    buffers     cached
> Mem:       2061536    2053816       7720          0       4496    1825620
> -/+ buffers/cache:     223700    1837836
> Swap:      2097136     124800    1972336
>
> on our fallback machine, and it is running the very same database and
> the very same application. When taking a look at total disk usage of
> the database, I get a total of 1.8 GB. When I switched to the new
> machine, there were about 30-50 open connections; max. connections is
> set to 512 on both machines. The crashes occurred immediately after
> making the DB accessible to our application, so most of the DB was
> definitely not yet in memory. And again - our fallback machine, which has
> got no RAID and slower processors, can handle the very same DB under the
> very same load with no such problems - I never ever encountered this
> "cannot allocate memory" error before.
>
> > > 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed
> > > (fork failure): Cannot allocate memory
> > > 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed
> > > (fork failure): Cannot allocate memory
> >
> > Still looks like inadequate memory --- but now I'm thinking that it's a
> > system-wide condition, ie, you just plain haven't got enough RAM for the
> > number of processes you're trying to start.
> >
> > > 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was
> > > terminated by signal 9
> >
> > Postgres never issues any kill -9 on itself, but I've heard that the
> > Linux kernel may start killing processes when it's desperately low on
> > memory.
> >
> > Other than the signal 9, everything I see in this trace is either a
> > cannot-allocate-memory failure or followup effects from one. How many
> > backends are you trying to start up, anyway? Might you have a runaway
> > client that keeps opening new backend connections?
>
> Must be something else - the number of connections was not at all high
> (<100), the server load wasn't more than 3.5 (on a 4-processor machine),
> there was RAM available at the time, both physical and swap, I haven't
> got any surplus daemons running... I think I'll be able to confirm the
> bad-RAM issue tomorrow using memtest86.
>
> Thank you!
>
> Regards,
>
> Markus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
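
Editor's sketch, not part of the original thread: the numbers quoted in the message can be cross-checked with a little arithmetic. The Python below takes its figures from the `free` and partition listings in the message; the variable names and the `describe_limit` helper are illustrative assumptions. It also shows one way to read the data segment limit that `ulimit -d` reports, using Python's standard `resource` module on a Unix-like system.

```python
import resource

# Figures in kB, copied from the `free` output quoted in the message.
mem_used, mem_free = 2053816, 7720
buffers, cached = 4496, 1825620

# Buffers and page cache are reclaimable, so the "-/+ buffers/cache" row
# subtracts them from "used" and adds them to "free".
used_by_processes = mem_used - buffers - cached
available = mem_free + buffers + cached

# Swap partition size from the partition table (megabytes) compared with
# the swap total that `free` reports (kB converted to megabytes).
swap_partition_mb = 4118.225 - 15.688
swap_reported_mb = 2097136 / 1024


def describe_limit(value):
    """Render an rlimit value the way `ulimit` would (hypothetical helper)."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


# Soft and hard data segment limits of the current process (ulimit -d).
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)

print("used by processes (kB):", used_by_processes)
print("free incl. buffers/cache (kB):", available)
print("swap partition (MB):", round(swap_partition_mb))
print("swap reported by free (MB):", round(swap_reported_mb))
print("data segment limit:", describe_limit(soft), "/", describe_limit(hard))
```

The first two results match the `-/+ buffers/cache` row in the quoted output, and the swap comparison reproduces the 4 GB versus 2 GB discrepancy Markus noticed. The sketch only restates the figures and does not explain the discrepancy.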