Re: URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general
From | Markus Wollny |
---|---|
Subject | Re: URGENT: Database keeps crashing - suspect damaged RAM |
Date | |
Msg-id | 2266D0630E43BB4290742247C891057501B13229@dozer.computec.de |
In response to | URGENT: Database keeps crashing - suspect damaged RAM ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-general |
Hi!

I think I'll have to bow down to you gurus - again :) I upgraded to 2.4.16 (there are no RPMs for 2.4.19 and I didn't want to compile from source - yet), and the symptoms have disappeared altogether. Which is strange because, as I already mentioned, the very same config isn't giving me any trouble on a different machine... Anyway: I'll shun 2.4.10 from now on.

Regards,

Markus

> -----Original Message-----
> From: Jeff Davis [mailto:list-pgsql-general@empires.org]
> Sent: Tuesday, 6 August 2002 20:29
> To: Markus Wollny; Tom Lane
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
>
> Virtual memory problems on Linux have certainly happened before; perhaps you're running a kernel that had some major ones. Maybe if you upgraded to 2.4.19?
>
> Regards,
> Jeff Davis
>
> On Tuesday 06 August 2002 11:02 am, Markus Wollny wrote:
> > Hi!
> >
> > -----Original Message-----
> > From: Tom Lane
> > Sent: Tue 06.08.2002 18:59
> > To: Markus Wollny
> > Cc: pgsql-general@postgresql.org
> > Subject: Re: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
> >
> > > "Markus Wollny" <Markus.Wollny@computec.de> writes:
> > > > So: Is it bad RAM? How can I make sure? What else could it be?
> > >
> > > Have you tried running memtest86? I've never used that myself but some folks on the list say it works well.
> >
> > No, I haven't tried that yet, but I'm surely going to do so tomorrow.
> >
> > > > Here's a small excerpt from the logfile:
> > > >
> > > > 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
> > >
> > > Is it possible that you are running with inadequate swap space, a small data segment limit (ulimit -d), or something else that would make the kernel refuse to give memory to a backend process?
> >
> > I shouldn't think so; the machine has 2 GB RAM (that was more than sufficient for the same DB, applications and load on a different machine) and 4 GB swap:
> >
> > Disk geometry for /dev/sda: 0.000-51834.000 megabytes
> > Disk label type: msdos
> > Minor    Start       End     Type      Filesystem  Flags
> > 1          0.031     15.688  primary   ext3        boot
> > 2         15.688   4118.225  primary   linux-swap
> > 3       4118.225  24599.531  primary   ext3
> > 4      24599.531  51826.882  primary   ext3
> >
> > Taking a closer look I am a bit confused: I allocated 4 GB to the swap partition, as you can see above, but free only reports 2 GB? That's strange, but cannot be the cause, I think, as the working machine has got just 2 GB swap, too. ulimit is set to "unlimited" and there was RAM available during load. As a matter of fact, right now free reports:
> >
> >              total       used       free     shared    buffers     cached
> > Mem:       2061536    2053816       7720          0       4496    1825620
> > -/+ buffers/cache:     223700    1837836
> > Swap:      2097136     124800    1972336
> >
> > on our fallback machine, and that's the very same database and the very same application it is running. When taking a look at total disk usage of the database, I get a total of 1.8 GB. When I switched to the new machine, there were about 30-50 open connections; max. connections is set to 512 on both machines. The crashes occurred immediately after making the DB accessible to our application, so most of the DB was definitely not yet in memory. And again - our fallback machine, which has got no RAID and slower processors, can handle the very same DB under the very same load with no such problems - I never ever encountered this "cannot allocate memory" error before.
> >
> > > > 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> > > > 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> > >
> > > Still looks like inadequate memory --- but now I'm thinking that it's a system-wide condition, ie, you just plain haven't got enough RAM for the number of processes you're trying to start.
> > >
> > > > 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was terminated by signal 9
> > >
> > > Postgres never issues any kill -9 on itself, but I've heard that the Linux kernel may start killing processes when it's desperately low on memory.
> > >
> > > Other than the signal 9, everything I see in this trace is either a cannot-allocate-memory failure or followup effects from one. How many backends are you trying to start up, anyway? Might you have a runaway client that keeps opening new backend connections?
> >
> > Must be something else - the number of connections was not at all high (<100), the server load wasn't more than 3.5 (on a 4-processor machine), there was RAM available at the time, both physical and swap, and I haven't got any surplus daemons running... I think I'll be able to substantiate the bad-RAM suspicion tomorrow using memtest86.
> >
> > Thank you!
> >
> > Regards,
> >
> > Markus
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
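The swap discrepancy raised in the quoted message (a swap partition of roughly 4 GB, while free reports only about 2 GB) can be put into numbers with a small sketch. This is only an illustration: the function names `parse_free` and `swap_shortfall_mb` are made up for this example, the input is the free output quoted in the thread, and treating the partition table's "megabytes" as 1024 kB each is an assumption.

```python
# Illustrative sketch only: parse_free and swap_shortfall_mb are hypothetical helpers.

# The `free` output quoted in the thread (values in kB).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       2061536    2053816       7720          0       4496    1825620
-/+ buffers/cache:     223700    1837836
Swap:      2097136     124800    1972336
"""


def parse_free(text):
    """Return total/used/free (kB) for the 'Mem:' and 'Swap:' rows of `free` output."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("Mem:", "Swap:"):
            total, used, free = (int(value) for value in fields[1:4])
            rows[fields[0].rstrip(":")] = {"total": total, "used": used, "free": free}
    return rows


def swap_shortfall_mb(start_mb, end_mb, swap_total_kb):
    """Partition size minus the swap the kernel reports, in MB.

    Assumption: 1 MB in the partition table equals 1024 kB.
    """
    return (end_mb - start_mb) - swap_total_kb / 1024


if __name__ == "__main__":
    swap = parse_free(FREE_OUTPUT)["Swap"]
    # Start and end of partition 2 (linux-swap) as listed in the thread.
    shortfall = swap_shortfall_mb(15.688, 4118.225, swap["total"])
    print(f"Swap reported: {swap['total'] / 1024:.0f} MB, unused partition space: {shortfall:.0f} MB")
```

With the figures from the thread, about half of the swap partition is not in use. The sketch only quantifies the gap; it says nothing about why the kernel activates less swap than the partition provides.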