Re: URGENT: Database keeps crashing - suspect damaged RAM - Mailing list pgsql-general
From | Markus Wollny |
---|---|
Subject | Re: URGENT: Database keeps crashing - suspect damaged RAM |
Date | |
Msg-id | 2266D0630E43BB4290742247C891057501B1321C@dozer.computec.de |
In response to | URGENT: Database keeps crashing - suspect damaged RAM ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-general |
Oh - and I forgot to mention: the crashes only occur when there is load on the machine. No load - no crashes. But then, that is no surprise, as the machine wouldn't make use of a lot of RAM without any load...

Regards,

Markus

> -----Original Message-----
> From: Markus Wollny
> Sent: Tuesday, 6 August 2002 18:38
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] URGENT: Database keeps crashing - suspect damaged RAM
>
> Hello!
>
> I just installed PostgreSQL 7.2.1 on SuSE 7.3, 4xPIIIXEON 550MHz, 2GB RAM, 5x18GB SCSI RAID. The OS was freshly installed; after that I compiled and installed PostgreSQL from source (./configure --prefix=/opt/pgsql/ --with-perl --enable-odbc --enable-locale --enable-syslog). I copied the settings in postgresql.conf etc. from an identical machine running the identical platform. Then I imported a database to the new installation. The import seems to have been successful; I didn't get any errors during import. A subsequent vacuum analyze finished without anything out of the ordinary.
>
> Just a few minutes after this vacuum analyze, the database crashed for the first time. It keeps crashing every now and then - every one or two minutes.
>
> What puzzles me is the fact that this very same machine was running Oracle 8i on Win2k more or less flawlessly until just a few hours before - more or less meaning that we never really noticed anything much out of the ordinary. There might have been some minor issues after a RAM upgrade from 1 GB to 2 GB just a week ago, but looking back it's hard to say if that could be due to bad RAM or just some bad code which we've sorted out (or disposed of) by now. As the machine is already running Linux and PostgreSQL, it's quite impossible to prove my suspicion by going back to Oracle and having a closer look.
>
> What I'd like to know is if I need to look any further than RAM - shall I just chuck the new modules out of the machine?
> Or is there some other issue that could cause this behaviour? I am quite sure that I didn't do anything wrong during installation, configuration and import, and the same application code is running without errors on a different machine at this very moment. I don't like the "record with zero length" and "Cannot allocate memory"-bits in the logfile at all, let alone the "was terminated by signal 9"-thingy.
>
> So: Is it bad RAM? How can I make sure? What else could it be?
>
> Here's a small excerpt from the logfile:
>
> 2002-08-06 17:31:38 [17063] DEBUG: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
>     Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
> 2002-08-06 17:36:23 [17296] DEBUG: _mdfd_blind_getseg: couldn't open /var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
> 2002-08-06 17:36:24 [17296] FATAL 2: cannot write block 13387 of 16596/16671 blind: Cannot allocate memory
> 2002-08-06 17:36:24 [16530] DEBUG: server process (pid 17296) exited with exit code 2
> 2002-08-06 17:36:24 [16530] DEBUG: terminating any other active server processes
> 2002-08-06 17:36:24 [17081] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend
>     died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am
>     going to terminate your database system connection and exit.
> [...]
> 2002-08-06 17:36:24 [16530] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-08-06 17:36:24 [17298] DEBUG: database system was interrupted at 2002-08-06 17:31:21 CEST
> 2002-08-06 17:36:24 [17298] DEBUG: checkpoint record is at 0/325D7C78
> 2002-08-06 17:36:24 [17298] DEBUG: redo record is at 0/325D7C78; undo record is at 0/0; shutdown FALSE
> 2002-08-06 17:36:24 [17298] DEBUG: next transaction id: 2270; next oid: 901292
> 2002-08-06 17:36:24 [17298] DEBUG: database system was not properly shut down; automatic recovery in progress
> 2002-08-06 17:36:24 [17298] DEBUG: redo starts at 0/325D7CB8
> 2002-08-06 17:36:25 [17298] DEBUG: ReadRecord: record with zero length at 0/326E16C4
> 2002-08-06 17:36:25 [17298] DEBUG: redo done at 0/326E16A0
> 2002-08-06 17:36:30 [17298] DEBUG: database system is ready
> 2002-08-06 17:40:53 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> 2002-08-06 17:52:50 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> 2002-08-06 17:52:54 [16530] DEBUG: server process (pid 18237) was terminated by signal 9
> 2002-08-06 17:52:54 [16530] DEBUG: terminating any other active server processes
> 2002-08-06 17:52:54 [18234] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend
>     died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am
>     going to terminate your database system connection and exit.
> [...]
> 2002-08-06 17:52:57 [18253] FATAL 1: The database system is in recovery mode
> 2002-08-06 17:52:57 [18255] FATAL 1: The database system is in recovery mode
> 2002-08-06 17:52:57 [18254] FATAL 1: The database system is in recovery mode
> 2002-08-06 17:52:57 [18235] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend
>     died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am
>     going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> 2002-08-06 17:52:57 [18256] FATAL 1: The database system is in recovery mode
> 2002-08-06 17:52:57 [18257] FATAL 1: The database system is in recovery mode
> 2002-08-06 17:52:57 [18258] FATAL 1: The database system is in recovery mode
> 2002-08-06 17:52:57 [16530] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-08-06 17:52:57 [18260] FATAL 1: The database system is starting up
> 2002-08-06 17:52:57 [18259] DEBUG: database system was interrupted at 2002-08-06 17:51:38 CEST
> 2002-08-06 17:52:57 [18259] DEBUG: checkpoint record is at 0/32991848
> 2002-08-06 17:52:57 [18259] DEBUG: redo record is at 0/3297F4D8; undo record is at 0/0; shutdown FALSE
> 2002-08-06 17:52:57 [18259] DEBUG: next transaction id: 3704; next oid: 909484
> 2002-08-06 17:52:57 [18259] DEBUG: database system was not properly shut down; automatic recovery in progress
> 2002-08-06 17:52:57 [18259] DEBUG: redo starts at 0/3297F4D8
> 2002-08-06 17:52:57 [18261] FATAL 1: The database system is starting up
> 2002-08-06 17:52:58 [18259] DEBUG: ReadRecord: record with zero length at 0/32BF0278
> 2002-08-06 17:52:58 [18259] DEBUG: redo done at 0/32BF0254
> 2002-08-06 17:52:59 [18262] FATAL 1: The database system is starting up
> 2002-08-06 17:53:00 [18259] DEBUG: database system is ready
> 2002-08-06 17:54:24 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> 2002-08-06 17:54:31 [16530] DEBUG: server process (pid 18283) was terminated by signal 9
> 2002-08-06 17:54:31 [16530] DEBUG: terminating any other active server processes
> 2002-08-06 17:54:31 [18275] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend
>     died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am
>     going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> [...]
> 2002-08-06 17:54:32 [16530] DEBUG: all server processes terminated; reinitializing shared memory and semaphores
> 2002-08-06 17:54:32 [18296] DEBUG: database system was interrupted at 2002-08-06 17:53:00 CEST
> 2002-08-06 17:54:32 [18296] DEBUG: checkpoint record is at 0/32BF0278
> 2002-08-06 17:54:32 [18296] DEBUG: redo record is at 0/32BF0278; undo record is at 0/0; shutdown TRUE
> 2002-08-06 17:54:32 [18296] DEBUG: next transaction id: 4456; next oid: 909484
> 2002-08-06 17:54:32 [18296] DEBUG: database system was not properly shut down; automatic recovery in progress
> 2002-08-06 17:54:32 [18296] DEBUG: redo starts at 0/32BF02B8
> 2002-08-06 17:54:32 [18296] DEBUG: ReadRecord: record with zero length at 0/32F0B3C0
> 2002-08-06 17:54:32 [18296] DEBUG: redo done at 0/32F0B39C
> 2002-08-06 17:54:34 [18297] FATAL 1: The database system is starting up
> 2002-08-06 17:54:34 [18298] FATAL 1: The database system is starting up
> 2002-08-06 17:54:34 [18299] FATAL 1: The database system is starting up
> 2002-08-06 17:54:34 [18300] FATAL 1: The database system is starting up
> 2002-08-06 17:54:34 [18296] DEBUG: database system is ready
> 2002-08-06 17:57:35 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> 2002-08-06 17:57:54 [16530] DEBUG: server process (pid 18366) was terminated by signal 9
> 2002-08-06 17:57:54 [16530] DEBUG: terminating any other active server processes
> 2002-08-06 17:57:54 [18368] NOTICE: Message from PostgreSQL backend:
>     The Postmaster has informed me that some other backend
>     died abnormally and possibly corrupted shared memory.
>     I have rolled back the current transaction and am
>     going to terminate your database system connection and exit.
>     Please reconnect to the database system and repeat your query.
> 2002-08-06 17:57:56 [18409] DEBUG: ReadRecord: record with zero length at 0/3338749C
> 2002-08-06 17:57:58 [18425] FATAL 1: The database system is starting up
> 2002-08-06 17:57:58 [18409] DEBUG: database system is ready
> 2002-08-06 17:58:53 [18432] NOTICE: RelationBuildDesc: can't open idx_bm_user_id: Cannot allocate memory
> 2002-08-06 17:59:00 [18443] FATAL 1: cannot open pg_attribute: Cannot allocate memory
> 2002-08-06 17:59:01 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> 2002-08-06 17:59:01 [16530] DEBUG: server process (pid 18436) was terminated by signal 9
> 2002-08-06 17:59:01 [16530] DEBUG: terminating any other active server processes
> 2002-08-06 17:59:03 [18510] DEBUG: ReadRecord: record with zero length at 0/336E9970
> 2002-08-06 18:00:15 [16530] DEBUG: connection startup failed (fork failure): Cannot allocate memory
> 2002-08-06 18:00:17 [18589] DEBUG: ReadRecord: record with zero length at 0/33A7C194
>
> Thank you for your kind assistance!
>
> Regards,
>
> Markus Wollny
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html