Thread: PostgreSQL server restarts continuously
Hello all,

I'm having a strange problem with a PostgreSQL 7.4.3 server: sometimes the server crashes and restarts immediately. Here is the error message captured from the log file:

ERROR: cache lookup failed for namespace 105183855
LOG: server process (PID 3942) exited with exit code 1
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
. . . .
LOG: all server processes terminated; reinitializing
LOG: could not open file "/data2/datos/postmaster.pid": No such file or directory
LOG: database system was interrupted at 2004-08-26 09:58:21 COT
LOG: checkpoint record is at 24/6F7B343C
LOG: redo record is at 24/6F73C3C8; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 5006358; next OID: 176076757
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 24/6F73C3C8
LOG: record with zero length at 24/6FC94B6C
LOG: redo done at 24/6FC94B48
LOG: recycled transaction log file "000000240000006C"
LOG: removing transaction log file "000000240000006D"
LOG: removing transaction log file "000000240000006E"
LOG: database system is ready

Here is the output of the ipcs command:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 65536      postgres   600        278257664  63

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 32768      apache     600        1
0x00000000 65537      apache     600        1
0x0052e2c1 786434     postgres   600        17
0x0052e2c2 819203     postgres   600        17
0x0052e2c3 851972     postgres   600        17
0x0052e2c4 884741     postgres   600        17
0x0052e2c5 917510     postgres   600        17
0x0052e2c6 950279     postgres   600        17
0x0052e2c7 983048     postgres   600        17
0x0052e2c8 1015817    postgres   600        17
0x0052e2c9 1048586    postgres   600        17
0x0052e2ca 1081355    postgres   600        17

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

I've noticed that every time this happens, PostgreSQL reports a lookup error pointing to a namespace, but I don't know which object it refers to:

ERROR: cache lookup failed for namespace 105183855
ERROR: cache lookup failed for namespace 185104342

I've looked in the manual for advice on what to do when this kind of thing happens, but I couldn't find anything (or maybe the answer is right there and I just can't see it).
Searching the web, I found a message from someone suggesting a reindex of the database when the "cache lookup failed" error occurs. Is this a real solution?

Here are some parameters that I've modified in the postgresql.conf file:

max_connections = 150
shared_buffers = 32768
sort_mem = 2048
vacuum_mem = 32568
max_fsm_pages = 200000
max_fsm_relations = 200
max_files_per_process = 10000
wal_buffers = 256
checkpoint_segments = 10
checkpoint_timeout = 600
effective_cache_size = 10000
random_page_cost = 2

and the kernel settings:

kernel.shmmax = 4000000000
kernel.shmall = 4000000000

Those values are high, but the hardware platform is robust (I guess): Dell PowerEdge 6600, 16 GB RAM, SCSI RAID 5 (200 GB total), 4 CPUs. Do you think these values are correct, or might one of them be the origin of the problem?

Thanks in advance,
Alvaro
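As a rough sanity check on the settings quoted in the post, the arithmetic can be sketched in Python. This is only a back-of-the-envelope estimate under stated assumptions: an 8 kB PostgreSQL block size (the default build setting), a 4 kB kernel page size, kernel.shmall being counted in pages on Linux, and a simplified worst case of one sort per connection (sort_mem applies per sort operation, so a single query can use more). The variable names are illustrative, not taken from the thread.

```python
# Back-of-the-envelope check of the memory settings quoted in the post.
# Assumptions (not stated in the thread): 8 kB PostgreSQL block size,
# 4 kB kernel page size, kernel.shmall counted in pages.

BLOCK_SIZE = 8192            # bytes per shared buffer (assumed default)
PAGE_SIZE = 4096             # bytes per kernel page (assumed)

shared_buffers = 32768       # from the post, in buffers
sort_mem_kb = 2048           # from the post, kB per sort operation
max_connections = 150        # from the post
shmmax = 4_000_000_000       # kernel.shmmax from the post, in bytes
shmall = 4_000_000_000       # kernel.shmall from the post
segment_bytes = 278257664    # segment size reported by ipcs in the post

# Memory taken by the buffer pool itself.
buffer_bytes = shared_buffers * BLOCK_SIZE

# Whatever else lives in the shared segment (locks, FSM, WAL buffers, ...).
overhead_bytes = segment_bytes - buffer_bytes

# Simplified worst case: every connection runs exactly one sort at once.
worst_case_sort_bytes = sort_mem_kb * 1024 * max_connections

# What the shmall value would mean if it is interpreted as pages.
shmall_bytes_if_pages = shmall * PAGE_SIZE


def mib(n):
    """Convert a byte count to MiB for display."""
    return n / (1024 * 1024)


print(f"shared_buffers:        {mib(buffer_bytes):.0f} MiB")
print(f"other shared memory:   {mib(overhead_bytes):.1f} MiB")
print(f"segment fits shmmax:   {segment_bytes < shmmax}")
print(f"sort_mem worst case:   {mib(worst_case_sort_bytes):.0f} MiB")
print(f"shmall as pages:       {shmall_bytes_if_pages / 1e12:.1f} TB")
```

Under these assumptions the buffer pool accounts for almost all of the segment that ipcs reports, and the segment is well below kernel.shmmax, so nothing in these numbers by itself points to a shared memory limit being hit.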
alvaro@audifarma.com.co writes:
> I'm having a strange problem with a PostgreSQL 7.4.3 server: sometimes
> the server crashes and restarts immediately. Here is the error message
> captured from the log file
> ERROR: cache lookup failed for namespace 105183855
> LOG: server process (PID 3942) exited with exit code 1

I have a suspicion that this has something to do with trying to delete temp tables during backend exit, but it's not going to be possible to find it without a lot more detail. You might try turning on query logging so you can see exactly what the failed process did before crashing.

regards, tom lane
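Once query logging is enabled, the lines leading up to each crash can be pulled out of the server log with a small script. The following is a minimal sketch, not a tool from this thread: the function name `find_crashes` and the `context` parameter are hypothetical, and the pattern only matches the "server process (PID n) exited with exit code m" message in the form shown in the log excerpt above.

```python
import re

# Matches the postmaster's crash report line as it appears in the excerpt.
CRASH_RE = re.compile(
    r"LOG:\s+server process \(PID (\d+)\) exited with exit code (\d+)"
)


def find_crashes(lines, context=5):
    """Return (pid, exit_code, preceding_lines) for each crash report.

    `context` is the number of log lines kept from just before the
    crash report, which is where the failing statement or error
    would normally show up.
    """
    crashes = []
    for i, line in enumerate(lines):
        match = CRASH_RE.search(line)
        if match:
            start = max(0, i - context)
            crashes.append(
                (int(match.group(1)), int(match.group(2)), lines[start:i])
            )
    return crashes


# Sample input copied from the log excerpt in the original post.
sample = [
    "ERROR: cache lookup failed for namespace 105183855",
    "LOG: server process (PID 3942) exited with exit code 1",
    "LOG: terminating any other active server processes",
]

for pid, code, before in find_crashes(sample):
    print(f"PID {pid} exited with code {code}; preceding lines:")
    for entry in before:
        print("   ", entry)
```

To run it against a real log, the lines could be read with something like `open(path).read().splitlines()`, where `path` is wherever the server log is written on that machine.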
Hi,

Thanks a lot to all of you for your help. After long nights looking through the log files for a possible cause, we found out that this error condition was related to a problem with the SCSI array controller (hardware) that was not reported by the watchdog program installed by the hardware provider. The control panel always indicated that everything was OK, and the special tests we ran at the time reported nothing unusual. Last week the server froze completely, and we were able to see an error message pointing to that specific device. Our provider replaced the controller immediately, and since then the error reported by the PostgreSQL server has not appeared again.

Special thanks to Mr. Tom Lane for his concern since we first posted a message.

Alvaro