Thread: Postgresql Server Restart continuously

Postgresql Server Restart continuously

From
alvaro@audifarma.com.co
Date:
Hello you out there,

I'm having a strange problem with a PostgreSQL 7.4.3 server: sometimes
the server crashes and restarts immediately. Here is the error message
captured from the log file:

ERROR:  cache lookup failed for namespace 105183855
LOG:  server process (PID 3942) exited with exit code 1
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
.
.
.
.
LOG:  all server processes terminated; reinitializing
LOG:  could not open file "/data2/datos/postmaster.pid": No such file or
directory
LOG:  database system was interrupted at 2004-08-26 09:58:21 COT
LOG:  checkpoint record is at 24/6F7B343C
LOG:  redo record is at 24/6F73C3C8; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 5006358; next OID: 176076757
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 24/6F73C3C8
LOG:  record with zero length at 24/6FC94B6C
LOG:  redo done at 24/6FC94B48
LOG:  recycled transaction log file "000000240000006C"
LOG:  removing transaction log file "000000240000006D"
LOG:  removing transaction log file "000000240000006E"
LOG:  database system is ready


Here is the output of the ipcs command:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 65536      postgres  600        278257664  63

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 32768      apache    600        1
0x00000000 65537      apache    600        1
0x0052e2c1 786434     postgres  600        17
0x0052e2c2 819203     postgres  600        17
0x0052e2c3 851972     postgres  600        17
0x0052e2c4 884741     postgres  600        17
0x0052e2c5 917510     postgres  600        17
0x0052e2c6 950279     postgres  600        17
0x0052e2c7 983048     postgres  600        17
0x0052e2c8 1015817    postgres  600        17
0x0052e2c9 1048586    postgres  600        17
0x0052e2ca 1081355    postgres  600        17

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

I've noticed that every time this happens, PostgreSQL reports a lookup
error pointing to a namespace, but I don't know which object it refers to.

ERROR:  cache lookup failed for namespace 105183855
ERROR:  cache lookup failed for namespace 185104342
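
For anyone hitting the same message: one way to see whether those numbers
still correspond to a schema is to pull the OIDs out of the log and look
them up in the pg_namespace system catalog. The following is only a
minimal sketch. The function names are made up for illustration, and the
generated query may well return no rows, since the error itself says the
lookup failed.

```python
import re

# Matches the error text quoted above; the OID is the trailing integer.
NAMESPACE_ERROR = re.compile(r"cache lookup failed for namespace (\d+)")


def namespace_oids(log_text):
    """Return the distinct namespace OIDs named in the log, in first-seen order."""
    seen = []
    for match in NAMESPACE_ERROR.finditer(log_text):
        oid = int(match.group(1))
        if oid not in seen:
            seen.append(oid)
    return seen


def lookup_query(oids):
    """Build a catalog query listing whichever of the OIDs still exist."""
    if not oids:
        raise ValueError("no namespace OIDs found in the log")
    oid_list = ", ".join(str(oid) for oid in oids)
    return "SELECT oid, nspname FROM pg_namespace WHERE oid IN (%s);" % oid_list


# The two error lines reported in this thread.
sample = """ERROR:  cache lookup failed for namespace 105183855
ERROR:  cache lookup failed for namespace 185104342
"""

print(lookup_query(namespace_oids(sample)))
```

The printed statement can be pasted into psql while connected to the
affected database.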

I've looked in the manual for advice on what action to take when this kind
of thing happens, but I couldn't find anything (or maybe the answer is
right there and I just can't see it).

Searching the web, I found a message suggesting that the database be
reindexed when the "cache lookup failed" error happens. Is this a real
solution?

Here are some parameters that I've modified in the postgresql.conf file:

max_connections       = 150
shared_buffers        = 32768
sort_mem              = 2048
vacuum_mem            = 32568
max_fsm_pages         = 200000
max_fsm_relations     = 200
max_files_per_process = 10000
wal_buffers           = 256
checkpoint_segments   = 10
checkpoint_timeout    = 600
effective_cache_size  = 10000
random_page_cost      = 2


kernel.shmmax = 4000000000
kernel.shmall = 4000000000


Those values are high, but the hardware platform is robust (I guess):

Dell PowerEdge 6600, 16 GB RAM, SCSI RAID 5 (200 GB total), 4 CPUs.

Do you think these values are correct, or might one of them be the
origin of the problem?
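
As a rough sanity check on the numbers above, the shared_buffers setting
can be compared with the shared memory segment that ipcs reports. This is
only a sketch and assumes the default 8 kB block size, which is not stated
anywhere in this post.

```python
BLOCK_SIZE = 8192            # assumption: default PostgreSQL block size (8 kB)

shared_buffers = 32768       # from the postgresql.conf settings above
segment_bytes = 278257664    # "bytes" column of the postgres segment in ipcs
shmmax = 4000000000          # kernel.shmmax above

# Memory taken by the buffer pool alone.
buffer_bytes = shared_buffers * BLOCK_SIZE

# Whatever is left in the segment holds the other shared structures.
overhead_bytes = segment_bytes - buffer_bytes

print("buffer pool:", buffer_bytes, "bytes")
print("other shared structures:", overhead_bytes, "bytes")
print("segment fits under shmmax:", segment_bytes <= shmmax)
```

Under that assumption the buffer pool accounts for 256 MB of the segment,
so the segment size is consistent with the configuration and well below
kernel.shmmax.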

Thanks in advance,

Alvaro








Re: Postgresql Server Restart continuously

From
Tom Lane
Date:
alvaro@audifarma.com.co writes:
> I'm having a strange problem with a PostgreSQL 7.4.3 server: sometimes
> the server crashes and restarts immediately. Here is the error message
> captured from the log file:

> ERROR:  cache lookup failed for namespace 105183855
> LOG:  server process (PID 3942) exited with exit code 1

I have a suspicion that this has something to do with trying to delete
temp tables during backend exit, but it's not going to be possible to
find it without a lot more detail.  You might try turning on query
logging so you can see exactly what the failed process did before
crashing.

            regards, tom lane
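
Once query logging is on, the lines written by the failed process can be
separated from the rest of the log by its PID. The following is a minimal
sketch, not a tested tool. It assumes every log line starts with the
backend PID in square brackets, which is a hypothetical format; adjust the
PREFIX pattern to whatever your logging setup actually writes. The
statement lines in the sample are invented for illustration.

```python
import re

# The postmaster line quoted in this thread names the PID that died.
CRASH = re.compile(r"server process \(PID (\d+)\) exited with exit code (\d+)")

# Hypothetical per-line prefix such as "[3942] "; not taken from the thread.
PREFIX = re.compile(r"^\[(\d+)\]\s")


def crashed_pids(lines):
    """Return the PIDs that the postmaster reported as having exited."""
    pids = []
    for line in lines:
        match = CRASH.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids


def lines_for_pid(lines, pid):
    """Return only the lines whose prefix carries the given PID."""
    selected = []
    for line in lines:
        match = PREFIX.match(line)
        if match and int(match.group(1)) == pid:
            selected.append(line)
    return selected


sample = [
    "[3942] LOG:  statement: SELECT 1;",  # invented statement
    "[3950] LOG:  statement: SELECT 2;",  # invented statement
    "[3942] ERROR:  cache lookup failed for namespace 105183855",
    "[3901] LOG:  server process (PID 3942) exited with exit code 1",
]

for pid in crashed_pids(sample):
    for line in lines_for_pid(sample, pid):
        print(line)
```

The last statement printed before the ERROR line is the one to look at
first.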

Re: Postgresql Server Restart continuously

From
alvaro@audifarma.com.co
Date:
Hi,

Thanks a lot to all of you for your help. After long nights looking at the
log files for a possible cause, we found out that this error condition
was due to a problem with the SCSI array controller (hardware) that was
not reported by the watchdog program installed by the hardware provider.
The control panel always indicated that everything was OK, and the special
tests we ran reported nothing unusual at the time. Last week the server
froze completely and we were able to see an error message pointing to that
specific device. Our provider replaced the controller immediately, and
since then the error reported by the PostgreSQL server has not appeared
again.

Special thanks to Mr. Tom Lane for his concern since we first posted our
message.


Alvaro


