Thread: postgres (zombie)
I am sending this mail to the hackers list because the bugs list seems to have disappeared!

Something strange has started happening on my PostgreSQL server:

Linux RedHat 5.2, i386 Pentium machine, 64 MB RAM
PostgreSQL 6.4.2 official release

There are at most 6 users working simultaneously, but not so hard, on a database that isn't so big (2 MB dumped). The clients are Tcl/Tk programs. 3 clients are accessing the server from a local network; 3 or 4 clients are accessing the server over a 115 kb serial line through a CISCO.

Till now, everything went ok, but sometimes, in the last few days, I found some postgres (<zombie>) processes and when every client is logging out, another postgres <zombie> process appears. I had to kill -SIGTERM the master, wait for 5 or 6 seconds and then restart it again.

When 1 postgres <zombie> process is appearing, the current working clients can work ahead, no problem at all. But newer connections aren't accepted.

I am not sure, but I think that the serial line sometimes breaks and the client-server communication has short interruptions. Could these problems hang PostgreSQL this badly?

Constantin Teodorescu
FLEX Consulting Braila, ROMANIA
Constantin Teodorescu <teo@flex.ro> writes:
> Till now, everything went ok, but sometimes, in the last few days, I
> found some postgres (<zombie>) processes and when every client is
> logging out, another postgres <zombie> process appears. I had to kill
> -SIGTERM the master, wait for 5 or 6 seconds and then restart it again.
> When 1 postgres <zombie> process is appearing, the current working
> clients can work ahead, no problem at all. But newer connections aren't
> accepted.

This sounds like the postmaster process has gotten hung up somehow --- it's not responding to incoming connection requests, nor is it noticing SIGCHLD (signal that one of its child processes exited --- the zombies are there because the postmaster hasn't done a wait() to reap them).

I've never seen this myself, but it sure sounds like a bug. Next time you see the condition, would you kill the postmaster with a signal that will produce a coredump (SIGABRT or SIGSEGV should work) and extract a backtrace from the core file? That will give us more to go on. Note it will help if you've compiled the backend with -g ... and don't throw away the corefile, we may need to ask more questions.

regards, tom lane
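
For readers unfamiliar with the mechanism described above: a child that has exited stays in the process table as a zombie until its parent calls wait() or waitpid() on it. The following is a minimal, generic sketch of a parent that reaps children from a SIGCHLD handler. It is not the postmaster's actual code, and the names reap_children, install_sigchld_reaper, spawn_short_lived_children and reaped_children are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped so far; modified only inside the handler. */
volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler (hypothetical name): collect every child that has exited. */
void reap_children(int signo)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */

    (void) signo;

    /*
     * Several child exits can be merged into a single SIGCHLD delivery,
     * so keep calling waitpid() until no exited child is left.
     * WNOHANG makes the call return instead of blocking.
     */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;

    errno = saved_errno;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted system calls; ignore stopped (not exited) children. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;

    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork n children that exit at once. Returns how many were started. */
int spawn_short_lived_children(int n)
{
    int started = 0;

    assert(n >= 0);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0);           /* child: exit immediately */
        if (pid > 0)
            started++;          /* parent: one more child to reap */
    }
    return started;
}
```

If a parent with such a handler stops servicing signals (for example because it is stuck somewhere with SIGCHLD blocked), exited children are never passed to waitpid() and accumulate as zombies, which would be consistent with the symptom reported in this thread. That is only an illustration of the general mechanism, not a diagnosis.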