Thread: defunct postmasters
Hi,

I am running Postgres 7.1 on Red Hat 6.2 and my database has gone belly up. I know I am not supposed to "kill -9" the postmaster, but it has become completely unresponsive. pgsql just hangs, as does stopping with the rc.d script. ps output looks like this, and top shows the machine to be almost completely idle.

    postgres 29214     1  0 Apr11 ?        00:00:02 [postmaster]
    postgres 29350 29214  0 Apr11 ?        00:01:02 [postmaster <defunct>]
    postgres 29351 29214 19 Apr11 ?        19:11:41 [postmaster <defunct>]
    postgres 29352 29214  0 Apr11 ?        00:00:00 [postmaster <defunct>]
    postgres 31002 29214 15 Apr30 ?        17:07:27 [postmaster <defunct>]
    postgres 31003 29214  0 Apr30 ?        00:01:06 [postmaster <defunct>]
    postgres  7726 29214  0 May10 ?        00:01:28 [postmaster <defunct>]
    postgres  7727 29214  1 May10 ?        00:12:07 [postmaster <defunct>]

I have tried sending the postmaster a regular "kill" without success. Does anyone have a suggestion for how to restart Postgres short of a kill -9? There are not any clients connected anymore, so I assume that it is as quiet as it can get.

thanks,
Philip

PS I don't know why this happened, but the only theory I have is that I am running with -i to allow JDBC connections and I had port scanned the machine with nmap shortly before noticing that I could no longer connect. Maybe it is just coincidence, as I don't know whether I could connect before the port scan or not, but I have seen other daemons crash after being port scanned.
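Editorial aside, not part of the original thread: the <defunct> entries in the listing above are zombies, that is, children that have already exited and stay in the process table until their parent (PID 29214 here) reaps them, so signals sent to them have no effect. The following is a minimal Python sketch that groups defunct entries by parent PID. The function name defunct_children and the assumption of "ps -ef"-style columns (UID, PID, PPID, ...) are illustrative choices, not anything from the thread.

```python
def defunct_children(ps_lines):
    """Map each parent PID to the PIDs of its <defunct> (zombie) children.

    Assumes "ps -ef"-style lines whose first three columns are
    UID, PID, PPID (an assumption made for this sketch).
    """
    zombies = {}
    for line in ps_lines:
        fields = line.split()
        # Skip blank lines, headers, and anything without numeric PID/PPID.
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        if "<defunct>" in line:
            zombies.setdefault(int(fields[2]), []).append(int(fields[1]))
    return zombies


# The listing quoted in the message above.
SAMPLE = """
postgres 29214 1 0 Apr11 ? 00:00:02 [postmaster]
postgres 29350 29214 0 Apr11 ? 00:01:02 [postmaster <defunct>]
postgres 29351 29214 19 Apr11 ? 19:11:41 [postmaster <defunct>]
postgres 29352 29214 0 Apr11 ? 00:00:00 [postmaster <defunct>]
postgres 31002 29214 15 Apr30 ? 17:07:27 [postmaster <defunct>]
postgres 31003 29214 0 Apr30 ? 00:01:06 [postmaster <defunct>]
postgres 7726 29214 0 May10 ? 00:01:28 [postmaster <defunct>]
postgres 7727 29214 1 May10 ? 00:12:07 [postmaster <defunct>]
"""

if __name__ == "__main__":
    for parent, children in defunct_children(SAMPLE.splitlines()).items():
        print(f"parent {parent} has {len(children)} unreaped children: {children}")
```

All seven zombies share one parent, which suggests the thing to inspect is the parent postmaster itself rather than the children.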
Philip Crotwell <crotwell@seis.sc.edu> writes:
> I am running postgres7.1 on redhat 6.2 and my database has gone belly up.
> I know i am not supposed to "kill -9 " the postmaster, but it has become
> completely unresponsive. pgsql just hangs as does stopping with the
> rc.d script.

Actually, kill -9 should be perfectly safe in PG 7.1; it was only earlier releases that didn't like it. But before you do that, would you attach to the top postmaster process (29214) with gdb and get a stack trace?

> PS I don't know why this happened, but the only theory I have is that I am
> running with -i to allow jdbc connections and I had port scanned the
> machine with nmap shortly before noticing that I could no longer connect.

Hmm, would you see if that's repeatable?

regards, tom lane
On Friday 11 May 2001 10:26, Philip Crotwell wrote:
> PS I don't know why this happened, but the only theory I have is that I am
> running with -i to allow jdbc connections and I had port scanned the
> machine with nmap shortly before noticing that I could no longer connect.
> Maybe just coincidence as I don't know if I could connect before
> portscanning or not, but I have seen other daemons crash after being
> port scanned.

Can somebody say 'denial-of-service?' I knew you could.

I'm going to test this one here and see what happens. A port scan should not do this to postmaster.

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Hi,

Not sure if this is helpful, but... Am I doing this correctly? Is there anything else to try before "pulling the plug"?

thanks,
Philip

    # gdb postmaster 29214
    GNU gdb 19991004
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"...
    postmaster: No such file or directory.
    /usr/local/src/29214: No such file or directory.
    Attaching to Pid 29214
    0x4013da02 in ?? ()
    (gdb) bt
    #0  0x4013da02 in ?? ()
    #1  0x80e07b1 in ?? ()
    #2  0x80e0239 in ?? ()
    #3  0x80dfdb3 in ?? ()
    #4  0x80c3fa5 in ?? ()
    #5  0x400a39cb in ?? ()
    (gdb) info frame
    Stack level 0, frame at 0xbffff400:
     eip = 0x4013da02; saved eip 0x80e07b1
     called by frame at 0xbffff414
     Arglist at 0xbffff400, args:
     Locals at 0xbffff400, Previous frame's sp is 0x0
     Saved registers:
      ebp at 0xbffff400, eip at 0xbffff404

------------------------------------------------------------------------
Philip Crotwell   (803)777-0955   (803)777-0906 fax   crotwell@seis.sc.edu
------------------------------------------------------------------------
Hi,

Once more, this time with feeling :) Sorry, I am not a regular user of gdb, but I figured out my error. Does this help? Anything else before kill -9?

thanks,
Philip

    # gdb postmaster 29214
    GNU gdb 19991004
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"...
    /usr/local/pgsql/bin/29214: No such file or directory.
    Attaching to program: /usr/local/pgsql/bin/postmaster, Pid 29214
    Reading symbols from /lib/libcrypt.so.1...done.
    Reading symbols from /lib/libresolv.so.2...done.
    Reading symbols from /lib/libnsl.so.1...done.
    Reading symbols from /lib/libdl.so.2...done.
    Reading symbols from /lib/libm.so.6...done.
    Reading symbols from /lib/libc.so.6...done.
    Reading symbols from /lib/ld-linux.so.2...done.
    Reading symbols from /lib/libnss_files.so.2...done.
    0x4013da02 in __libc_accept () from /lib/libc.so.6
    (gdb) bt
    #0  0x4013da02 in __libc_accept () from /lib/libc.so.6
    #1  0x80c34b9 in StreamConnection ()
    #2  0x80e07b1 in ConnCreate ()
    #3  0x80e0239 in ServerLoop ()
    #4  0x80dfdb3 in PostmasterMain ()
    #5  0x80c3fa5 in main ()
    #6  0x400a39cb in __libc_start_main (main=0x80c3ec0 <main>, argc=4,
        argv=0xbffffb14, init=0x80651d0 <_init>, fini=0x813697c <_fini>,
        rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbffffb0c)
        at ../sysdeps/generic/libc-start.c:92
    (gdb) info frame
    Stack level 0, frame at 0xbffff400:
     eip = 0x4013da02 in __libc_accept; saved eip 0x80e07b1
     (FRAMELESS), called by frame at 0xbffff400
     source language unknown.
     Arglist at 0xbffff400, args:
     Locals at 0xbffff400, Previous frame's sp is 0x0
     Saved registers:
      ebp at 0xbffff400, eip at 0xbffff404
    (gdb)
> (gdb) bt
> #0 0x4013da02 in __libc_accept () from /lib/libc.so.6
> #1 0x80c34b9 in StreamConnection ()
> #2 0x80e07b1 in ConnCreate ()
> #3 0x80e0239 in ServerLoop ()
> #4 0x80dfdb3 in PostmasterMain ()
> #5 0x80c3fa5 in main ()
> #6 0x400a39cb in __libc_start_main (main=0x80c3ec0 <main>, argc=4,

Hmph. Waiting to accept a connection that's evidently not coming through. Maybe that portscan did cause this.

I'd say go ahead and kill it, and after restarting try another portscan to see if that really does cause the problem.

regards, tom lane
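Editorial aside, not part of the original thread: the thread never establishes why accept() blocked. One textbook way a server can end up like this, assuming it waits with select() and then calls a blocking accept(), is that the pending connection is reset by the peer between the two calls, so accept() has nothing to return and waits for the next connection, stalling the whole loop. Whether that is what happened to this postmaster is not known. The sketch below shows the usual defensive pattern with a non-blocking listening socket. The helper names make_listener, try_accept, and accept_ready are hypothetical and this is not PostgreSQL's code.

```python
import select
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking TCP listening socket (port 0 = any free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    srv.setblocking(False)  # accept() now raises instead of blocking
    return srv


def try_accept(srv):
    """Return (conn, addr), or None if no connection is actually pending."""
    try:
        return srv.accept()
    except (BlockingIOError, ConnectionAbortedError):
        # The connection vanished (or never existed); go back to waiting.
        return None


def accept_ready(srv, timeout):
    """Wait up to `timeout` seconds for readiness, then accept without blocking."""
    readable, _, _ = select.select([srv], [], [], timeout)
    if not readable:
        return None
    return try_accept(srv)


if __name__ == "__main__":
    srv = make_listener()
    print("listening on", srv.getsockname())
    print("accept with nothing pending ->", try_accept(srv))
    srv.close()
```

With this pattern a vanished connection costs one extra pass through the loop instead of an indefinite wait inside accept().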
Well, I killed it. Oddly, giving a "stop" to the rc script hung after I did the kill -9. Not sure if that matters.

At any rate, I restarted, and the database seems just fine after repeated port scans. So either that was just coincidental, or it was a combination of the port scan and something else?

I will keep an eye on things and repost if it locks up again after a port scan, but for now I seem to be OK.

thanks for your help,
Philip

------------------------------------------------------------------------
Philip Crotwell   (803)777-0955   (803)777-0906 fax   crotwell@seis.sc.edu
------------------------------------------------------------------------