Thread: defunct postmasters

defunct postmasters

From

Philip Crotwell

Date:

11 May 2001, 13:51:04

Hi

I am running postgres 7.1 on Red Hat 6.2 and my database has gone belly up.

I know I am not supposed to "kill -9" the postmaster, but it has become
completely unresponsive. pgsql just hangs, as does stopping with the
rc.d script.

ps output looks like this, and top shows the machine to be almost
completely idle.
postgres 29214     1  0 Apr11 ?        00:00:02 [postmaster]
postgres 29350 29214  0 Apr11 ?        00:01:02 [postmaster <defunct>]
postgres 29351 29214 19 Apr11 ?        19:11:41 [postmaster <defunct>]
postgres 29352 29214  0 Apr11 ?        00:00:00 [postmaster <defunct>]
postgres 31002 29214 15 Apr30 ?        17:07:27 [postmaster <defunct>]
postgres 31003 29214  0 Apr30 ?        00:01:06 [postmaster <defunct>]
postgres  7726 29214  0 May10 ?        00:01:28 [postmaster <defunct>]
postgres  7727 29214  1 May10 ?        00:12:07 [postmaster <defunct>]

I have tried sending the postmaster a regular "kill" without success.

Does anyone have a suggestion for how to restart postgres short of a
kill -9?

There are not any clients connected anymore, so I assume that it is as
quiet as it can get.

thanks,
Philip

PS I don't know why this happened, but the only theory I have is that I am
running with -i to allow jdbc connections and I had port scanned the
machine with nmap shortly before noticing that I could no longer connect.
Maybe just coincidence as I don't know if I could connect before
portscanning or not, but I have seen other daemons crash after being
port scanned.
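
For reference, the `<defunct>` entries in the ps listing above are zombie
processes: children that have exited but whose parent (here PID 29214) has
not yet collected their exit status with wait(). A minimal Python sketch,
assuming `ps -ef` style columns as in that listing, that groups defunct PIDs
by parent. The names `defunct_children` and `SAMPLE` are illustrative only
and not part of any PostgreSQL tooling.

```python
from collections import defaultdict

# A few lines copied from the ps listing in this message.
SAMPLE = """\
postgres 29214     1  0 Apr11 ?        00:00:02 [postmaster]
postgres 29350 29214  0 Apr11 ?        00:01:02 [postmaster <defunct>]
postgres 29351 29214 19 Apr11 ?        19:11:41 [postmaster <defunct>]
postgres  7726 29214  0 May10 ?        00:01:28 [postmaster <defunct>]
"""


def defunct_children(ps_output):
    """Map parent PID -> list of child PIDs that ps marks as <defunct>.

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    Lines that do not fit that shape (such as a header row) are skipped.
    """
    zombies = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # CMD may contain spaces, keep it whole
        if len(fields) < 8:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if "<defunct>" in cmd:
            zombies[ppid].append(pid)
    return dict(zombies)
```

Feeding it the full listing would show every defunct process hanging off the
single top-level postmaster, which matches the picture of a parent that is
stuck and no longer reaping its children.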

Re: defunct postmasters

From

Tom Lane

Date:

11 May 2001, 15:12:48

Philip Crotwell <crotwell@seis.sc.edu> writes:
> I am running postgres7.1 on redhat 6.2 and my database has gone belly up.

> I know i am not supposed to "kill -9 " the postmaster, but it has become
> completely unresponsive. pgsql just hangs as does stopping with the
> rc.d script.

Actually, kill -9 should be perfectly safe in PG 7.1; it was only
earlier releases that didn't like it.  But before you do that,
would you attach to the top postmaster process (29214) with gdb
and get a stack trace?

> PS I don't know why this happened, but the only theory I have is that I am
> running with -i to allow jdbc connections and I had port scanned the
> machine with nmap shortly before noticing that I could no longer connect.

Hmm, would you see if that's repeatable?

            regards, tom lane

Re: defunct postmasters

From

Lamar Owen

Date:

11 May 2001, 15:57:04

On Friday 11 May 2001 10:26, Philip Crotwell wrote:
> PS I don't know why this happened, but the only theory I have is that I am
> running with -i to allow jdbc connections and I had port scanned the
> machine with nmap shortly before noticing that I could no longer connect.
> Maybe just coincidence as I don't know if I could connect before
> portscanning or not, but I have seen other daemons crash after being
> port scanned.

Can somebody say 'denial-of-service?'  I knew you could.

I'm going to test this one here and see what happens.  A port scan should not
do this to postmaster.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11

Re: defunct postmasters

From

Philip Crotwell

Date:

11 May 2001, 17:00:02

Hi

Not sure if this is helpful, but...
Am I doing this correctly? Is there anything else to try before "pulling the plug"?
thanks,
Philip


# gdb postmaster 29214
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

postmaster: No such file or directory.


/usr/local/src/29214: No such file or directory.
Attaching to Pid 29214
0x4013da02 in ?? ()
(gdb) bt
#0  0x4013da02 in ?? ()
#1  0x80e07b1 in ?? ()
#2  0x80e0239 in ?? ()
#3  0x80dfdb3 in ?? ()
#4  0x80c3fa5 in ?? ()
#5  0x400a39cb in ?? ()
(gdb) info frame
Stack level 0, frame at 0xbffff400:
 eip = 0x4013da02; saved eip 0x80e07b1
 called by frame at 0xbffff414
 Arglist at 0xbffff400, args:
 Locals at 0xbffff400, Previous frame's sp is 0x0
 Saved registers:
  ebp at 0xbffff400, eip at 0xbffff404


------------------------------------------------------------------------
Philip Crotwell   (803)777-0955  (803)777-0906 fax  crotwell@seis.sc.edu
------------------------------------------------------------------------

Re: defunct postmasters

From

Philip Crotwell

Date:

11 May 2001, 17:03:49

Hi

Once more, this time with feeling :)
Sorry, I am not a regular user of gdb, but I figured out my error. Does this
help?

Anything else before kill -9?

thanks,
Philip

# gdb postmaster 29214
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

/usr/local/pgsql/bin/29214: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postmaster, Pid 29214
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from /lib/libresolv.so.2...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
0x4013da02 in __libc_accept () from /lib/libc.so.6
(gdb) bt
#0  0x4013da02 in __libc_accept () from /lib/libc.so.6
#1  0x80c34b9 in StreamConnection ()
#2  0x80e07b1 in ConnCreate ()
#3  0x80e0239 in ServerLoop ()
#4  0x80dfdb3 in PostmasterMain ()
#5  0x80c3fa5 in main ()
#6  0x400a39cb in __libc_start_main (main=0x80c3ec0 <main>, argc=4, argv=0xbffffb14, init=0x80651d0 <_init>,
    fini=0x813697c <_fini>, rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbffffb0c) at ../sysdeps/generic/libc-start.c:92
(gdb) info frame
Stack level 0, frame at 0xbffff400:
 eip = 0x4013da02 in __libc_accept; saved eip 0x80e07b1
 (FRAMELESS), called by frame at 0xbffff400
 source language unknown.
 Arglist at 0xbffff400, args:
 Locals at 0xbffff400, Previous frame's sp is 0x0
 Saved registers:
  ebp at 0xbffff400, eip at 0xbffff404
(gdb)
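
The difference between the two sessions above is that the second one was
started from /usr/local/pgsql/bin, where gdb could find the postmaster
executable and resolve symbols. A hedged sketch of collecting the same
backtrace non-interactively with an explicit path follows. It assumes a gdb
recent enough to support the `-batch` and `-ex` options (the 1999 build shown
above may not), and the helper names `gdb_backtrace_command` and
`collect_backtrace` are made up for illustration.

```python
import subprocess


def gdb_backtrace_command(executable, pid):
    """Build a gdb command line that attaches to pid, prints `bt`, and exits.

    Passing the full path of the executable avoids the "No such file or
    directory" problem from the first attempt, where symbols were missing.
    """
    return ["gdb", "-batch", "-ex", "bt", str(executable), str(int(pid))]


def collect_backtrace(executable, pid, timeout=60):
    """Run gdb and return whatever it printed on stdout.

    Needs permission to attach to the target process (typically the same
    user or root). gdb detaches again when batch mode finishes.
    """
    result = subprocess.run(
        gdb_backtrace_command(executable, pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


# Example with the values from this thread (not run here):
# print(collect_backtrace("/usr/local/pgsql/bin/postmaster", 29214))
```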




Re: defunct postmasters

From

Tom Lane

Date:

11 May 2001, 17:05:23

> (gdb) bt
> #0  0x4013da02 in __libc_accept () from /lib/libc.so.6
> #1  0x80c34b9 in StreamConnection ()
> #2  0x80e07b1 in ConnCreate ()
> #3  0x80e0239 in ServerLoop ()
> #4  0x80dfdb3 in PostmasterMain ()
> #5  0x80c3fa5 in main ()
> #6  0x400a39cb in __libc_start_main (main=0x80c3ec0 <main>, argc=4,

Hmph.  Waiting to accept a connection that's evidently not coming
through.  Maybe that portscan did cause this.

I'd say go ahead and kill it, and after restarting try another portscan
to see if that really does cause the problem.

            regards, tom lane
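
As background to the backtrace: accept() on a blocking listening socket waits
until a connection is available. The Linux accept(2) manual page notes that a
connection reported ready by select() can disappear before accept() is called
(for example after a network error), in which case a blocking accept() waits
for the next connection. Whether that is what happened to this postmaster is
not established in this thread. The sketch below only illustrates the general
defensive pattern of a non-blocking listening socket; it is not PostgreSQL's
code, and `make_listener` and `try_accept` are hypothetical names.

```python
import select
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking TCP listening socket (port 0 = pick a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    srv.setblocking(False)  # accept() now raises instead of waiting
    return srv


def try_accept(srv, timeout=1.0):
    """Wait up to `timeout` seconds for a connection; return it or None.

    If select() reports the socket readable but the pending connection has
    gone away in the meantime, accept() raises instead of blocking, and the
    caller simply goes around its loop again.
    """
    readable, _, _ = select.select([srv], [], [], timeout)
    if not readable:
        return None
    try:
        conn, _addr = srv.accept()
    except (BlockingIOError, ConnectionAbortedError):
        return None
    return conn
```

With this shape, a server loop that gets None back can carry on with other
work, such as reaping exited children, rather than sitting in accept().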

Re: defunct postmasters

From

Philip Crotwell

Date:

11 May 2001, 19:55:13

Well, I killed it. Oddly, giving a "stop" to the rc script hung after I did
the kill -9. Not sure if that matters.

At any rate, I restarted, and the database seems just fine after repeated
port scans. So either that was just coincidental, or it was a
combination of the port scan and something else?

I will keep an eye on things, and repost if it locks up again after a port
scan, but for now I seem to be ok.

thanks for your help,
Philip
