Re: Postmaster hangs - Mailing list pgsql-bugs

From Karen Pease
Subject Re: Postmaster hangs
Msg-id 1256549324.25178.37.camel@localhost.localdomain
In response to Re: Postmaster hangs  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: Postmaster hangs
List pgsql-bugs
I did my best to follow the gdb instructions.  I ran:

gdb -p 2852

Then I entered the logging statements, ran "cont", and pressed
Ctrl-C a couple of times.  I got:

Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0  0x001e6416 in __kernel_vsyscall ()
#1  0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x081dbaf9 in ?? ()
#3  0x081dd20a in PostmasterMain ()
#4  0x08190f96 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0  0x001e6416 in __kernel_vsyscall ()
#1  0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x081dbaf9 in ?? ()
#3  0x081dd20a in PostmasterMain ()
#4  0x08190f96 in main ()
(gdb) quit
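
The same kind of backtrace can also be captured without an interactive session. The following is only a sketch: `gdb_backtraces` is a hypothetical helper name, and it assumes a gdb that supports the `-p`, `-batch` and `-ex` options. The `??` in frame #2 above usually means debug symbols for that binary are not installed, and this sketch does nothing to change that.

```shell
#!/bin/sh
# Sketch: take a few backtraces of one process without an interactive session.
# gdb_backtraces is a hypothetical helper name, not part of gdb or PostgreSQL.
gdb_backtraces() {
    pid=$1
    count=${2:-2}   # number of backtraces to take (defaults to 2)

    # Refuse anything that is not a plain decimal PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: gdb_backtraces PID [COUNT]" >&2
            return 2
            ;;
    esac

    i=0
    while [ "$i" -lt "$count" ]; do
        # -batch makes gdb exit (and detach) once the -ex commands have run.
        gdb -p "$pid" -batch -ex "bt"
        i=$((i + 1))
        sleep 1
    done
}
```

For the postmaster above this would be called as `gdb_backtraces 2852`, run as root or as the postgres user.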

The jammed httpd processes, per your command line, are:

[root@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle                                  /usr/sbin/httpd
 3379     1 D    start_this_handle                                  /usr/sbin/httpd
 3381     1 D    start_this_handle                                  /usr/sbin/httpd
 4147     1 D    start_this_handle                                  /usr/sbin/httpd
 4539     1 D    start_this_handle                                  /usr/sbin/httpd
 5484     1 D    start_this_handle                                  /usr/sbin/httpd
11100     1 D    start_this_handle                                  /usr/sbin/httpd
14882     1 D    start_this_handle                                  /usr/sbin/httpd

These cannot be killed by kill -9.  Example:

[root@chmmr dbscripts]# kill -9 3376
[root@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle                                  /usr/sbin/httpd
 3379     1 D    start_this_handle                                  /usr/sbin/httpd
 3381     1 D    start_this_handle                                  /usr/sbin/httpd
 4147     1 D    start_this_handle                                  /usr/sbin/httpd
 4539     1 D    start_this_handle                                  /usr/sbin/httpd
 5484     1 D    start_this_handle                                  /usr/sbin/httpd
11100     1 D    start_this_handle                                  /usr/sbin/httpd
14882     1 D    start_this_handle                                  /usr/sbin/httpd
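
To list every process stuck in uninterruptible sleep, whatever its name, the ps output can be filtered on the STAT column instead of grepping for "http". This is a minimal sketch: `filter_d_state` is a hypothetical helper name, and it assumes the column order used in the ps command above (pid, ppid, stat, wchan, cmd).

```shell
#!/bin/sh
# Sketch: keep only rows whose third column (STAT) starts with "D",
# i.e. processes in uninterruptible sleep. The header row is dropped
# because its third field is the literal text "STAT".
# filter_d_state is a hypothetical helper name, not a standard tool.
filter_d_state() {
    awk '$3 ~ /^D/'
}

# Usage, assuming the Linux procps version of ps:
#   ps ax -o pid,ppid,stat,wchan:50,cmd | filter_d_state
```

Filtering on the state also avoids matching the grep process itself and would show any postgres backends blocked in the same kernel function.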

As mentioned, I can kill postmaster.  But I can't restart it without a
reboot; it hangs:

[root@chmmr dbscripts]# ps -ef | grep -i postm
postgres  2852     1  0 Oct25 ?        00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
root     15115 14844  0 04:23 pts/0    00:00:00 grep -i postm
[root@chmmr dbscripts]# /etc/init.d/postgresql stop
Stopping postgresql service: ^C^C                          [FAILED]
[root@chmmr dbscripts]#
[root@chmmr dbscripts]# killall -9 postmaster
[root@chmmr dbscripts]# ps -ef | grep -i postm
root     15183 14844  0 04:24 pts/0    00:00:00 grep -i postm
[root@chmmr dbscripts]# /etc/init.d/postgresql restart
Stopping postgresql service: ^C^C                          [FAILED]
^C
[root@chmmr dbscripts]# /etc/init.d/postgresql start
^C

I have no better luck using pg_ctl directly than with the postgresql
control script.

Again I hope this helps.  Thanks!

    - Karen

On Mon, 2009-10-26 at 17:07 +0800, Craig Ringer wrote:
> Karen Pease wrote:
> > kill -9 does kill postmaster (or at least seems to).  But I can't figure
> > out a way to get it restarted without a reboot -- I don't know what I'm
> > missing.  The Fedora postgres restart scripts don't do the trick, and I
> > couldn't get it to work with pg_ctl either.
>
> It'd help to know where the postmaster was stuck, and if possible where
> the backend you were using is stuck.
>
> A backtrace from gdb can be handy for this.
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> > kill -9 doesn't work on the locked up httpd processes.  So that has to
> > have the system restarted.
>
> If `kill -9' isn't working they're probably in uninterruptible sleep in
> the kernel.
>
> You can find out what they're sleeping in with `ps':
>
>   ps ax -o pid,ppid,stat,wchan:50,cmd
>
> (Filter for just the postmaster and postgres processes if you want)
>
> > Both filesystems are EXT-4.
>
> That's interesting given the issues you're having...
>
> --
> Craig Ringer
