Re: Postmaster hangs - Mailing list pgsql-bugs
From | Karen Pease |
---|---|
Subject | Re: Postmaster hangs |
Date | |
Msg-id | 1256549324.25178.37.camel@localhost.localdomain |
In response to | Re: Postmaster hangs (Craig Ringer <craig@postnewspapers.com.au>) |
Responses | Re: Postmaster hangs |
List | pgsql-bugs |
I did my best to follow the gdb instructions.  I ran:

gdb -p 2852

Then connected entered the logging statements, then ran "cont", then
ctrl-c'ed it a couple times.  I got:

Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0  0x001e6416 in __kernel_vsyscall ()
#1  0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x081dbaf9 in ?? ()
#3  0x081dd20a in PostmasterMain ()
#4  0x08190f96 in main ()
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x001e6416 in __kernel_vsyscall ()
(gdb) bt
#0  0x001e6416 in __kernel_vsyscall ()
#1  0x00c7939d in ___newselect_nocancel () from /lib/libc.so.6
#2  0x081dbaf9 in ?? ()
#3  0x081dd20a in PostmasterMain ()
#4  0x08190f96 in main ()
(gdb) quit

The jammed httpd processes, by your commandline, are:

[root@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle    /usr/sbin/httpd
 3379     1 D    start_this_handle    /usr/sbin/httpd
 3381     1 D    start_this_handle    /usr/sbin/httpd
 4147     1 D    start_this_handle    /usr/sbin/httpd
 4539     1 D    start_this_handle    /usr/sbin/httpd
 5484     1 D    start_this_handle    /usr/sbin/httpd
11100     1 D    start_this_handle    /usr/sbin/httpd
14882     1 D    start_this_handle    /usr/sbin/httpd

These cannot be killed by kill -9.  Example:

[root@chmmr dbscripts]# kill -9 3376
[root@chmmr dbscripts]# ps ax -o pid,ppid,stat,wchan:50,cmd | grep -i http
 3376     1 D    start_this_handle    /usr/sbin/httpd
 3379     1 D    start_this_handle    /usr/sbin/httpd
 3381     1 D    start_this_handle    /usr/sbin/httpd
 4147     1 D    start_this_handle    /usr/sbin/httpd
 4539     1 D    start_this_handle    /usr/sbin/httpd
 5484     1 D    start_this_handle    /usr/sbin/httpd
11100     1 D    start_this_handle    /usr/sbin/httpd
14882     1 D    start_this_handle    /usr/sbin/httpd

As mentioned, I can kill postmaster.  But I can't restart it without a
reboot; it hangs:

[root@chmmr dbscripts]# ps -ef | grep -i postm
postgres  2852     1  0 Oct25 ?        00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
root     15115 14844  0 04:23 pts/0    00:00:00 grep -i postm
[root@chmmr dbscripts]# /etc/init.d/postgresql stop
Stopping postgresql service: ^C^C                          [FAILED]
[root@chmmr dbscripts]#
[root@chmmr dbscripts]# killall -9 postmaster
[root@chmmr dbscripts]# ps -ef | grep -i postm
root     15183 14844  0 04:24 pts/0    00:00:00 grep -i postm
[root@chmmr dbscripts]# /etc/init.d/postgresql restart
Stopping postgresql service: ^C^C                          [FAILED]
^C
[root@chmmr dbscripts]# /etc/init.d/postgresql start
^C

I have no better luck using pg_ctl directly versus using the postgresql
control script.

Again I hope this helps.  Thanks!

 - Karen

On Mon, 2009-10-26 at 17:07 +0800, Craig Ringer wrote:
> Karen Pease wrote:
> > kill -9 does kill postmaster (or at least seems to).  But I can't figure
> > out a way to get it restarted without a reboot -- I don't know what I'm
> > missing.  The Fedora postgres restart scripts don't do the trick, and I
> > couldn't get it to work with pg_ctl either.
>
> It'd help to know where the postmaster was stuck, and if possible where
> the backend you were using is stuck.
>
> A backtrace from gdb can be handy for this.
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> > kill -9 doesn't work on the locked up httpd processes.  So that has to
> > have the system restarted.
>
> If `kill -9' isn't working they're probably in uninterruptable sleep in
> the kernel.
>
> You can find out what they're sleeping in with `ps':
>
>   ps ax -o pid,ppid,stat,wchan:50,cmd
>
> (Filter for just the postmaster and postgres processes if you want)
>
> > Both filesystems are EXT-4.
>
> That's interesting given the issues you're having...
>
> --
> Craig Ringer
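
For anyone repeating the `ps ax -o pid,ppid,stat,wchan:50,cmd' check from this thread, here is a minimal sketch in Python that filters saved ps output down to processes in uninterruptible sleep (STAT beginning with "D") and reports the kernel function each one is waiting in. The names StuckProcess and find_uninterruptible are hypothetical, not part of any tool mentioned above. The sketch assumes the five-column layout produced by that exact ps command, and the non-D line in SAMPLE is made up purely to show the filter.

```python
from typing import List, NamedTuple


class StuckProcess(NamedTuple):
    """One row of `ps ax -o pid,ppid,stat,wchan:50,cmd` output."""
    pid: int
    ppid: int
    stat: str
    wchan: str
    cmd: str


def find_uninterruptible(ps_output: str) -> List[StuckProcess]:
    """Return the processes whose STAT column starts with 'D'.

    Assumes the columns are pid, ppid, stat, wchan, cmd, in that order.
    """
    stuck = []
    for line in ps_output.splitlines():
        # Split into at most 5 fields so the command keeps its arguments.
        fields = line.split(None, 4)
        # Skip blank lines, short lines, and the header row (non-numeric pid).
        if len(fields) < 5 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        pid, ppid, stat, wchan, cmd = fields
        if stat.startswith("D"):
            stuck.append(StuckProcess(int(pid), int(ppid), stat, wchan, cmd.strip()))
    return stuck


# Two rows copied from the output in this thread, plus one hypothetical
# sleeping process (pid 9999) that the filter should ignore.
SAMPLE = """\
  PID  PPID STAT WCHAN                CMD
 3376     1 D    start_this_handle    /usr/sbin/httpd
11100     1 D    start_this_handle    /usr/sbin/httpd
 9999     1 S    -                    /bin/sleep 60
"""

if __name__ == "__main__":
    for proc in find_uninterruptible(SAMPLE):
        print(f"{proc.pid} is blocked in {proc.wchan}: {proc.cmd}")
```

To run it against a live system, the output of the ps command could be captured to a file and passed to find_uninterruptible as a string; the script only reads text and does not signal or touch any process.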