Re: Postmaster hangs - Mailing list pgsql-bugs

From Craig Ringer
Subject Re: Postmaster hangs
Date
Msg-id 1256624664.1709.80.camel@wallace.localnet
Whole thread Raw
In response to Re: Postmaster hangs  (Karen Pease <meme@daughtersoftiresias.org>)
Responses Re: Postmaster hangs
List pgsql-bugs
On Tue, 2009-10-27 at 00:50 -0500, Karen Pease wrote:
> > OK, so there's nothing shrieklingly obviously wrong with what the
> > postmaster is up to. But what about the backend that's stopped
> > responding? Try connecting gdb to that "postgres" process once it's
> > stopped responding and get a backtrace from that.
> >
>
> Okay -- I started up a psql instance, which immediately locks up.  I
> then attached gdb to it and got this:
>
> (gdb) cont
> Continuing.
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00fe2416 in __kernel_vsyscall ()

You didn't actually request a backtrace (bt), so all it shows is the top
stack frame. That doesn't tell us anything except that it's busy in a
system call in the kernel.

> > You can find out a bit more about what the kernel is doing using the
> > "magic" keyboard sequence "ALT-SysRQ-T" from a vconsole (not under X).
>
> Nothing happened.  Nothing useful in dmesg -- certainly no stacktraces.

Your kernel might not have the "magic sysrq key" enabled. Run:

  sudo sysctl -w kernel.sysrq=1

and try again. Note that on some systems with weird keyboards you might
have to hold the "Fn" key (if you have one) or disable "F-Lock" (if you
have it) to get SysRq to be recognised. The print screen / PrtScn key is
usually shared with SysRq even if it's not marked as such.

Hmm, it looks like the SysRq magic key sequences even work under X. I
didn't think they did, but on my system here hitting alt-sysrq-t under
X11 dumps a bunch of task trace data in to /var/log/kern.log (Ubuntu
system), including task info and some general info on the CPU states.

Oct 27 14:20:19 wallace kernel: [13668207.501781] postgres      S 28e389e2     0  3152  31105
Oct 27 14:20:19 wallace kernel: [13668207.501781]  f352bcac 00200086 c3634358 28e389e2 00029346 00000000 c069a340
c07d2180
Oct 27 14:20:19 wallace kernel: [13668207.501781]  c51fe480 c51fe6f8 c3634300 ffffffff c3634300 00000000 c3634300
c51fe6f8
Oct 27 14:20:19 wallace kernel: [13668207.501781]  f352bca8 c01398c8 c90e0000 7fffffff c90e01a8 f352bcf8 c050ec7d
f352bcc8
Oct 27 14:20:19 wallace kernel: [13668207.501781] Call Trace:
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01398c8>] ? enqueue_task_fair+0x68/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c050ec7d>] schedule_timeout+0xad/0xe0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0156aea>] ? prepare_to_wait+0x3a/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c04aaf38>] unix_stream_data_wait+0x88/0xe0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0156890>] ? autoremove_wake_function+0x0/0x50
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c04ac631>] unix_stream_recvmsg+0x311/0x490
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c02b0990>] ? apparmor_socket_recvmsg+0x10/0x20
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c04356e4>] sock_recvmsg+0xf4/0x120
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0156890>] ? autoremove_wake_function+0x0/0x50
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01c3bb7>] ? __mem_cgroup_uncharge_common+0x137/0x170
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01a39b8>] ? __dec_zone_page_state+0x18/0x20
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c01b0411>] ? page_remove_rmap+0x61/0x130
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c012eb00>] ? kunmap_atomic+0x50/0x60
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c043599c>] sys_recvfrom+0x7c/0xd0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c019c305>] ? __pagevec_free+0x25/0x30
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c019eee0>] ? release_pages+0x160/0x1a0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c02d2bfd>] ? rb_erase+0xcd/0x150
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0435a26>] sys_recv+0x36/0x40
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0435be7>] sys_socketcall+0x1b7/0x2b0
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c014617b>] ? sys_gettimeofday+0x2b/0x70
Oct 27 14:20:19 wallace kernel: [13668207.501781]  [<c0109eef>] sysenter_do_call+0x12/0x2f


BTW, if you do get kernel task trace information and you decide to
redact it, it'd be good to include at least all your `postgres'
instances, all `httpd' instances, and the summary information at the
end.

--
Craig Ringer

pgsql-bugs by date:

Previous
From: Karen Pease
Date:
Subject: Re: Postmaster hangs
Next
From: "Hemanta Kumar Biswal"
Date:
Subject: BUG #5138: Failed to get System metrics for terminal services:87