Re: (Never?) Kill Postmaster? - Mailing list pgsql-general

From Christian Schröder
Subject Re: (Never?) Kill Postmaster?
Msg-id 4730833A.7030608@deriva.de
In response to Re: (Never?) Kill Postmaster?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: (Never?) Kill Postmaster?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: (Never?) Kill Postmaster?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane wrote:
> What we can be reasonably certain of is that that backend wasn't
> reaching any CHECK_FOR_INTERRUPTS() macros.  Whether it was hung up
> waiting for something, or caught in a tight loop somewhere, is
> impossible to say without more data than we have.  AFAIR the OP didn't
> even mention whether the backend appeared to be consuming CPU cycles
> (which'd be a pretty fair tip about which of those to believe, but still
> not enough to guess *where* the problem is). A gdb backtrace would tell
> us more.
>
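Tom's point about CHECK_FOR_INTERRUPTS() refers to how a backend honours a cancel request. In the simplest case the signal handler only sets a flag, and the backend acts on it when its own code reaches a check point. The C sketch below is a toy model of that pattern, not PostgreSQL source. `interrupt_pending`, `cancel_handler()`, `install_cancel_handler()` and `run_work()` are hypothetical names. In the real server the macro calls ProcessInterrupts(), which raises an error, rather than just returning a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for PostgreSQL's InterruptPending. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Toy version of the macro: here it only reports whether a cancel is pending. */
#define CHECK_FOR_INTERRUPTS() (interrupt_pending != 0)

/* The handler does nothing but record that a cancel was requested. */
static void cancel_handler(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/* Installs cancel_handler for SIGINT; returns 0 on success. */
static int install_cancel_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cancel_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Simulated long-running work. The cancel request arrives at iteration
 * signal_at, but it is only honoured if the loop polls the flag.
 * Returns the number of iterations completed. */
static long run_work(long iterations, long signal_at, int polls_for_interrupts)
{
    interrupt_pending = 0;
    for (long i = 0; i < iterations; i++) {
        if (i == signal_at)
            raise(SIGINT);   /* stands in for a cancel request from outside */
        if (polls_for_interrupts && CHECK_FOR_INTERRUPTS())
            return i;        /* stop at a "safe point" */
        /* ... one unit of real work would go here ... */
    }
    return iterations;
}
```

With polling enabled the loop stops at the iteration where SIGINT arrived. With polling disabled it runs to completion even though the handler ran, which is the situation Tom describes for a backend that never reaches a check.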

It happened again! I'm not sure whether I should be happy, because we may
now be able to find the cause of the problem, or worried, because it's
our production database... At least the process doesn't seem to be
consuming CPU (it doesn't show up in "top"), so I won't kill it this time
but will instead try to gather all the information you need.
So far I have run strace on it, with the following result:

db2:/home/pgsql/data # strace -p 7129
Process 7129 attached - interrupt to quit
futex(0x994000, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2)                       = -1 EINTR (Interrupted system call)
futex(0x994000, FUTEX_WAIT, 2, NULL

That interrupt was presumably sent by our script that cancels
long-running queries. The same lines seem to repeat over and over again.
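The repeating strace pattern (futex wait, EINTR, the SIGINT handler, rt_sigreturn, futex wait again) is consistent with a thread blocked on a lock that receives a signal. The handler runs, but the lock is still held elsewhere, so the wait simply resumes. The C sketch below reproduces that behaviour with a plain pthread mutex standing in for the lock seen in the backtrace. It is an illustration, not PostgreSQL or glibc code. The names `waiter()`, `demo()`, `sleep_ms()` and the 50 ms start-up delay are illustrative assumptions, and it needs POSIX threads (compile with -pthread).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int handler_runs;   /* incremented by the SIGINT handler */
static atomic_int acquired;       /* set once the waiter gets the lock */

static void on_sigint(int signo)
{
    (void) signo;
    atomic_fetch_add(&handler_runs, 1);   /* atomic_int is lock-free on common platforms */
}

/* Blocks in pthread_mutex_lock (a futex wait on Linux) while main holds the lock. */
static void *waiter(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    atomic_store(&acquired, 1);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Sends `signals` SIGINTs to a thread blocked on the mutex. Returns 1 if the
 * thread obtained the lock before main released it (it should not), 0 if it
 * did not, -1 on setup failure. *runs_seen receives the handler count. */
static int demo(int signals, int *runs_seen)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;

    atomic_store(&handler_runs, 0);
    atomic_store(&acquired, 0);
    pthread_mutex_lock(&lock);              /* main holds the lock throughout */

    pthread_t t;
    if (pthread_create(&t, NULL, waiter, NULL) != 0) {
        pthread_mutex_unlock(&lock);
        return -1;
    }
    sleep_ms(50);                           /* give the waiter time to block */

    for (int i = 0; i < signals; i++) {
        int before = atomic_load(&handler_runs);
        int rc = pthread_kill(t, SIGINT);
        assert(rc == 0);
        while (atomic_load(&handler_runs) == before)
            sleep_ms(1);                    /* wait until this signal was handled */
    }

    int got_lock_early = atomic_load(&acquired);
    pthread_mutex_unlock(&lock);            /* only now can the waiter proceed */
    pthread_join(t, NULL);
    *runs_seen = atomic_load(&handler_runs);
    return got_lock_early;
}
```

Each signal interrupts the wait and runs the handler, yet the waiter only gets the lock after it is released. Running the program under strace should show a similar futex/EINTR/rt_sigreturn cycle, although the exact lines depend on the kernel and glibc versions.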

Then I attached a gdb to the process and printed out a backtrace:

db2:/home/pgsql/data # gdb --pid=7129
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Attaching to process 7129
Reading symbols from /usr/local/pgsql_8.2.5/bin/postgres...done.
Using host libthread_db library "/lib64/libthread_db.so.1".
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/pgsql_8.2.5/lib/plpgsql.so...done.
Loaded symbols for /usr/local/pgsql_8.2.5/lib/plpgsql.so
Reading symbols from /usr/local/pgsql_8.2.5/lib/plperl.so...done.
Loaded symbols for /usr/local/pgsql_8.2.5/lib/plperl.so
Reading symbols from /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so...done.
Loaded symbols for /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 47248855881456 (LWP 7129)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/auto/Opcode/Opcode.so...done.
Loaded symbols for /usr/lib/perl5/5.8.8/x86_64-linux-thread-multi/auto/Opcode/Opcode.so
0x00002af904809a68 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00002af904809a68 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00002af904806e88 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#2  0x00002af8fb13de23 in _nl_find_msg () from /lib64/libc.so.6
#3  0x00002af8fb13ec83 in __dcigettext () from /lib64/libc.so.6
#4  0x00002af8fb186f0b in strerror_r () from /lib64/libc.so.6
#5  0x00002af8fb186d33 in strerror () from /lib64/libc.so.6
#6  0x00000000005f4daa in expand_fmt_string ()
#7  0x00000000005f6d14 in errmsg ()
#8  0x00000000005182cc in internal_flush ()
#9  0x00000000005183b6 in internal_putbytes ()
#10 0x000000000051841c in pq_putmessage ()
#11 0x00000000005199c4 in pq_endmessage ()
#12 0x0000000000440c6a in printtup ()
#13 0x00000000004fc1b8 in ExecutorRun ()
#14 0x0000000000580451 in PortalRunSelect ()
#15 0x0000000000581446 in PortalRun ()
#16 0x000000000057d625 in exec_simple_query ()
#17 0x000000000057ea72 in PostgresMain ()
#18 0x0000000000558218 in ServerLoop ()
#19 0x0000000000558db8 in PostmasterMain ()
#20 0x000000000051a213 in main ()
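Frames #8 to #4 show internal_flush() reporting an error through errmsg(). expand_fmt_string() then calls strerror(), presumably because the message format contains %m. glibc's strerror() looks up a translated message through the gettext code (__dcigettext(), _nl_find_msg()), and that is where frames #0 to #2 show the backend waiting on a lock. The %m expansion step can be sketched as follows, assuming simplified semantics. `expand_percent_m()` is a hypothetical helper, not PostgreSQL's actual function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: copies fmt into out, replacing each "%m" with the
 * message text for errnum. "%%" is kept as-is so that a later printf-style
 * pass still sees a literal percent sign. Returns 0 on success, -1 if out
 * is too small. */
static int expand_percent_m(const char *fmt, int errnum, char *out, size_t outlen)
{
    assert(fmt != NULL && out != NULL && outlen > 0);
    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        const char *piece = p;
        size_t len = 1;
        if (p[0] == '%' && p[1] == 'm') {
            /* strerror() may consult the locale's message catalog (gettext). */
            piece = strerror(errnum);
            len = strlen(piece);
            p++;                    /* skip the 'm' */
        } else if (p[0] == '%' && p[1] == '%') {
            len = 2;                /* keep "%%" intact */
            p++;
        }
        if (n + len >= outlen)
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Calling it with a format such as "could not send data to client: %m" and EPIPE yields that prefix followed by the system's text for EPIPE. The point of the sketch is only that a strerror() call, and therefore a gettext lookup, happens while the error message is being built.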

Do you need anything else? Can you still tell what's happening?

Regards,
    Christian

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
