System in Recovery Mode But No Activity - Mailing list pgsql-general

From John Cheng
Subject System in Recovery Mode But No Activity
Date
Msg-id a18a22ec0806201812i5bdd7533ta62a60a1e08f2058@mail.gmail.com
Whole thread Raw
Responses Re: System in Recovery Mode But No Activity  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-general
We had a run away process on our database box that used up all the
physical and all the virtual memory (swap). This caused the RedHat
Linux oom-killer to kill many processes, including some Postgres ones.
Postgres went into a funky state after that time:

2008-06-20 14:19:10 CDT [unknown] LOG:  invalid length of startup packet
2008-06-20 14:20:50 CDT  ERROR:  canceling autovacuum task
2008-06-20 14:20:50 CDT  CONTEXT:  automatic vacuum of table
"lms_nna.pg_catalog.pg_listener"
2008-06-20 14:21:48 CDT  ERROR:  canceling autovacuum task
2008-06-20 14:21:49 CDT  CONTEXT:  automatic vacuum of table
"lms_infiniti.pg_catalog.pg_listener"
2008-06-20 14:27:33 CDT  WARNING:  worker took too long to start; cancelled
2008-06-20 14:45:53 CDT  LOG:  server process (PID 22435) was
terminated by signal 9: Killed
...
2008-06-20 14:50:28 CDT <dbname> DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly
corrupted shared memory.
2008-06-20 14:50:28 CDT <dbname> HINT:  In a moment you should be able
to reconnect to the database and repeat your command.

...
2008-06-20 14:50:36 CDT <dbname> FATAL:  the database system is in recovery mode

Now any attempt to connect to the database yield the message "the
database system is in recovery mode".

The processes that are still running are:

postgres 31103  0.0  0.0 149024 492 ?        S    Apr19   8:18
/opt/lms/64/postgres/8.3/bin/postmaster -p 6544 -D /pgdata/8.3/data
postgres 31109  0.0  0.0 65408  104 ?        Ss   Apr19   1:54  \_
postgres: logger process
postgres 32645  0.0  0.0 152708 564 ?        Ss   Jun15   3:14  \_
postgres: slony <databasename> 172.16.172.246(41114) idle in
transaction
postgres  1179  0.0  0.0 152708 348 ?        Ss   Jun15   3:30  \_
postgres: slony  <databasename> 172.16.172.246(41138) idle in
transaction

I executed strace and gdb on each process except for 31109 (the logger process)

Process 31103 attached - interrupt to quit # Main postmaster
select(6, [3 5], NULL, NULL, {15, 795000} <unfinished ...>
Process 31103 detached
[postgres@126053-db2 64]$ strace -p 32645 # connection #1
Process 32645 attached - interrupt to quit
futex(0x3954d32930, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 32645 detached
[postgres@126053-db2 64]$ strace -p 1179 # connection #2
Process 1179 attached - interrupt to quit
futex(0x3954d32930, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 1179 detached

I then used gdb to gather more information:

#
# For main postmaster process
#

[postgres@126053-db2 64]$ gdb 31103
GNU gdb Red Hat Linux (6.3.0.0-1.153.el4_6.2rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...31103: No such
file or directory.

[postgres@126053-db2 64]$ gdb /opt/lms/postgres-8.3_64/bin/postmaster 31103
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging
symbols found)
Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Attaching to program: /opt/lms/64/postgres/8.3/bin/postmaster, process 31103
Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libxslt.so.1
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/tls/libpthread.so.0...(no debugging
symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 182894088192 (LWP 31103)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2

0x0000003954bc0125 in __select_nocancel () from /lib64/tls/libc.so.6

#
# For connection no. 1
#

[postgres@126053-db2 64]$ gdb /opt/lms/postgres-8.3_64/bin/postmaster 32645
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging
symbols found)
Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Attaching to program: /opt/lms/64/postgres/8.3/bin/postmaster, process 32645
Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libxslt.so.1
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/tls/libpthread.so.0...(no debugging
symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 182894088192 (LWP 32645)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /opt/lms/64/postgres/8.3/lib/plpgsql.so...
(no debugging symbols found)...done.
Loaded symbols for /opt/lms/64/postgres/8.3/lib/plpgsql.so
Reading symbols from /opt/lms/64/postgres/8.3/lib/xxid.so...done.
Loaded symbols for /opt/lms/64/postgres/8.3/lib/xxid.so
0x0000003954bd28db in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6

#
# For connection no. 3
#

[postgres@126053-db2 64]$ gdb /opt/lms/postgres-8.3_64/bin/postmaster 1179
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging
symbols found)
Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Attaching to program: /opt/lms/64/postgres/8.3/bin/postmaster, process 1179
Reading symbols from /usr/lib64/libxslt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libxslt.so.1
Reading symbols from /usr/lib64/libxml2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/tls/libpthread.so.0...(no debugging
symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 182894088192 (LWP 1179)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /opt/lms/64/postgres/8.3/lib/plpgsql.so...
(no debugging symbols found)...done.
Loaded symbols for /opt/lms/64/postgres/8.3/lib/plpgsql.so
Reading symbols from /opt/lms/64/postgres/8.3/lib/xxid.so...done.
Loaded symbols for /opt/lms/64/postgres/8.3/lib/xxid.so
0x0000003954bd28db in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6

I think the fact that a process used up all the available memory
(physical and virtual) caused Postgres to go into a weird state. Now
it will not respond to kill, or pg_ctl for shutdown. Would the right
thing to do be using kill -9 to stop the server?


--
- John L Cheng

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Timezone issue - Is it me or is this a massive bug?
Next
From: "Scott Marlowe"
Date:
Subject: Re: System in Recovery Mode But No Activity