Thread: FATAL: the database system is in recovery mode

FATAL: the database system is in recovery mode

From
Ron St-Pierre
Date:
We're developing a java app and are using postgres as the database. On
our dev server I started the app, closed it, but the java process was
still open so I killed it, which caused the above error. I've had to do
this in the past but have not had this happen before. I've searched the
archives and found a message/reply from Andrew Sullivan about this which
implies that it may be more a file system problem. We're using
postgresql 7.4.1 on a debian box (and yes, ext2). And fsync is turned on.

Aside from not killing processes which hang, is there anything I can do
to avoid this problem in the future?

Thanks
Ron


Re: FATAL: the database system is in recovery mode

From
Ron St-Pierre
Date:
Doug McNaught wrote:

>Ron St-Pierre <rstpierre@syscor.com> writes:
>
>
>
>>We're developing a java app and are using postgres as the database. On
>>our dev server I started the app, closed it, but the java process was
>>still open so I killed it, which caused the above error. I've had to
>>do this in the past but have not had this happen before. I've searched
>>the archives and found a message/reply from Andrew Sullivan about this
>>which implies that it may be more a file system problem. We're using
>>postgresql 7.4.1 on a debian box (and yes, ext2). And fsync is turned
>>on.
>>
>>
>
>Killing a client process shouldn't cause fatal errors on the
>server--the most you should get is "client unexpectedly closed
>connection".  Something else is wrong.
>
>
>
>>Aside from not killing processes which hang, is there anything I can
>>do to avoid this problem in the future?
>>
>>
>
>What is the actual problem with Postgres?  Did the server die?  Is it
>refusing to start up?  If you can give more complete logs that would
>be helpful.
>
>-Doug
>
>
>
It looks as if it hung up trying to write some customer preferences to
the db. Here's the snippet from /var/log/messages:

Jul  7 09:32:58 imp postgres[32351]: [550-1] LOG:  statement: UPDATE
customer.clientPreferences SET preferences='<?xml version="1.0"
encoding="UTF-8"?>
Jul  7 09:32:58 imp postgres[32351]: [550-2] ^I<java version="1.4.2_03"
class="java.beans.XMLDecoder">
Jul  7 09:32:58 imp postgres[32351]: [550-3] ^I <object
class="java.util.HashMap">
Jul  7 09:32:58 imp postgres[32351]: [550-4] ^I  <void method="put">

.... thousand(s) lines of XML ...

Jul  7 09:32:58 imp postgres[32351]: [550-1626] ^I        <void
method="put">
Jul  7 09:32:58 imp postgres[26784]: [448-1] LOG:  server process (PID
32349) was terminated by signal 11
Jul  7 09:32:58 imp postgres[26784]: [449-1] LOG:  terminating any other
active server processes
Jul  7 09:32:58 imp postgres[32352]: [567-1] WARNING:  terminating
connection because of crash of another server process
Jul  7 09:32:58 imp postgres[32350]: [482-1] WARNING:  terminating
connection because of crash of another server process
Jul  7 09:32:58 imp postgres[32348]: [450-1] WARNING:  terminating
connection because of crash of another server process
Jul  7 09:32:58 imp postgres[32352]: [567-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server
Jul  7 09:32:58 imp postgres[32352]: [567-3]  process exited abnormally
and possibly corrupted shared memory.
Jul  7 09:32:58 imp postgres[32350]: [482-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server
Jul  7 09:32:58 imp postgres[32348]: [450-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server
Jul  7 09:32:58 imp postgres[32352]: [567-4] HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
Jul  7 09:32:58 imp postgres[32350]: [482-3]  process exited abnormally
and possibly corrupted shared memory.
Jul  7 09:32:58 imp postgres[32350]: [482-4] HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
Jul  7 09:32:58 imp postgres[32348]: [450-3]  process exited abnormally
and possibly corrupted shared memory.
Jul  7 09:32:58 imp postgres[32348]: [450-4] HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
Jul  7 09:32:58 imp postgres[32351]: [551-1] ERROR:  terminating
connection because of crash of another server process
                                                                     ^ ^
^ ^ ^
Jul  7 09:32:58 imp postgres[32351]: [551-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server
Jul  7 09:32:58 imp postgres[32351]: [551-3]  process exited abnormally
and possibly corrupted shared memory.
Jul  7 09:32:58 imp postgres[32351]: [551-4] HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
Jul  7 09:32:58 imp postgres[19223]: [432-1] WARNING:  terminating
connection because of crash of another server process
Jul  7 09:32:58 imp postgres[19223]: [432-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server
 ... more ...
Jul  7 09:32:58 imp postgres[3802]: [295-3]  process exited abnormally
and possibly corrupted shared memory.
Jul  7 09:32:58 imp postgres[3802]: [295-4] HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
Jul  7 10:02:24 imp -- MARK --
Jul  7 10:22:24 imp -- MARK --
Jul  7 10:42:24 imp -- MARK --
Jul  7 11:02:24 imp -- MARK --
Jul  7 11:22:24 imp -- MARK --
Jul  7 11:42:24 imp -- MARK --
Jul  7 12:02:24 imp -- MARK --
Jul  7 12:22:24 imp -- MARK --
Jul  7 12:42:24 imp -- MARK --
Jul  7 13:02:24 imp -- MARK --
Jul  7 13:22:24 imp -- MARK --
Jul  7 13:42:24 imp -- MARK --
Jul  7 14:00:36 imp postgres[32351]: [552-1] LOG:  could not receive
data from client: Connection reset by peer
Jul  7 14:00:36 imp postgres[32351]: [553-1] LOG:  unexpected EOF on
client connection
Jul  7 14:00:36 imp postgres[26784]: [450-1] LOG:  all server processes
terminated; reinitializing
Jul  7 14:00:36 imp postgres[7962]: [451-1] LOG:  database system was
interrupted at 2004-07-07 09:31:44 PDT
Jul  7 14:00:36 imp postgres[7962]: [452-1] LOG:  checkpoint record is
at 5E/7DB31300
Jul  7 14:00:36 imp postgres[7962]: [453-1] LOG:  redo record is at
5E/7DB31300; undo record is at 0/0; shutdown FALSE
Jul  7 14:00:36 imp postgres[7962]: [454-1] LOG:  next transaction ID:
21787900; next OID: 125814322
Jul  7 14:00:36 imp postgres[7962]: [455-1] LOG:  database system was
not properly shut down; automatic recovery in progress
Jul  7 14:00:36 imp postgres[7962]: [456-1] LOG:  record with zero
length at 5E/7DB31340
Jul  7 14:00:36 imp postgres[7962]: [457-1] LOG:  redo is not required
Jul  7 14:00:38 imp postgres[7962]: [458-1] LOG:  database system is ready

^ ^ ^ ^
I just checked it again, and as you can see from the logs it has
restarted. Looking at the times in the logs, it crashed at 09:32:58 and
restarted at 14:00:36, but there's nothing scheduled in cron that I can
see which would cause the restart.  Odd. Normally I would have restarted
it manually with pg_ctl but I wanted to see what would happen so I left it.

Anyway my original question still stands, any idea how to avoid this
problem in the future?

Thanks
Ron


Re: FATAL: the database system is in recovery mode

From
Doug McNaught
Date:
Ron St-Pierre <rstpierre@syscor.com> writes:

> Jul  7 09:32:58 imp postgres[26784]: [448-1] LOG:  server process (PID
> 32349) was terminated by signal 11

This is the critical event--one of the backends got a segfault.  This
often means bad hardware--either your RAM is faulty, or something got
corrupted on disk and caused a bad pointer reference.  Or maybe an
overheating CPU, if a fan failed.

That said, it's also possible that it's a Postgres bug, but it's more
likely to be hardware...

> Anyway my original question still stands, any idea how to avoid this
> problem in the future?

Test your hardware and replace any bad components.

-Doug