Re: The database system is in recovery mode - Mailing list pgsql-admin

From Andrew Sullivan
Subject Re: The database system is in recovery mode
Date
Msg-id 20030502141444.GC13419@libertyrms.info
In response to The database system is in recovery mode  (Trevor Astrope <astrope@e-corp.net>)
List pgsql-admin
On Thu, May 01, 2003 at 06:24:03PM -0400, Trevor Astrope wrote:
>  Could this be the linux kernel randomly killing processes under heavy
> load issue?

Not from the look of things.  See below.

> System is postgresql 7.2.1 on redhat 7.2. Here's the logs:

You should really upgrade at least to 7.2.4 (no dump required).
7.2.1 has some nasty bugs.

> 2003-05-01 16:54:08 DEBUG:  server process (pid 2599) was
> terminated by signal 11
                       ^^

That's not signal 9, so it's not the kernel.  Sig 11 is SIGSEGV on
Linux, which probably means some sort of memory problem.  Are you
using ECC RAM for your database?  You should.  In any case, the first
thing I'd do is run memtest86 on it.
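
If you want to check what a signal number in the log means on your own
platform, a small script can do the lookup.  This is only a sketch: the
helper name signal_from_log is made up for illustration and is not part
of PostgreSQL, and the regular expression assumes the "terminated by
signal N" wording shown in the log above, joined onto one line.

```python
import re
import signal


def signal_from_log(line):
    """Return (number, name) for a 'terminated by signal N' log line.

    Hypothetical helper for illustration only. Returns None when the
    line does not mention a terminating signal.
    """
    match = re.search(r"terminated by signal (\d+)", line)
    if match is None:
        return None
    number = int(match.group(1))
    try:
        # Signal numbering is platform-specific; this asks the local OS.
        name = signal.Signals(number).name
    except ValueError:
        name = "unknown"
    return number, name


# Log message from the report above, joined onto a single line.
log = ("2003-05-01 16:54:08 DEBUG:  server process (pid 2599) was "
       "terminated by signal 11")
print(signal_from_log(log))
```

On Linux, signal 11 should come back as SIGSEGV and signal 9 as SIGKILL,
which is the distinction that matters here.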


> 2003-05-01 16:54:08 DEBUG:  terminating any other active server processes
> 2003-05-01 16:54:08 NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend
>         died abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am
>         going to terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
>
> After a bunch of these, the database goes in recovery mode:

That's what it's supposed to do.  It's what WAL buys you.

> I presume this is rerunning the WAL? Is the message serious...could there
> be database corruption or just lost transactions?

Neither, assuming you have good hardware and you're using fsync.  WAL
is there precisely to make the system crash-safe.  (Of course, if
it's sitting on an ext2 partition, which has no journal, and the
system goes down hard, you have a different batch of problems.  But
WAL+fsync protects you from postmaster crashes, and from machine
crashes if your filesystem is crash-safe.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

