Thread: recovery do not finish

recovery do not finish

From

Benedek Laszlo

Date:

11 July 2008, 17:35:25

Hello,

I have a serious problem with a production database.
We hit a "no space left on device" error, and postgres did not stop, so it was killed ( kill -9 ).
We freed up disk space and rebooted, but postgres does not start properly.
We have waited for more than 2 hours, but psql still says that the system is starting up.
( Db size is about 20 gb )
the os is Ubuntu 4.0.3-1ubuntu5
postgres is 8.1.4

pg_controldata says the following :
pg_control version number:            812
Catalog version number:               200510211
Database system identifier:           5006307211022564835
Database cluster state:               in recovery
pg_control last modified:             Fri 11 Jul 2008 09:38:54 PM CEST
Current log file ID:                  48
Next log file segment:                205
Latest checkpoint location:           30/C6AECCAC
Prior checkpoint location:            30/C693404C
Latest checkpoint's REDO location:    30/C6AE62A8
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          1441774700
Latest checkpoint's NextOID:          25908
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint:            Fri 11 Jul 2008 05:22:49 PM CEST
Maximum data alignment:               4
Database block size:                  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               64-bit integers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.UTF-8
LC_CTYPE:                             en_US.UTF-8

In the log file we have the following lines :
DEBUG: TZ "W-SU" scores 0: at 1074121200 2004-01-15 02:00:00 std versus 2004-01-15 00:00:00 std
DEBUG: TZ "Zulu" scores 0: at 1074121200 2004-01-14 23:00:00 std versus 2004-01-15 00:00:00 std
LOG: could not load root certificate file "root.crt": No SSL error reported
DETAIL: Will not verify client certificates.
DEBUG: invoking IpcMemoryCreate(size=10461184)
DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
LOG: database system was interrupted while in recovery at 2008-07-11 19:08:09 CEST
HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: checkpoint record is at 30/C6AECCAC
LOG: redo record is at 30/C6AE62A8; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 1441774700; next OID: 25908
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 30/C6AE62A8
LOG: connection received: host=[local]
LOG: incomplete startup packet
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: forked new backend, pid=3760 socket=7
DEBUG: reaping dead processes
DEBUG: server process (PID 3760) exited with exit code 0
LOG: connection received: host=[local]
DEBUG: forked new backend, pid=3763 socket=7
FATAL: the database system is starting up
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: server process (PID 3763) exited with exit code 0
LOG: connection received: host=[local]
FATAL: the database system is starting up
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: forked new backend, pid=3766 socket=7
DEBUG: reaping dead processes
DEBUG: server process (PID 3766) exited with exit code 0
LOG: connection received: host=[local]
FATAL: the database system is starting up
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)

Any help is appreciated


Re: recovery do not finish

From

Alvaro Herrera

Date:

11 July 2008, 17:55:45

Benedek Laszlo wrote:
> Hello,
>
> I have a serious problem with a production database.
> We hit a "no space left on device" error, and postgres did not stop, so it was killed ( kill -9 ).
> We freed up disk space and rebooted, but postgres does not start properly.
> We have waited for more than 2 hours, but psql still says that the system is starting up.
> ( Db size is about 20 gb )
> the os is Ubuntu 4.0.3-1ubuntu5
> postgres is 8.1.4

There are known bugs in the recovery code; IIRC some of them may cause
the recovery to hang.  Update to 8.1.13 and try again.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: recovery do not finish

From

Simon Riggs

Date:

11 July 2008, 18:24:57

On Fri, 2008-07-11 at 16:53 -0400, Alvaro Herrera wrote:
> Benedek Laszlo wrote:
> > Hello,
> >
> > I have a serious problem with a production database.
> > We hit a "no space left on device" error, and postgres did not stop, so it was killed ( kill -9 ).
> > We freed up disk space and rebooted, but postgres does not start properly.
> > We have waited for more than 2 hours, but psql still says that the system is starting up.
> > ( Db size is about 20 gb )
> > the os is Ubuntu 4.0.3-1ubuntu5
> > postgres is 8.1.4

2 hours is a long time for a crash recovery. What does the log say?
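
For a rough sense of scale, the figures in the pg_controldata output from the
original post can be combined into an upper bound on how much WAL has to be
replayed. This is only a sketch under stated assumptions, not something taken
from the thread: it assumes that "Next log file segment" marks the end of the
WAL written so far, that "Current log file ID" is the decimal form of the hex
log ID in the REDO location (48 = 0x30), and that both fall in the same log
file ID. The function names are made up for illustration.

```python
# "Bytes per WAL segment" as reported by pg_controldata in the original post.
WAL_SEGMENT_BYTES = 16777216


def parse_location(location):
    """Split a 'logid/offset' location such as '30/C6AE62A8' into two ints."""
    log_id, offset = location.split("/")
    return int(log_id, 16), int(offset, 16)


def replay_upper_bound(redo_location, current_log_id, next_segment,
                       segment_bytes=WAL_SEGMENT_BYTES):
    """Upper bound, in bytes, on the WAL between the REDO point and WAL end.

    Assumption: the start of `next_segment` is treated as the end of WAL,
    so the true amount can only be smaller than the value returned here.
    """
    redo_log_id, redo_offset = parse_location(redo_location)
    if redo_log_id != current_log_id:
        # Crossing a log file ID boundary is deliberately not handled here.
        raise ValueError("REDO point and WAL end are in different log file IDs")
    return next_segment * segment_bytes - redo_offset


# Values copied from the pg_controldata output in the original post.
wal_bytes = replay_upper_bound("30/C6AE62A8", 48, 205)
print(f"at most {wal_bytes} bytes ({wal_bytes / 2**20:.1f} MiB) of WAL to replay")
```

With the posted values this comes to at most about 101 MiB of WAL, spread over
roughly seven 16 MB segments. If the assumptions hold, that is not an amount
one would normally expect to need hours.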

> There are known bugs in the recovery code; IIRC some of them may cause
> the recovery to hang.

Just to clarify: There are no known bugs in recovery code at the highest
maintenance release level in any version of PostgreSQL after 8.0 on
Linux, possibly before that also. So upgrade, yes, very soon.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: recovery do not finish

From

Greg Smith

Date:

11 July 2008, 21:44:04

On Fri, 11 Jul 2008, Benedek Laszlo wrote:

> We have waited for more than 2 hours, but psql still says that the system is starting up.

In addition to the suggestions you've gotten to try a later version
instead, I'd add that you should consider running vmstat in another window
while the recovery is going on.  Recovery should show a steady flow of
data moving to and from disk with a fair amount of CPU usage.  If that
stops for a while, it's probably stuck because of a recovery bug or some
other issue, rather than still making progress that would be worth
waiting for.
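
The vmstat check described above can be automated along these lines. This is
a minimal sketch, assuming the usual Linux vmstat layout with a header row
naming the bi, bo, us and sy columns. The function names and the thresholds
(six consecutive samples, at most 1% CPU) are arbitrary choices made for
illustration, and vmstat reports system-wide activity, so other busy
processes on the machine would hide a stalled recovery.

```python
import subprocess


def parse_vmstat(lines):
    """Yield one dict (column name -> int) per data row of vmstat output."""
    columns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if "bi" in fields and "bo" in fields:
            columns = fields  # header row naming each column
        elif columns and fields[0].isdigit() and len(fields) == len(columns):
            yield dict(zip(columns, map(int, fields)))


def looks_stalled(samples, idle_samples=6):
    """True if the last `idle_samples` rows show no block I/O and ~no CPU use."""
    recent = samples[-idle_samples:]
    if len(recent) < idle_samples:
        return False
    return all(
        s["bi"] == 0 and s["bo"] == 0 and s["us"] + s["sy"] <= 1
        for s in recent
    )


def watch(interval=5):
    """Run `vmstat <interval>` and warn whenever the system looks idle."""
    proc = subprocess.Popen(
        ["vmstat", str(interval)], stdout=subprocess.PIPE, text=True
    )
    samples = []
    for sample in parse_vmstat(proc.stdout):
        samples.append(sample)
        if looks_stalled(samples):
            print("no disk or CPU activity recently; recovery may be stuck")
```

Calling watch() in a second terminal while the server is starting up prints a
warning once disk and CPU activity have both been flat for about 30 seconds.
Note that the first row vmstat prints is an average since boot, so it will
rarely look idle.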

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD