Thread: recovery do not finish
Hello,
I have a serious problem with a production database.
We ran into a "no space left on device" problem, and postgres did not stop, so it was killed (kill -9).
We freed up space and rebooted, but postgres does not start properly.
We have waited for more than 2 hours, but psql still says that the system is starting up.
(The DB size is about 20 GB.)
The OS is Ubuntu 4.0.3-1ubuntu5.
postgres is 8.1.4.
pg_controldata says the following:
pg_control version number: 812
Catalog version number: 200510211
Database system identifier: 5006307211022564835
Database cluster state: in recovery
pg_control last modified: Fri 11 Jul 2008 09:38:54 PM CEST
Current log file ID: 48
Next log file segment: 205
Latest checkpoint location: 30/C6AECCAC
Prior checkpoint location: 30/C693404C
Latest checkpoint's REDO location: 30/C6AE62A8
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 1441774700
Latest checkpoint's NextOID: 25908
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Fri 11 Jul 2008 05:22:49 PM CEST
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Date/time type storage: 64-bit integers
Maximum length of locale name: 128
LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8
In the log file we have the following lines:
DEBUG: TZ "W-SU" scores 0: at 1074121200 2004-01-15 02:00:00 std versus 2004-01-15 00:00:00 std
DEBUG: TZ "Zulu" scores 0: at 1074121200 2004-01-14 23:00:00 std versus 2004-01-15 00:00:00 std
LOG: could not load root certificate file "root.crt": No SSL error reported
DETAIL: Will not verify client certificates.
DEBUG: invoking IpcMemoryCreate(size=10461184)
DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
LOG: database system was interrupted while in recovery at 2008-07-11 19:08:09 CEST
HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: checkpoint record is at 30/C6AECCAC
LOG: redo record is at 30/C6AE62A8; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 1441774700; next OID: 25908
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 30/C6AE62A8
LOG: connection received: host=[local]
LOG: incomplete startup packet
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: forked new backend, pid=3760 socket=7
DEBUG: reaping dead processes
DEBUG: server process (PID 3760) exited with exit code 0
LOG: connection received: host=[local]
DEBUG: forked new backend, pid=3763 socket=7
FATAL: the database system is starting up
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: server process (PID 3763) exited with exit code 0
LOG: connection received: host=[local]
FATAL: the database system is starting up
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: forked new backend, pid=3766 socket=7
DEBUG: reaping dead processes
DEBUG: server process (PID 3766) exited with exit code 0
LOG: connection received: host=[local]
FATAL: the database system is starting up
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
Any help is appreciated
Benedek Laszlo wrote:
> Hello,
>
> I have a serious problem with a production database.
> We had a no disk space left on device problem, and postgres did not stop, so it was killed ( kill -9 )
> we made free space and reboot, but postgres not start properly.
> We have waited for more than 2 hours, but psql still says that the system is starting up.
> ( Db size is about 20 gb )
> the os is Ubuntu 4.0.3-1ubuntu5
> postgres is 8.1.4

There are known bugs in the recovery code; IIRC some of them may cause the recovery to hang. Update to 8.1.13 and try again.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, 2008-07-11 at 16:53 -0400, Alvaro Herrera wrote:
> Benedek Laszlo wrote:
> > Hello,
> >
> > I have a serious problem with a production database.
> > We had a no disk space left on device problem, and postgres did not stop, so it was killed ( kill -9 )
> > we made free space and reboot, but postgres not start properly.
> > We have waited for more than 2 hours, but psql still says that the system is starting up.
> > ( Db size is about 20 gb )
> > the os is Ubuntu 4.0.3-1ubuntu5
> > postgres is 8.1.4

2 hours is a long time for a crash recovery. What does the log say?

> There are known bugs in the recovery code; IIRC some of them may cause
> the recovery to hang.

Just to clarify: There are no known bugs in recovery code at the highest maintenance release level in any version of PostgreSQL after 8.0 on Linux, possibly before that also.

So upgrade, yes, very soon.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
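To put the two-hour wait in context, the pg_controldata values from the original post can be turned into a rough byte count. The following is a minimal Python sketch, not anything PostgreSQL ships. The function names are made up for illustration, and it rests on two assumptions: that both halves of a location such as 30/C6AE62A8 are hexadecimal (log file ID, then byte offset), and that "Next log file segment" marks the first segment not yet written, so it gives only an upper bound. It also only handles locations within a single log file ID, and pg_control may not reflect the very latest WAL activity.

```python
# Value of "Bytes per WAL segment" reported by pg_controldata in the post.
WAL_SEGMENT_BYTES = 16777216


def parse_location(text):
    """Split a WAL location like '30/C6AE62A8' into (log_id, offset) integers.

    Assumption: both parts are hexadecimal.
    """
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


def bytes_between(start, end):
    """Byte distance between two WAL locations in the same log file ID."""
    start_id, start_off = parse_location(start)
    end_id, end_off = parse_location(end)
    if start_id != end_id:
        raise ValueError("this sketch only handles a single log file ID")
    return end_off - start_off


def replay_upper_bound(redo, log_file_id, next_segment):
    """Rough upper bound on WAL bytes from the redo point to the start of
    the next unused segment (assumed meaning of 'Next log file segment')."""
    redo_id, redo_off = parse_location(redo)
    if redo_id != log_file_id:
        raise ValueError("this sketch only handles a single log file ID")
    return next_segment * WAL_SEGMENT_BYTES - redo_off


if __name__ == "__main__":
    # Values copied from the pg_controldata output in the original post.
    print(bytes_between("30/C6AE62A8", "30/C6AECCAC"))
    mib = replay_upper_bound("30/C6AE62A8", 48, 205) / (1024 * 1024)
    print("at most about %.1f MiB of WAL to replay" % mib)
```

If the assumptions hold, the replay window here is on the order of a hundred megabytes, which fits the remark that two hours is a long time for this recovery.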
On Fri, 11 Jul 2008, Benedek Laszlo wrote:
> We have waited for more than 2 hours, but psql still says that the system is starting up.

In addition to the suggestions you've gotten to try a later version instead, I'd add that you should consider running vmstat in another window while the recovery is going on. Recovery should show a steady flow of data moving to and from disk with a fair amount of CPU usage. If that stops for a while, it's probably stuck because of a recovery bug or some other issue, rather than still processing such that you should wait for it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
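The vmstat check described above can also be automated. The following is a minimal Python sketch, assuming a Linux host where /proc/vmstat exposes cumulative `pgpgin` and `pgpgout` counters (the counters behind vmstat's block in/out columns). The function names, the five-second interval and the five-interval window are illustrative choices, not recommendations from the thread. The counters are system-wide, so other disk activity on the machine will mask a stalled recovery, and CPU usage is not checked at all.

```python
import time


def read_io_counters(path="/proc/vmstat"):
    """Return the cumulative (pgpgin, pgpgout) counters from /proc/vmstat."""
    counters = {}
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) == 2 and parts[0] in ("pgpgin", "pgpgout"):
                counters[parts[0]] = int(parts[1])
    return counters["pgpgin"], counters["pgpgout"]


def io_deltas(samples):
    """Turn cumulative (in, out) samples into total activity per interval."""
    return [(b[0] - a[0]) + (b[1] - a[1]) for a, b in zip(samples, samples[1:])]


def looks_stalled(deltas, min_activity=1, window=5):
    """True if each of the last `window` intervals moved less than
    `min_activity` units; False while there is too little data to judge."""
    if len(deltas) < window:
        return False
    return all(d < min_activity for d in deltas[-window:])


def watch(interval=5.0, window=5):
    """Poll the counters forever and report when disk traffic goes quiet."""
    samples = [read_io_counters()]
    while True:
        time.sleep(interval)
        samples.append(read_io_counters())
        samples = samples[-(window + 1):]  # keep only what the check needs
        if looks_stalled(io_deltas(samples), window=window):
            print("no disk activity for %d intervals; recovery may be stuck" % window)
```

Calling `watch()` in a second terminal while the server is starting up gives roughly the same signal as watching vmstat by eye. A quiet period is only a hint that recovery is stuck, not proof.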