Hosed PostGreSQL Installation - Mailing list pgsql-hackers

From Pete St. Onge
Subject Hosed PostGreSQL Installation
Date
Msg-id 20020921015454.U31893@moria.seul.org
Responses Re: Hosed PostGreSQL Installation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
As a result of some disk errors on another drive, an admin in our group
brought down the server hosting our pgsql databases with a kill -KILL
after having gone to runlevel 1 and finding the postmaster process still
running. No surprise, our installation was hosed in the process. 

After I spent about an hour on #postgresql working through the issue
with klamath (many thanks!), it was suggested that I send the info to
this list.

Currently, PostgreSQL will no longer start, and gives this error:

bash-2.05$ /usr/bin/pg_ctl  -D $PGDATA -p /usr/bin/postmaster start
postmaster successfully started
bash-2.05$ DEBUG:  database system shutdown was interrupted at 2002-09-19 22:59:54 EDT
DEBUG:  open(logfile 0 seg 0) failed: No such file or directory
DEBUG:  Invalid primary checkPoint record
DEBUG:  open(logfile 0 seg 0) failed: No such file or directory
DEBUG:  Invalid secondary checkPoint record
FATAL 2:  Unable to locate a valid CheckPoint record
/usr/bin/postmaster: Startup proc 11735 exited with status 512 - abort
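A note on the last line of that output: "status 512" looks like a raw wait() status rather than an exit code. A minimal sketch of decoding it, assuming the traditional Unix layout (terminating signal in the low 7 bits, exit code in the next byte); the helper name `decode_wait_status` is made up for illustration:

```python
def decode_wait_status(status):
    """Split a traditional 16-bit wait() status into (exit_code, signal).

    Assumes the classic Unix encoding; this is not taken from the
    PostgreSQL source.
    """
    signal = status & 0x7F            # low 7 bits: terminating signal, 0 if none
    exit_code = (status >> 8) & 0xFF  # next byte: value passed to exit()
    return exit_code, signal


exit_code, signal = decode_wait_status(512)
print("exit code:", exit_code, "signal:", signal)
```

Under that assumption the startup process exited on its own with code 2 rather than being killed by a signal, which lines up with the "FATAL 2" message just above it.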


Our setup is vanilla Red Hat 7.2, with pretty much all of the
postgresql-*-7.1.3-2 packages installed. Klamath asked if I had disabled
fsync in postgresql.conf; I had not. The only non-default (read:
uncommented) setting in the file is: `tcpip_socket = true`


Klamath suggested that I run pg_controldata:

bash-2.05$ ./pg_controldata 
pg_control version number:            71
Catalog version number:               200101061
Database state:                       SHUTDOWNING
pg_control last modified:             Thu Sep 19 22:59:54 2002
Current log file id:                  0
Next log file segment:                1
Latest checkpoint location:           0/1739A0
Prior checkpoint location:            0/1718F0
Latest checkpoint's REDO location:    0/1739A0
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's StartUpID:        21
Latest checkpoint's NextXID:          615
Latest checkpoint's NextOID:          18720
Time of latest checkpoint:            Thu Sep 19 22:49:42 2002
Database block size:                  8192
Blocks per segment of large relation: 131072
LC_COLLATE:                           en_US
LC_CTYPE:                             en_US


If I look into the pg_xlog directory, I see this:

bash-2.05$ cd pg_xlog/
bash-2.05$ ls -l
total 32808
-rw-------    1 postgres postgres 16777216 Sep 20 23:13 0000000000000002
-rw-------    1 postgres postgres 16777216 Sep 19 22:09 000000020000007E


There is one caveat. The installation resides on a partition of its own:
/dev/hda3        17259308   6531140   9851424  40% /var/lib/pgsql/data

However, fsck did not report errors for this partition at boot time
after the forced shutdown.

This installation serves a university research project, and although
most of the code and schemas are in development (and by rights should
be in CVS), I can't confirm that every project has actually checked its
work in. So any advice, ideas or suggestions on how the data and/or
schemas can be recovered would be greatly appreciated.

Many thanks!

-- pete

P.S.: I've been using pgsql for about four years now, and it played a
big role during my grad work. In fact, the availability of pgsql was one
of the reasons why I was able to complete and graduate. Many thanks for
such a great database!


-- 
Pete St. Onge
Research Associate, Computational Biologist, UNIX Admin
Banting and Best Institute of Medical Research
Program in Bioinformatics and Proteomics
University of Toronto
http://www.utoronto.ca/emililab/       pete@seul.org

