Recovering from database corruption using WAL-logs - Mailing list pgsql-general

From Kristian Klette via RT
Subject Recovering from database corruption using WAL-logs
Date
Msg-id 20090120092138.GK18606@samfundet.no
List pgsql-general
Hi!

Last autumn we discovered a case of database corruption in our databases
(missing rows with foreign keys pointing at them). At the time we were running
PostgreSQL 8.3.1. We upgraded to 8.3.4, but somehow the restoration from
backups got forgotten. Early this year it was remembered, although the backups
from before the corruption had been rotated out and are now lost. We do,
however, have a backup from the night before we first discovered the
corruption, but we are unsure about its state (we do not know whether the
database was already corrupted at the time of this backup).

We do, however, out of sheer luck (maybe), have all our WAL-logs from May 2008
to the present.

So we figured we'd give WAL-replay a shot on the backup mentioned earlier,
following the documentation (24.3.3). It seemed to work for a while,
replaying a good number of logs, but it stops at the same file every time,
with this message:

LOG: restored log file "00000001000000EB000000A1" from archive
LOG: invalid contrecord length 4674 in log file 235, segment 161, offset 8192
LOG: redo done at EB/A1001AB4
LOG: last completed transaction was at log time 2008-09-29 20:12:05.551693+02
LOG: restored log file "00000001000000EB000000A1" from archive
scp: /home/pgbackup/merged/00000002.history: No such file or directory

The file in question is dated the day after we discovered the corruption, and
it's not the last in that timeline (we only have one timeline). The WAL-log
shows no external signs of breakage that I can see: it's the same size as the
rest and was created at the same interval. We can provide this file if it
would help figure out what's wrong.
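
As a cross-check of the log output above, the numbers in the error message
can be related to the file name and to the "redo done" location. The sketch
below is illustrative only: the function names are made up, and it assumes the
default 16 MB WAL segment size of a stock 8.3 build.

```python
import re

# Assumption: default 16 MB WAL segments (stock 8.3 build).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if not re.fullmatch(r"[0-9A-F]{24}", name):
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def locate_lsn(lsn):
    """Map an 'xlogid/xrecoff' location to (log, segment, offset in segment)."""
    log_hex, off_hex = lsn.split("/")
    xrecoff = int(off_hex, 16)
    return int(log_hex, 16), xrecoff // WAL_SEGMENT_SIZE, xrecoff % WAL_SEGMENT_SIZE


print(parse_wal_filename("00000001000000EB000000A1"))  # (1, 235, 161)
print(locate_lsn("EB/A1001AB4"))                       # (235, 161, 6836)
```

The file name decodes to timeline 1, log file 235, segment 161, matching the
error message. The last replayed record sits at offset 6836, inside the first
8192-byte page of that segment, and the invalid continuation record is
reported at offset 8192, the start of the second page.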

We also tried setting the recovery_target to a time before the "last completed
transaction" time, and to various other target times all the way back to just
after the time of the backup, but recovery still tries to replay all the files
and fails on the same one.
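
For illustration, a recovery.conf for this kind of point-in-time attempt on
8.3 might look roughly like the sketch below. The host name and the target
timestamp are placeholders invented here. Only the archive directory is taken
from the scp error above. The file actually used is not shown in this message.

```conf
# recovery.conf (sketch; "backuphost" and the timestamp are hypothetical)
restore_command = 'scp backuphost:/home/pgbackup/merged/%f %p'
recovery_target_time = '2008-09-29 20:00:00+02'
recovery_target_inclusive = 'true'
```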

At this point we are quite stuck on this problem, so we're really hoping for
some insight, even though it's our own fault for not managing to restore an
even earlier backup and replay from that.

As mentioned, we'd be happy to provide any more information that might help us
recover our database.

Sincerely,
Kristian Klette

--
Kristian Klette
«Programs for sale: Fast, Reliable, Cheap: choose two.»
