Someone flipped a breaker switch, and evidently triggered
corruption in one of our major clusters:
$ cat server_log.Mon
postmaster successfully started
2005-06-06 14:31:11.950 [20124] LOG: database system was interrupted being in recovery at 2005-06-06 14:29:01 EDT
This probably means that some data blocks are corrupted
and you will have to use the last backup for recovery.
2005-06-06 14:31:11.950 [20124] LOG: checkpoint record is at EF/EBB7AFC8
2005-06-06 14:31:11.950 [20124] LOG: redo record is at EF/EBA91EF0; undo record is at 0/0; shutdown FALSE
2005-06-06 14:31:11.950 [20124] LOG: next transaction id: 577477594; next oid: 89750885
2005-06-06 14:31:11.951 [20124] LOG: database system was not properly shut down; automatic recovery in progress
2005-06-06 14:31:11.952 [20124] LOG: redo starts at EF/EBA91EF0
2005-06-06 14:31:11.984 [20124] PANIC: Invalid page header in block 22376 of 79189398
2005-06-06 14:31:12.275 [20121] LOG: startup process (pid 20124) was terminated by signal 6
2005-06-06 14:31:12.275 [20121] LOG: aborting startup due to startup process failure
We have backups from 10 hours earlier, but the obvious
question: what, if anything, can I do now to salvage those
10 hours of data from this?
I guess I could zero the block? I'm a little uncertain since
I don't know what I'm zeroing (pg_database? pg_class?), and
can't start up to see what that relfilenode maps to...
Going to look at it with pg_filedump, maybe oid2filename or
whatever that utility is...
Thanks,
Ed