Late yesterday afternoon our DB server went down hard. We tried to restart
it; it went into recovery mode to replay the transaction logs and failed.
The notable error was:
FATAL: failed to re-find parent key in index "257969064" for split pages 8366/12375
From what I can find when looking this error up, it points to a problem with
the transaction logs: recovery cannot complete because the logs are corrupt
or missing.
The solution is to:
1. Back up the DB files
2. Run pg_resetxlog (this might produce corruption due to inconsistent data)
3. Dump the DB
4. Reload the DB
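
For reference, this is roughly how I understand the four steps would map to
commands. It is only a sketch that builds the command lines and prints them;
it does not run anything. The paths and the "postgres" superuser name are
placeholders I made up, and I am assuming a plain file copy for step 1 and
pg_dumpall/psql for steps 3 and 4. The -n run of pg_resetxlog is a dry run
that only shows the values it would use.

```python
from typing import List


def recovery_plan(data_dir: str, backup_dir: str, dump_file: str) -> List[List[str]]:
    """Build (but do not run) the commands for the four steps.

    All arguments are placeholder paths chosen by the caller.
    """
    return [
        # 1. File-level copy of the data directory while the server is stopped.
        ["cp", "-a", data_dir, backup_dir],
        # 2a. Dry run: -n prints the control values and changes nothing.
        ["pg_resetxlog", "-n", data_dir],
        # 2b. The real reset; data may be left inconsistent afterwards.
        ["pg_resetxlog", data_dir],
        # 3. Dump all databases (the server has to be started again first).
        ["pg_dumpall", "-U", "postgres", "-f", dump_file],
        # 4. Reload the dump into a freshly initialized cluster.
        ["psql", "-U", "postgres", "-d", "postgres", "-f", dump_file],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in recovery_plan("/pgdata", "/backup/pgdata", "/backup/all.sql"):
        print(" ".join(cmd))
```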
Unfortunately this solution is not practical in our case for multiple
reasons.
- We do not have the space for steps 1 & 3
- The time required for steps 1, 3, and 4 is approximately 1 week per step
(3 weeks total) since our database size is approximately 5.4TB
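
To put the time estimate in perspective, here is the arithmetic behind it as
a small sketch. It assumes decimal units (1 TB = 10^12 bytes) and a steady
rate over the whole week; the 100 MB/s figure in the second calculation is
purely hypothetical and not something we have measured. One week for 5.4 TB
works out to roughly 9 MB/s sustained.

```python
DB_SIZE_BYTES = 5_400 * 10**9      # approximately 5.4 TB, decimal units assumed
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def implied_throughput_mb_s(size_bytes: int, seconds: int) -> float:
    """Average rate in MB/s needed to move size_bytes in the given time."""
    return size_bytes / seconds / 10**6


def step_duration_days(size_bytes: int, mb_per_s: float) -> float:
    """Days needed to move size_bytes at a steady mb_per_s."""
    return size_bytes / (mb_per_s * 10**6) / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = implied_throughput_mb_s(DB_SIZE_BYTES, SECONDS_PER_WEEK)
    print(f"one week per step implies about {rate:.1f} MB/s")
    # Hypothetical faster rate, for comparison only.
    print(f"at 100 MB/s a step would take {step_duration_days(DB_SIZE_BYTES, 100):.3f} days")
```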
If the data is in an inconsistent state, are there alternative solutions,
such as finding the index named in the FATAL error and somehow dropping it?
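
In case it helps frame the question, this is how I would guess the index
could be identified. It is a sketch under two assumptions: that the number in
the error is the index's relfilenode (or possibly its OID) in pg_class, and
that the server is accepting connections, which is not the case for us right
now. The database name is a placeholder, and since relfilenode values are
per database the query would have to be run in each one. The function only
builds the psql command; it does not execute it.

```python
from typing import List


def index_lookup_command(dbname: str, filenode: int) -> List[str]:
    """Build a psql command that maps a relfilenode/OID to an index and its table.

    Assumes the number from the error message is a relfilenode or an OID.
    """
    node = int(filenode)  # force an integer so nothing else ends up in the SQL
    query = (
        "SELECT c.oid::regclass AS index_name, "
        "i.indrelid::regclass AS table_name "
        "FROM pg_class c JOIN pg_index i ON i.indexrelid = c.oid "
        f"WHERE c.relfilenode = {node} OR c.oid = {node};"
    )
    # -A: unaligned output, -t: rows only, -c: run the single query and exit.
    return ["psql", "-d", dbname, "-At", "-c", query]


if __name__ == "__main__":
    # "mydb" is a hypothetical database name.
    print(index_lookup_command("mydb", 257969064))
```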
Also, does anyone know what circumstances or conditions might corrupt a
transaction log or cause one to go missing?
Thanks