"Freddie Burgess" <fburgess@radiantblue.com> writes:
> Late yesterday afternoon our DB server went down hard. we tried to re-start
> and it went into recovery mode to recover transaction history and failed.
Oh dear. I'll spare you the lecture about running up-to-date minor
releases, because this doesn't seem to exactly match any known-and-fixed
bug in 8.4.x, but you really should reconsider that policy.
> Notable error was:
> FATAL: failed to re-find parent key in index "257969064" for split pages
> 8366/12375
> If you look this error up, it indicates issues with the transaction logs and
> the inability to recover due to corrupt or missing transaction logs.
Don't know where you read that, but this has nothing to do with
transaction logs. It appears to indicate pre-existing corruption in a
btree index, specifically that there is no downlink in the next index
level up pointing to an index page that just got split according to the
WAL event stream. Unfortunately the WAL replay environment is too
limited to give you a very good fix on *which* index, unless you have
some other way of knowing what relfilenode 257969064 corresponds to.
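(If it helps, once you do have a running system again, or any other
usable copy of this cluster, something along these lines should tell
you what that relfilenode is. You'd have to run it in each database,
since relfilenodes are only unique per-database; the number is just the
one from your FATAL message:

    -- map the relfilenode from the log message to a relation name
    select n.nspname, c.relname, c.relkind
    from pg_class c join pg_namespace n on n.oid = c.relnamespace
    where c.relfilenode = 257969064;

relkind 'i' would confirm it's an index. contrib/oid2name can do much
the same thing from the command line, if that's more convenient.)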
What exactly happened when your DB server "went down hard"? It seems
somewhat likely that that involved filesystem corruption that WAL replay
was unable to fix. Since the replay got as far as here, the next index
level up doesn't contain obvious corruption, it just lacks an entry that
should be there. The most likely inference is that there's a stale page
where the downlink entry should have been, which suggests strongly that
fsync failed to write a page when it was supposed to (leading to
Postgres supposing that the index page was safely down on disk and could
be forgotten about, when it was not).
That's all a rather roundabout way of saying that you'd be well advised
to take a hard look at your filesystem configuration. These symptoms
are very consistent with the theory that the filesystem is not honoring
fsync requests properly.
> If the data is in an inconsistent state, are there other alternative
> solutions, such as finding the index specified in the FATAL error and
> somehow dropping it?
You could try modifying the source code to reduce that error to a
WARNING temporarily, but I'm not sure how far it will get you. The
concern here is that if one page wasn't written when it was supposed to
be, there are probably others too. And most likely not all of them
are index pages.
Anyway, if it appears that you're not far from the end of the xlog,
then yes, I'd try disabling those error tests until replay reaches the end.
Once the DB is up you can determine which indexes were mentioned in the
messages (look at pg_class.relfilenode) and drop or reindex them.
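That is, repeat the relfilenode lookup shown above for each number
mentioned in the warnings, and then rebuild or drop whatever indexes it
turns up. Roughly like this, assuming the damage really is confined to
indexes (the names here are only placeholders for whatever the lookup
reports):

    -- rebuild a damaged index found via the pg_class lookup
    reindex index some_schema.some_damaged_index;
    -- or, if the index is expendable:
    -- drop index some_schema.some_damaged_index;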
A preferable solution, if you still had the WAL logs from the previous
checkpoint cycle, would be to back up the WAL pointer to the preceding
checkpoint and replay forward from there. (This is essentially betting
that all the corruption stems from writes that failed to get to disk
just before the last checkpoint.) However, unless you've been running a
PITR archiving setup, you probably haven't got that option.
regards, tom lane