Dirk Lutzebaeck and I just spent a tense couple of hours trying to
figure out why a large database Down Under wasn't coming up after being
reloaded from a base backup plus PITR recovery. The symptoms were that
the recovery went fine, but backend processes would fail at startup or
soon after with "could not open relation XX/XX/XX: No such file" type of
errors.
The answer that ultimately emerged was that they'd been running a
nightly maintenance script that did REINDEX SYSTEM (among other things
I suppose). The PITR base backup included pg_internal.init files that
were appropriate when it was taken, and the PITR recovery process did
nothing whatsoever to update 'em :-(. So incoming backends picked up
init files with obsolete relfilenode values.
We don't actually need to *update* the file, per se, we only need to
remove it if no longer valid --- the next incoming backend will rebuild
it. I could see fixing this by making WAL recovery run around and zap
all the .init files (only problem is to find 'em), or we could add a new
kind of WAL record saying "remove the .init file for database XYZ"
to be emitted whenever someone removes the active one. Thoughts?
Meanwhile, if you're trying to recover from a PITR backup and it's not
working, try removing any pg_internal.init files you can find.
regards, tom lane