Berge Schwebs Bjørlo <berge@trivini.no> writes:
> Recently, WAL archiving began failing on the test which checks whether the
> file exists. This first occurred two hours after an incident where someone
> edited pg_hba.conf and left it with permissions denying Postgres read access
> to it. Upon SIGHUP the cluster naturally shut down.
FWIW, versions later than 8.3 don't "naturally shut down" for that;
they'll just keep running with the old settings.
> It was discovered
> promptly, and according to this person, there were "some processes named
> postgres still running". He ran "/etc/init.d/postgresql-8.3 start" anyway,
> which brought up the cluster:
If there were old backends still running then the postmaster should not
have started. I have a nasty feeling that you have one of the start
scripts that takes it upon itself to blow away the postmaster.pid file,
which is a necessary part of the interlock that prevents that from
happening. If that happened, you would have had some old backends
running with one idea of the current xlog location, and some other
backends running with another idea of the current xlog location, and
it would not have taken long for the database to get completely
scrambled :-(. The duplicated WAL segment file would be an unsurprising
consequence of that, but I'm much more worried about what happened to
your data because of duplicate XID numbers. Have you seen any evidence
of data corruption on the master database?
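
To illustrate the kind of check a start script could make instead of
deleting postmaster.pid, here is a minimal sketch. It is not what the
postmaster itself does, and it is not taken from any real init script;
the function names and the three status strings are made up for
illustration. It relies only on the first line of postmaster.pid holding
the postmaster's PID. Note that even a "stale" result says nothing about
orphaned backends, so the sketch never removes the file.

```python
import os


def read_postmaster_pid(data_dir):
    """Return the PID on the first line of postmaster.pid, or None.

    data_dir is whatever data directory the caller passes in (hypothetical
    helper, not a PostgreSQL API).
    """
    pid_path = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(pid_path) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return None
    try:
        pid = int(first_line)
    except ValueError:
        return None
    # Only a positive PID is safe to probe with os.kill below.
    return pid if pid > 0 else None


def process_exists(pid):
    """Probe a PID with signal 0: existence check only, nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pid_file_status(data_dir):
    """Classify the lock file as 'missing', 'live', or 'stale'.

    'missing': no usable PID could be read from postmaster.pid.
    'live':    the recorded PID belongs to a running process.
    'stale':   the recorded process is gone. Old backends may still be
               running, so this alone does not make a restart safe.
    """
    pid = read_postmaster_pid(data_dir)
    if pid is None:
        return "missing"
    return "live" if process_exists(pid) else "stale"
```

A script built on something like this would refuse to start on "live",
and on "stale" would still want to confirm that no processes named
postgres remain before going any further.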
regards, tom lane