Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> Mischa Sandberg <mischa_sandberg@telus.net> writes:
> > Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> >> I'd bet that the pg_ctl status part is failing. I get exit status 1
> >> from it if there's no server running.
>
> > Yes, that was part of the problem with the original startup script;
> > postmaster hadn't even gotten as far as writing postmaster.pid,
> > I guess. But pg_ctl status returning 1 could also mean that the
> > server had come up, hit a critical problem and exited. Hence my problem;
> > this has to detect server failure, reliably, as well.
>
> You could sleep for a second or so *before* you start looking for the
> pidfile.
The systems are under erratic load, due to concurrent
CPU and disk I/O spikes around start-up time.
A 1-2 second sleep is not enough to be a guarantee :-(
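
For the archives, a rough sketch of the kind of workaround I mean: poll
against a deadline instead of sleeping for a fixed time, and check for
"launcher has exited" separately from "pidfile is there". The function
name, the return strings and the child_exited callback are all made up
for illustration, and the sketch assumes the start-up script can tell
whether the process it launched has exited. A pidfile only shows that
postmaster got as far as writing it, not that it is accepting
connections.

```python
import os
import time


def wait_for_server(pidfile, child_exited, timeout=60.0, interval=0.5):
    """Poll until the server looks up, has failed, or the deadline passes.

    pidfile      -- path to postmaster.pid (existence check only)
    child_exited -- hypothetical callback; returns True once the launched
                    server process is known to have exited
    Returns "up", "failed" or "timeout".
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check for exit first: a pidfile left behind by a server that
        # has already died must not be reported as "up".
        if child_exited():
            return "failed"
        if os.path.exists(pidfile):
            return "up"
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(interval)
```

If the script starts the server through subprocess.Popen, child_exited
could be as simple as `lambda: proc.poll() is not None`. The timeout
still has to be picked generously for the worst load spike, but a slow
start then costs waiting time rather than a false failure report.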
I'm probably not explaining the issues well;
I'm caught between two constraints that aren't really pg's problem,
and wide clusters with automated admin, variable hardware
and spikes of db restarts are no doubt an oddball edge case.
There are workarounds; was hoping for something
clean and obvious (to all but me).
Switching back to tailing the log files and moving on.
Thanks everyone.
--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.