Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> Mischa Sandberg <mischa_sandberg@telus.net> writes:
> > Perhaps it's my "test for DB ready" that's the problem?
>
> > + while pg_ctl status && ! psql -l; do sleep 1; done >/dev/null 2>&1
>
> I'd bet that the pg_ctl status part is failing. I get exit status 1
> from it if there's no server running.
Yes, that was part of the problem with the original startup script:
the postmaster hadn't even gotten as far as writing postmaster.pid,
I guess. But pg_ctl status returning 1 could also mean that the
server had come up, hit a critical problem, and exited. Hence my problem:
the test has to detect server failure, reliably, as well.
BTW the example with (start,status,psql,createlang) failing just
happened, to my surprise, on my dev box -- fairly fast and lightly
loaded. On loaded, unattended systems, it happened consistently.
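For what it's worth, here is a rough sketch of a wait loop that tries to tell
"not up yet" apart from "came up and died". It is only an illustration, not
a tested startup script: wait_for_db, STATUS_CMD and PROBE_CMD are names made
up for this example, and the defaults are just the two commands from the
test quoted above. It assumes that a status check which succeeded once and
then fails means the server exited; a server that dies before it ever writes
postmaster.pid would only show up as a timeout.

```shell
# Hypothetical helper (names are illustrative, not from any PostgreSQL tool).
# Returns 0 when the probe succeeds, 2 if the server was seen running and
# then disappeared, 1 if the timeout (in seconds, default 60) expires.
wait_for_db() {
    timeout=${1:-60}
    seen_running=0
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # Probe first: if the server answers, we are done.
        if ${PROBE_CMD:-psql -l} >/dev/null 2>&1; then
            return 0
        fi
        if ${STATUS_CMD:-pg_ctl status} >/dev/null 2>&1; then
            seen_running=1          # status check reports a live server
        elif [ "$seen_running" -eq 1 ]; then
            return 2                # it was up earlier and is now gone
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1                        # never became ready within the timeout
}
```

The caller can then branch on the exit status, e.g. run createlang only on 0
and dump the server log on 1 or 2.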
............
On a different note, another place where failures are consistent
is the sequence:
    createlang ... -d template1 plpgsql
    createdb $PGDATABASE
    <app>
The failure can happen on createdb ("template1 is busy")
or on <app>, and it happens most frequently on systems with overloaded disks.
My hacky response is to separate those steps with:
    psql -qc checkpoint template1
which consistently makes the problem go away. But what, exactly, is
the problem that this is tripping over?
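An alternative workaround to the explicit checkpoint, sketched here only as
an illustration, would be to retry the failing step a few times. The retry
helper below is hypothetical (not part of any PostgreSQL tool), and like the
checkpoint it only masks the failure rather than explaining it.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every try fails.
retry() {
    tries=$1
    pause=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$tries" ]; then
            return 1                # out of attempts
        fi
        n=$((n + 1))
        sleep "$pause"
    done
    return 0
}

# Possible use in the sequence above:
#   retry 5 2 createdb "$PGDATABASE"
```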
Anyway, thanks for the comments.
--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.