Thread: How to wait until startup completes

From: Gary Horton

I'm starting up postgresql with this command line:

/usr/bin/setpgrp ${POSTGRESQL_HOME}/bin/pg_ctl -w -o "-i" start

...and there are two things about this that raise questions. First, we
use setpgrp because, although the pg_ctl documentation (7.3.4) states
that it can be used for "properly detaching from the terminal and
process group", we have not found this to work as expected. In
particular, when a Ctrl-C is issued from the same tty that was used for
the pg_ctl command (without the setpgrp), the associated postmaster
processes are killed (with a fast shutdown, i.e. they receive the SIGINT
signal from the Ctrl-C). Prefacing pg_ctl with setpgrp solves this
problem for us. So the first question is: what am I not understanding
here?

More importantly, we want to start up the database "completely" before
any client connections are attempted. This matters to us in our
build/test environment, where we have complete control over client
connections; we're only trying to be certain the database is ready
before launching a series of builds/tests. Otherwise those builds/tests
fail to initialize because every client connection attempt is rejected
with "FATAL: The database system is starting up", and a whole
build-test cycle is wasted. According to the pg_ctl documentation again,
the -w flag will cause pg_ctl to "Wait for the start or shutdown to
complete", and we hoped this would effectively cause pg_ctl to block
until the database startup is "complete"; however, this too does not
work as expected. We are testing for "startup-is-complete" with the
following:

until ${POSTGRESQL_HOME}/bin/psql -l > /dev/null 2>&1 ||
${POSTGRESQL_HOME}/bin/psql -l -d template1 -U $PORTAL_DB_OWNER

...and it happens that /sometimes/ this test succeeds and yet subsequent
client connections fail due to the "FATAL: The database system is
starting up" condition.
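The loop above has no upper bound, so a server that never finishes starting would hang the build indefinitely. A minimal sketch of a bounded version, assuming a POSIX shell; `wait_for` and the `WAIT_INTERVAL` variable are hypothetical names chosen for illustration, not part of PostgreSQL:

```shell
#!/bin/sh
# Hypothetical helper: retry a readiness probe until it succeeds or
# max_attempts is reached. Assumes a POSIX shell (uses $(( )) arithmetic).
# WAIT_INTERVAL (seconds between attempts) is an assumed knob; default 1.
wait_for() {
    max_attempts=$1   # give up after this many failed probes
    shift             # the remaining arguments form the probe command
    attempt=1
    while :; do
        # Run the probe quietly; exit status 0 means "ready".
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "wait_for: gave up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
}
```

For example, `wait_for 60 ${POSTGRESQL_HOME}/bin/psql -l -d template1` would poll roughly once per second for about a minute. This only bounds the wait; like the original loop, it cannot tell the "starting up" rejection apart from other connection failures.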

From this, a whole series of questions comes up in my mind:

- Is this the proper idiom to test for database startup being "complete"
(i.e. "complete" means that the "FATAL: The database system is starting
up" error messages should not occur)?
- If not, what test should we be using instead?
- If so, what could be happening here?
- Is there a better way to start up PostgreSQL to achieve what we want?
- Is setpgrp somehow getting in the way of the -w flag working as desired?
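On the question of which test to use: a probe that only checks the exit status treats every failure alike. A sketch that separates the "starting up" rejection from other errors, assuming a POSIX shell with `mktemp` and `grep -q` available (some older grep implementations lack `-q`). `classify_probe` is a hypothetical helper; the match string is taken from the error quoted above, and since the exact wording and capitalization may differ between server versions, the match is case-insensitive:

```shell
#!/bin/sh
# Hypothetical helper: run a probe command and classify the outcome.
#   0 = probe succeeded (server accepted the connection)
#   2 = server said "the database system is starting up" (worth retrying)
#   1 = any other failure (bad login, permissions, no server, ...)
classify_probe() {
    err_file=$(mktemp) || return 1
    if "$@" >/dev/null 2>"$err_file"; then
        rm -f "$err_file"
        return 0
    fi
    # Case-insensitive match, since message wording may vary by version.
    if grep -qi 'database system is starting up' "$err_file"; then
        rm -f "$err_file"
        return 2
    fi
    cat "$err_file" >&2   # surface unexpected errors instead of hiding them
    rm -f "$err_file"
    return 1
}
```

A caller would keep retrying while the status is 2 and stop immediately on 1, so that a permissions or login problem (one of the -w failure modes Tom mentions below) shows up at once instead of after a timeout.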

Thanks for any insights -
Gary Horton


Re: How to wait until startup completes

From: Tom Lane

Gary Horton <Gary.Horton@Sun.COM> writes:
> [ assorted startup problems ]

You did not say what platform this is on, nor which Postgres version
you are running.  Tsk tsk.

As for the setpgrp business, that doesn't sound real unreasonable.
I use nohup for that purpose, and it seems to work fine on all the
platforms I use, but perhaps on yours setpgrp is the best incantation.
(Of course, for ordinary production work you should be launching the
postmaster from an init script and not from a manual command at all...)

The -w-doesn't-wait-long-enough bit needs investigation.  There are
known failure modes for -w, like setting up your access permissions
so that pg_ctl can't log in, but AFAIK that results in waiting till
timeout not in falling through immediately.  Are any messages produced
when you do this?  If you don't see anything, try running the script
with -x to see what it's doing exactly.
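This suggestion relies on pg_ctl being a shell script in this release (Tom refers to "the script" later in the thread). A minimal sketch of capturing such a trace to a file, assuming a POSIX `sh`; `trace_script` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: run a shell script under `sh -x` and keep the trace.
# sh -x writes each executed command (prefixed with "+ ") to stderr, so
# stderr, i.e. the trace plus any error messages from the script itself,
# goes to trace_file; the script's stdout is left untouched.
trace_script() {
    trace_file=$1   # where to save the execution trace
    shift           # remaining args: the script path and its arguments
    sh -x "$@" 2>"$trace_file"
}
```

Applied here, that would be something like `trace_script /tmp/pg_ctl.trace ${POSTGRESQL_HOME}/bin/pg_ctl -w -o "-i" start` (the trace path is arbitrary), after which the trace shows each command the -w logic actually ran. This only works while pg_ctl is a shell script; later PostgreSQL releases ship it as a compiled program.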

            regards, tom lane

Re: How to wait until startup completes

From: Gary Horton

Tom Lane wrote:
> Gary Horton <Gary.Horton@Sun.COM> writes:
>> [ assorted startup problems ]
> You did not say what platform this is on, nor which Postgres version
> you are running.  Tsk tsk.

Actually, I did mention Postgres 7.3.4, but obviously not clearly. I also meant to mention that the OS is Solaris 8 and 9.

> As for the setpgrp business, that doesn't sound real unreasonable.
> I use nohup for that purpose, and it seems to work fine on all the
> platforms I use, but perhaps on yours setpgrp is the best incantation.

Right, nohup deals with SIGHUP, but you need setpgrp to deal with SIGINT from the controlling tty.

> (Of course, for ordinary production work you should be launching the
> postmaster from an init script and not from a manual command at all...)

Agreed, but we also want to support our users doing admin work on our product, which may involve bouncing the database server.

> The -w-doesn't-wait-long-enough bit needs investigation.  There are
> known failure modes for -w, like setting up your access permissions
> so that pg_ctl can't log in, but AFAIK that results in waiting till
> timeout not in falling through immediately.  Are any messages produced
> when you do this?  If you don't see anything, try running the script
> with -x to see what it's doing exactly.

No messages, no smoking gun. If you mean running the sh script with -x, it's really not complicated enough to warrant that; I've added echo statements to confirm that it's just falling through. But let's be sure we're saying the same thing: I would expect pg_ctl -w to /block/ until the database is started. In other words, by the time the pg_ctl command returns, checking database status with psql -l should succeed. Are we on the same page?

Thanks for your time, Tom -
-Gary

Re: How to wait until startup completes

From: Tom Lane

Gary Horton <Gary.Horton@Sun.COM> writes:
> Tom Lane wrote:
>> The -w-doesn't-wait-long-enough bit needs investigation.

> No messages, no smoking gun. If you mean running the sh script with -x,
> it's really not complicated enough to warrant that - I've added echo
> statements to confirm that it's just falling through. But let's be sure
> we're saying the same thing: I would expect pg_ctl -w to /block/ until
> database is started, in other words by the time the pg_ctl command
> "returns", then checking database status with psql -l should
> succeed...are we on the same page?

Yeah ... in fact, if you read the script, what it does is loop until a
psql -l succeeds ... so why wouldn't your following instance also
succeed?
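Gary's expectation and Tom's description can be turned into a quick reproduction check: run the start command, then probe exactly once, with no retry. A sketch, assuming a POSIX shell; `start_then_probe` is a hypothetical helper, and passing each command as a single string run by `sh -c` is a simplification that needs careful quoting:

```shell
#!/bin/sh
# Hypothetical check: once the start command returns successfully, a single
# immediate probe should succeed if the start really blocked until ready.
# Exit status: 0 = ready at once, 1 = start failed, 2 = start returned early.
start_then_probe() {
    start_cmd=$1
    probe_cmd=$2
    if ! sh -c "$start_cmd"; then
        echo "start command failed" >&2
        return 1
    fi
    # Deliberately no retry: the point is to test whether start blocked.
    if sh -c "$probe_cmd" >/dev/null 2>&1; then
        echo "ready immediately after start returned"
        return 0
    fi
    echo "start returned but probe failed: start did not block long enough" >&2
    return 2
}
```

For this thread's setup that might be `start_then_probe "/usr/bin/setpgrp ${POSTGRESQL_HOME}/bin/pg_ctl -w -o -i start" "${POSTGRESQL_HOME}/bin/psql -l"`, with the server stopped between runs. A status of 2 would be a concrete reproduction of the fall-through, worth reporting together with the platform and version.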

            regards, tom lane

Re: How to wait until startup completes

From: Gary Horton

Tom Lane wrote:
> Yeah ... in fact, if you read the script, what it does is loop until a
> psql -l succeeds ... so why wouldn't your following instance also
> succeed?
>
>             regards, tom lane

Ah, I think you mean running the pg_ctl script itself with sh -x (not my own sh script). I didn't realize I could do that; I'll give it a shot.

Thx again Tom -
-Gary