Thread: How to wait until startup completes
I'm starting up PostgreSQL with this command line:

    /usr/bin/setpgrp ${POSTGRESQL_HOME}/bin/pg_ctl -w -o "-i" start

...and there are two things about this that raise a question.

First, we use setpgrp because, although the pg_ctl documentation (7.3.4) states that it can be used for "properly detaching from the terminal and process group", we have not found this to work as expected. In particular, when a ctl-C is issued from the same tty as was used for the pg_ctl command (without the setpgrp), the associated postmaster processes are killed (with a fast shutdown, i.e. they receive the SIGINT signal from the ctl-C). Prefacing pg_ctl with setpgrp addresses this problem for us. So, the first question would be, what am I not understanding here?

More importantly, we want to start up the database "completely" before any client connections are attempted. This matters to us in our build/test environment, where we have complete control over client connections; we're only trying to be certain the database is ready before launching a series of builds/tests. Otherwise the builds/tests fail to initialize because each attempted client connection receives "FATAL: The database system is starting up", and a whole build-test cycle is wasted as a result.

According to the pg_ctl documentation again, the -w flag will cause pg_ctl to "Wait for the start or shutdown to complete", and we hoped this would effectively cause pg_ctl to block until the database startup is "complete"; however, this too does not work as expected. We are testing for "startup-is-complete" with the following:

    until ${POSTGRESQL_HOME}/bin/psql -l > /dev/null 2>&1 || ${POSTGRESQL_HOME}/bin/psql -l -d template1 -U $PORTAL_DB_OWNER

...and it happens that /sometimes/ this test succeeds and yet subsequent client connections fail due to the "FATAL: The database system is starting up" condition.
From this, a whole series of questions comes up in my mind:

- Is this the proper idiom to test for database startup being "complete" (i.e. "complete" means that the "FATAL: The database system is starting up" error messages should not occur)?
- If not, what test should we be using instead?
- If so, what could be happening here?
- Is there a better way to start up PostgreSQL to achieve what we want?
- Is setpgrp somehow getting in the way of the -w flag working as desired?

Thanks for any insights
- Gary Horton
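A minimal sketch of the kind of polling test described in the message above, with an upper bound on the number of attempts so that a build aborts instead of hanging. The helper name wait_until_ready and the WAIT_INTERVAL variable are hypothetical; only psql -l, POSTGRESQL_HOME and PORTAL_DB_OWNER come from the original message. It illustrates the idiom only and does not by itself explain why a later connection can still be refused.

```shell
#!/bin/sh
# wait_until_ready (hypothetical helper): run a command repeatedly until it
# succeeds, giving up after a fixed number of attempts.
#   $1     maximum number of attempts
#   $2...  command to run (its output is discarded)
# WAIT_INTERVAL (assumed name) is the pause in seconds after a failed attempt;
# it defaults to 1.
wait_until_ready() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0    # the command succeeded
        fi
        attempt=$((attempt + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1            # the command never succeeded within max_attempts
}
```

With the variables from the message, a call might look like this (30 attempts is an arbitrary choice):

    wait_until_ready 30 "${POSTGRESQL_HOME}/bin/psql" -l -d template1 -U "$PORTAL_DB_OWNER" || exit 1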
Gary Horton <Gary.Horton@Sun.COM> writes:
> [ assorted startup problems ]

You did not say what platform this is on, nor which Postgres version you are running. Tsk tsk.

As for the setpgrp business, that doesn't sound real unreasonable. I use nohup for that purpose, and it seems to work fine on all the platforms I use, but perhaps on yours setpgrp is the best incantation. (Of course, for ordinary production work you should be launching the postmaster from an init script and not from a manual command at all...)

The -w-doesn't-wait-long-enough bit needs investigation. There are known failure modes for -w, like setting up your access permissions so that pg_ctl can't log in, but AFAIK that results in waiting till timeout, not in falling through immediately. Are any messages produced when you do this? If you don't see anything, try running the script with -x to see what it's doing exactly.

regards, tom lane
Tom Lane wrote:
> Gary Horton <Gary.Horton@Sun.COM> writes:
>> [ assorted startup problems ]
> You did not say what platform this is on, nor which Postgres version you are running. Tsk tsk.

Actually I did mention 7.3.4 Postgres but obviously I didn't do it clearly. I really did mean to mention that the OS is Solaris 8 and 9...

> As for the setpgrp business, that doesn't sound real unreasonable. I use nohup for that purpose, and it seems to work fine on all the platforms I use, but perhaps on yours setpgrp is the best incantation.

Right, nohup deals with SIGHUP but you need setpgrp to deal with SIGINT from the controlling tty...

> (Of course, for ordinary production work you should be launching the postmaster from an init script and not from a manual command at all...)

Agreed, but we also want to support our users doing admin on our product, which may involve bouncing the database server.

> The -w-doesn't-wait-long-enough bit needs investigation. There are known failure modes for -w, like setting up your access permissions so that pg_ctl can't log in, but AFAIK that results in waiting till timeout not in falling through immediately. Are any messages produced when you do this? If you don't see anything, try running the script with -x to see what it's doing exactly.

No messages, no smoking gun. If you mean running the sh script with -x, it's really not complicated enough to warrant that - I've added echo statements to confirm that it's just falling through. But let's be sure we're saying the same thing: I would expect pg_ctl -w to /block/ until the database is started; in other words, by the time the pg_ctl command "returns", checking database status with psql -l should succeed...are we on the same page?
Thanks for your time, Tom -
-Gary
Gary Horton <Gary.Horton@Sun.COM> writes:
> Tom Lane wrote:
>> The -w-doesn't-wait-long-enough bit needs investigation.

> No messages, no smoking gun. If you mean running the sh script with -x,
> it's really not complicated enough to warrant that - I've added echo
> statements to confirm that it's just falling through. But let's be sure
> we're saying the same thing: I would expect pg_ctl -w to /block/ until
> database is started, in other words by the time the pg_ctl command
> "returns", then checking database status with psql -l should
> succeed...are we on the same page?

Yeah ... in fact, if you read the script, what it does is loop until a psql -l succeeds ... so why wouldn't your following instance also succeed?

regards, tom lane
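One way to check the expectation discussed above (that a check run right after the start command returns should succeed) is to record both exit statuses side by side. This is only a sketch: the function name start_and_verify and the output format are hypothetical, and both commands are passed in as strings rather than hard-coded.

```shell
#!/bin/sh
# start_and_verify (hypothetical helper): run a start command, then
# immediately run a check command, and report both exit statuses.
#   $1  start command, as a single string
#   $2  check command, as a single string (its output is discarded)
# Returns 0 only if both commands exit with status 0.
start_and_verify() {
    sh -c "$1"
    start_status=$?
    sh -c "$2" > /dev/null 2>&1
    check_status=$?
    echo "start exit=$start_status, check exit=$check_status"
    [ "$start_status" -eq 0 ] && [ "$check_status" -eq 0 ]
}
```

With the commands from this thread, a call might look like:

    start_and_verify "/usr/bin/setpgrp ${POSTGRESQL_HOME}/bin/pg_ctl -w -o \"-i\" start" "${POSTGRESQL_HOME}/bin/psql -l"

A line reading "start exit=0, check exit=1" would show pg_ctl returning success while the immediately following psql -l fails.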
Tom Lane wrote:
> Yeah ... in fact, if you read the script, what it does is loop until a psql -l succeeds ... so why wouldn't your following instance also succeed?
> regards, tom lane

Ah, I think that you mean to run pg_ctl with a -x option (not my own sh script). I didn't realize I could do this -- I'll give it a shot --
Thx again Tom -
-Gary