Thread: What is the best and easiest implementation to reliably wait for the completion of startup?

Hello,

I've run into a problem with PostgreSQL startup, and I can think of a 
simple solution for it. However, since I don't yet know much about the 
PostgreSQL implementation, I'd like to ask you what the best and 
easiest solution is. If it is something I can work on in my spare time at 
home, I'm willing to implement the patch.

[problem]
I can't reliably wait for the completion of PostgreSQL startup. I want 
pg_ctl to wait until the server completes startup and accepts connections.

Yes, pg_ctl has the "-w" and "-t seconds" options. However, what 
value should I specify for -t? I have to specify a long time, say 3600 
seconds, in case startup takes long because of crash recovery or 
archive recovery.

The bad thing is that pg_ctl continues to wait until the specified duration 
passes, even if postgres fails to start. For example, it is naturally 
desirable for pg_ctl to terminate when postgresql.conf contains a syntax 
error.


[solution idea]
Use an unnamed pipe so that the postmaster can notify pg_ctl of the 
completion of startup. That is:

pg_ctl's steps:
1. create an unnamed pipe (a pair of connected file descriptors).
2. start postgres.
3. read from the pipe, waiting for a startup completion message from the postmaster.

postmaster's steps:
1. inherit the pipe's descriptors from pg_ctl.
2. do startup processing.
3. write a startup completion message to the pipe, then close the pipe.
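The steps above can be sketched in C with pipe() and fork(). This is a minimal illustration under stated assumptions, not PostgreSQL code: the forked child merely stands in for the postmaster, and `do_startup()`, `start_and_wait()` and the `READY_MSG` string are hypothetical names. One property of the design shows up clearly: if the child gives up or dies before reporting, the reader sees EOF at once instead of waiting for a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message the "postmaster" writes once startup has finished. */
#define READY_MSG "READY\n"

/* Stand-in for the postmaster's startup work: 0 on success, -1 on failure. */
static int do_startup(int should_fail)
{
    return should_fail ? -1 : 0;
}

/*
 * "pg_ctl" side: create the pipe, start the child, and block on read().
 * Returns 0 if the child reported completion, -1 otherwise (including the
 * case where the pipe reaches EOF because the child gave up or died).
 */
static int start_and_wait(int should_fail)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child ("postmaster"): it only needs the write end. */
        close(fds[0]);
        if (do_startup(should_fail) == 0 &&
            write(fds[1], READY_MSG, strlen(READY_MSG)) < 0)
            _exit(1);
        close(fds[1]);          /* the reader sees EOF after this */
        _exit(0);
    }

    /* Parent: close its own write end, or read() would never return EOF. */
    close(fds[1]);

    char buf[sizeof(READY_MSG)] = {0};
    ssize_t total = 0, n;
    while (total < (ssize_t) sizeof(buf) - 1 &&
           (n = read(fds[0], buf + total, sizeof(buf) - 1 - total)) > 0)
        total += n;

    close(fds[0]);
    waitpid(pid, NULL, 0);      /* reap the child */
    return strcmp(buf, READY_MSG) == 0 ? 0 : -1;
}
```

Here start_and_wait(0) should return 0 and start_and_wait(1) should return -1. The easy mistake is forgetting to close the parent's copy of the write end; read() then blocks forever even after the child exits.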

I'm wondering whether this is correct and easy. One concern is whether the 
postmaster can inherit the pipe through the system() call.
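On POSIX systems, system() runs the command through /bin/sh, and descriptors created by pipe() do not have FD_CLOEXEC set, so they survive fork() and exec() into the shell and into a background job it starts. The sketch below checks that on the local system; it covers POSIX only, the function name and the "READY" text are hypothetical, and the trailing '&' only imitates starting the server in the background.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 1 if a command started through system() wrote "READY\n" into a
 * pipe created by this process, 0 otherwise.
 */
static int pipe_survives_system(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    /* POSIX shells need only support descriptor numbers 0..9 in redirections. */
    if (fds[1] > 9)
    {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }

    /* pipe() does not set FD_CLOEXEC, so /bin/sh and the background job it
       starts inherit both ends. The '&' imitates a backgrounded server. */
    char cmd[64];
    snprintf(cmd, sizeof(cmd), "(echo READY >&%d) &", fds[1]);
    int rc = system(cmd);

    /* Drop our write end; read() then returns EOF once the job exits. */
    close(fds[1]);
    if (rc == -1)
    {
        close(fds[0]);
        return 0;
    }

    char buf[16] = {0};
    ssize_t total = 0, n;
    while (total < (ssize_t) sizeof(buf) - 1 &&
           (n = read(fds[0], buf + total, sizeof(buf) - 1 - total)) > 0)
        total += n;
    close(fds[0]);

    return strcmp(buf, "READY\n") == 0;
}
```

The background job also inherits the read end; that is harmless here, because EOF depends only on every copy of the write end being closed.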

Please give me your ideas. Of course, I would be very happy if some 
experienced community member could address this problem.

And finally, do you think this should be handled as a bug fix, or as an 
improvement in 9.2?

Regards
MauMau



"MauMau" <maumau307@gmail.com> writes:
> The bad thing is that pg_ctl continues to wait until the specified duration 
> passes, even if postgres fails to start. For example, it is naturally 
> desirable for pg_ctl to terminate when postgresql.conf contains a syntax 
> error.

Hmm, I thought we'd fixed this in the last go-round of pg_ctl wait
revisions, but testing proves it does not work desirably in HEAD:
not only does pg_ctl wait till its timeout elapses, but it then reports
"server started" even though the server didn't start.  That's clearly a
bug :-(

I think your proposal of a pipe-based solution might be overkill though.
Seems like it would be sufficient for pg_ctl to give up if it doesn't
see the postmaster.pid file present within a couple of seconds of
postmaster startup.  I don't really want to add logic to the postmaster
to have the sort of reporting protocol you propose, because not
everybody uses pg_ctl to start the postmaster.  In any case, we need a
fix in 9.1 ...
        regards, tom lane
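The suggestion above amounts to a bounded poll for postmaster.pid. A minimal sketch, assuming a hypothetical helper name and a 100 ms polling interval, neither of which comes from pg_ctl; it uses the monotonic clock so that wall-clock adjustments do not stretch or shrink the grace period.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/* Seconds elapsed on the monotonic clock since *start. */
static double elapsed_since(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double) (now.tv_sec - start->tv_sec) +
           (double) (now.tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Poll until `path` exists or `grace_secs` seconds have passed.
 * Returns 1 if the file appeared in time, 0 if the grace period expired.
 */
static int pidfile_appeared(const char *path, double grace_secs)
{
    const struct timespec interval = {0, 100 * 1000 * 1000};  /* 100 ms */
    struct timespec start;
    struct stat st;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;)
    {
        if (stat(path, &st) == 0)
            return 1;
        if (elapsed_since(&start) >= grace_secs)
            return 0;
        nanosleep(&interval, NULL);
    }
}
```

Everything hinges on the value passed as grace_secs: too short, and a slow but healthy startup is reported as a failure; too long, and a genuine failure is reported late.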


From: "Tom Lane" <tgl@sss.pgh.pa.us>
> "MauMau" <maumau307@gmail.com> writes:
>> The bad thing is that pg_ctl continues to wait until the specified duration
>> passes, even if postgres fails to start. For example, it is naturally
>> desirable for pg_ctl to terminate when postgresql.conf contains a syntax
>> error.
>
> Hmm, I thought we'd fixed this in the last go-round of pg_ctl wait
> revisions, but testing proves it does not work desirably in HEAD:
> not only does pg_ctl wait till its timeout elapses, but it then reports
> "server started" even though the server didn't start.  That's clearly a
> bug :-(
>
> I think your proposal of a pipe-based solution might be overkill though.
> Seems like it would be sufficient for pg_ctl to give up if it doesn't
> see the postmaster.pid file present within a couple of seconds of
> postmaster startup.  I don't really want to add logic to the postmaster
> to have the sort of reporting protocol you propose, because not
> everybody uses pg_ctl to start the postmaster.  In any case, we need a
> fix in 9.1 ...

Yes, I was also a bit afraid that the pipe-based fix might be overkill, so I 
was wondering whether there might be an easier solution.

I missed the "server started" report. That's certainly a bug, as you say.

I was also considering exactly the postmaster.pid-based solution you 
suggest, but it has a problem: how many seconds should "a couple of 
seconds" be? If the system load is temporarily so high that the 
postmaster takes many seconds to create postmaster.pid, pg_ctl will 
wrongly conclude that the postmaster failed to start. I know this is a 
rare, hypothetical case, and I don't like touching the postmaster logic 
and complicating it, but logical correctness needs to come first 
(Japanese users are very demanding).
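One way to avoid choosing a fixed grace period, which is not proposed in this thread, is for the launcher to watch the server process itself: if it has exited, startup failed, however long that took. The sketch assumes the launcher is the server's direct parent (a server started through a shell via system(), as discussed above, is not a direct child) and uses the existence of a file as a hypothetical stand-in for "startup finished". `wait_for_child_startup()` and `demo_failed_start()` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait until `ready_path` exists (hypothetical "startup finished" signal),
 * the child exits, or `timeout_secs` pass.
 * Returns 1 = ready, 0 = child exited (startup failed), -1 = timed out.
 */
static int wait_for_child_startup(pid_t child, const char *ready_path,
                                  int timeout_secs)
{
    const struct timespec interval = {0, 100 * 1000 * 1000};  /* 100 ms */
    struct timespec start, now;
    struct stat st;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;)
    {
        if (stat(ready_path, &st) == 0)
            return 1;
        /* WNOHANG: returns the child's PID once it has exited, 0 while it
           runs; ECHILD means the child is already gone and reaped. */
        pid_t r = waitpid(child, NULL, WNOHANG);
        if (r == child || (r == -1 && errno == ECHILD))
            return 0;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec - start.tv_sec >= timeout_secs)
            return -1;
        nanosleep(&interval, NULL);
    }
}

/* Demo: a child that exits at once, as a server would on a config syntax error. */
static int demo_failed_start(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -2;
    if (pid == 0)
        _exit(1);
    return wait_for_child_startup(pid, "/nonexistent-dir/ready", 60);
}
```

With this approach the timeout only bounds how long a live but slow server may take; a server that dies early is detected within one polling interval, so high system load no longer causes a false "failed to start".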

Another problem with the postmaster.pid-based solution arises after the 
postmaster crashes. When the postmaster crashes, postmaster.pid is left 
behind. If the PID in that stale postmaster.pid has been reassigned to some 
non-postgres process that is still running, pg_ctl wrongly concludes that 
the postmaster is starting up, and waits for a long time.
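The stale-file problem follows from how liveness is usually probed: kill(pid, 0) only says that some process with that PID exists. The sketch below, with hypothetical helper names, assumes the PID is on the first line of the file (as in postmaster.pid). Its demo writes the test program's own PID into a temporary file and gets "alive" back even though no postmaster is running.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a PID from the first line of `path` and report whether *some* process
 * with that PID exists. Returns 1 if one does, 0 if not, -1 if unreadable.
 * Note: this cannot tell a postmaster from any other process.
 */
static int pid_in_file_is_alive(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = 0;
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", &pid) == 1);
    fclose(f);
    if (!ok || pid <= 0)
        return -1;

    /* Signal 0 performs the existence and permission checks without sending
       anything; EPERM means the process exists but belongs to another user. */
    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
        return 1;
    return 0;
}

/* Demo: a "stale" pid file whose PID now belongs to a non-postgres process
   (this test program itself). */
static int demo_reused_pid(void)
{
    char path[] = "/tmp/stale_pidXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    FILE *f = fdopen(fd, "w");
    if (f == NULL)
    {
        close(fd);
        unlink(path);
        return -1;
    }
    fprintf(f, "%ld\n", (long) getpid());
    fclose(f);

    int alive = pid_in_file_is_alive(path);
    unlink(path);
    return alive;
}
```

demo_reused_pid() is expected to return 1, which is exactly the misjudgment described above: the check passes although the process behind the PID is not a postmaster.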

Regards
MauMau