Re: What is the best and easiest implementation to reliably wait for the completion of startup? - Mailing list pgsql-hackers

From MauMau
Subject Re: What is the best and easiest implementation to reliably wait for the completion of startup?
Msg-id 45EA0477951744379296D472397BCDD4@maumau
In response to Re: What is the best and easiest implementation to reliably wait for the completion of startup?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> "MauMau" <maumau307@gmail.com> writes:
>> The bad thing is that pg_ctl continues to wait until the specified
>> duration passes, even if postgres fails to start. For example, it is
>> naturally desirable for pg_ctl to terminate when postgresql.conf
>> contains a syntax error.
>
> Hmm, I thought we'd fixed this in the last go-round of pg_ctl wait
> revisions, but testing proves it does not work desirably in HEAD:
> not only does pg_ctl wait till its timeout elapses, but it then reports
> "server started" even though the server didn't start.  That's clearly a
> bug :-(
>
> I think your proposal of a pipe-based solution might be overkill though.
> Seems like it would be sufficient for pg_ctl to give up if it doesn't
> see the postmaster.pid file present within a couple of seconds of
> postmaster startup.  I don't really want to add logic to the postmaster
> to have the sort of reporting protocol you propose, because not
> everybody uses pg_ctl to start the postmaster.  In any case, we need a
> fix in 9.1 ...

Yes, I was a bit afraid the pipe-based fix might be overkill, too, so I was 
wondering if there might be an easier solution.

"server started"... I missed it. That's certainly a bug, as you say.

I was also considering the postmaster.pid-based solution exactly as you 
suggest, but it has a problem -- how many seconds should we assume for "a 
couple of seconds"? If the system load is temporarily so high that 
postmaster takes many seconds to create postmaster.pid, pg_ctl mistakenly 
concludes that postmaster failed to start. I know this is a rare, 
hypothetical case. I don't like touching the postmaster logic and 
complicating it, but logical correctness needs to come first (Japanese 
users are very demanding).
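
To make the timing concern concrete, here is a minimal sketch in Python (not pg_ctl's actual code) of a "give up if the pid file has not appeared within a grace period" check. The function name wait_for_pidfile and its parameters are hypothetical, chosen only for illustration. Whatever grace period is picked, a start that is merely slow is indistinguishable from a failed one.

```python
import os
import time


def wait_for_pidfile(pidfile_path, grace_seconds, poll_interval=0.05):
    """Return True if pidfile_path appears within grace_seconds, else False.

    Hypothetical helper: a False result only means "not seen yet", so a
    slow but healthy startup is reported the same way as a failed one.
    """
    deadline = time.monotonic() + grace_seconds
    while True:
        if os.path.exists(pidfile_path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

If the file is created just after the deadline, the caller still gets False, which is the misjudgment described above.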

Another problem with the postmaster.pid-based solution arises after 
postmaster crashes. When postmaster crashes, postmaster.pid is left behind. 
If the pid in postmaster.pid is later allocated to some non-postgres process 
and that process keeps running, pg_ctl misjudges that postmaster is starting 
up and waits for a long time.
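
The stale-pid case can be sketched as follows, again in Python and not taken from pg_ctl. The helper names pid_is_alive and pidfile_looks_live are hypothetical, and the sketch assumes a POSIX system where the first line of the file holds the pid. Sending signal 0 only tells us that some process has that pid, not that the process is a postmaster.

```python
import os


def pid_is_alive(pid):
    """Return True if some process with this pid exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_looks_live(pidfile_path):
    """Return True if the pid on the file's first line maps to any process."""
    try:
        with open(pidfile_path) as f:
            pid = int(f.readline().strip())
    except (OSError, ValueError):
        return False
    return pid_is_alive(pid)
```

A leftover file whose pid now belongs to an unrelated process passes this check, so the caller would keep waiting as if postmaster were still starting.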

Regards
MauMau


