Re: [HACKERS] jacana hung after failing to acquire random number - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] jacana hung after failing to acquire random number
Msg-id 26907.1481556920@sss.pgh.pa.us
In response to Re: [HACKERS] jacana hung after failing to acquire random number  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 12/12/2016 03:40 PM, Andrew Dunstan wrote:
>> Should one or more of these errors be fatal? Or should we at least get
>> pg_regress to try to shut down the postmaster if it can't connect after
>> 120 seconds?

> Making it fatal, i.e. bringing down the server, doesn't seem like an 
> improvement. If the failure is transient, you don't want to kill the 
> whole server, when one connection attempt fails.

> It would be nice to fail earlier if it's permanently failing, though. 
> Like, if someone does "rm /dev/urandom". Perhaps we should perform one 
> pg_strong_random() call at postmaster startup, and if that fails, refuse 
> to start up.

That's sort of contradictory.  If you're worried about transient failures,
allowing a single failed try to cause postmaster startup failure isn't the
way to make things more robust.  Giving up after a bunch of failed tries
over a very short interval isn't much better.

I'm not sure how hard we need to work here.  The case at hand seems
to be one of simply not having gotten the bugs out of the initial
implementation, so maybe we shouldn't read too much into it.

I do agree that the buildfarm needs to be more robust against broken
postmasters, because finding bugs is its raison d'être.  But I'm not
convinced that it's a good idea to have the postmaster itself conclude
that there's something wrong with its configured random-number source.
        regards, tom lane
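
The harness-side robustness discussed in this thread (stop the server if it
cannot be reached within 120 seconds, rather than leaving it hung) might look
roughly like the sketch below. It is not pg_regress or buildfarm code. The
function name, the `is_ready` callback, and the polling interval are all
hypothetical; a real harness would implement `is_ready` as an actual
connection attempt.

```python
import subprocess
import time


def run_with_startup_deadline(cmd, is_ready, timeout_s=120.0, poll_s=0.5):
    """Start cmd, wait until is_ready() is true, and stop the child on timeout.

    Returns the running Popen object on success. Returns None if the child
    exited before becoming ready, or if the deadline passed (in which case
    the child has been stopped).
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return None  # child exited on its own before becoming ready
        if is_ready():  # hypothetical readiness probe supplied by the caller
            return proc
        time.sleep(poll_s)

    # Deadline passed: ask the child to stop, then force it if it ignores us,
    # so that a hung server is not left behind.
    proc.terminate()
    try:
        proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return None
```

The decision to give up lives in the test harness here, not in the server,
which keeps the server from having to judge whether its own random-number
source is broken.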


