On 3.3.2014 22:31, Andres Freund wrote:
> On 2014-03-03 16:28:21 -0500, Tom Lane wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Since we have seen this kind of failure reported many times, I
>>> wonder if it'd make sense to check specifically for inability to
>>> resolve localhost, if only to save troubleshooters' time.
>>
>> Right now, you only get a failure of the pgstats subsystem, which
>> is logged. I don't think we can do much more than that unless you
>> want to make it a postmaster-refuses-to-start case, which seems
>> like not a net improvement.
>
> I'd actually say that'd be an improvement. I've, a long time ago,
> spent several hours debugging a case of this; it's nontrivial for a
> beginner. And a normal PG install simply won't work properly without
> pgstat these days, so refusing to startup seems reasonable.

I'm not sure whether that'd be an improvement or not. Is it better to log
the issue but start the database anyway (and face the consequences later,
maybe weeks or months later, when the logged message is long gone), or to
fail promptly and force the user to fix the actual issue?

However, a failure to start the pgstats subsystem means:
(a) no stats collector process, and thus no autovacuum/autoanalyze
(b) no relpages/reltuples in pg_class or any other statistics (unless
    running ANALYZE explicitly, which people don't do, as they rely on
    autovacuum/autoanalyze)
(c) no transaction wraparound handling (again, no autovacuum running)

So I'd probably vote for failing right away, and mentioning a working
localhost resolution as a requirement in the docs.
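
To make that requirement concrete, here is a minimal sketch of a startup
check, assuming the collector looks up "localhost" for a UDP socket via
getaddrinfo(), roughly the way pgstat_init() does. host_resolves() is a
hypothetical helper for illustration, not existing PostgreSQL code.

```c
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo() under strict C99 */

#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return true if 'host' resolves to at least one
 * datagram (UDP) address; otherwise write a message into errbuf and
 * return false. Loosely modelled on the collector's "localhost" lookup.
 */
static bool
host_resolves(const char *host, char *errbuf, size_t errlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int         rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM;  /* the stats collector uses UDP */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0 || res == NULL)
    {
        snprintf(errbuf, errlen, "could not resolve \"%s\": %s",
                 host, rc != 0 ? gai_strerror(rc) : "no addresses");
        return false;
    }

    freeaddrinfo(res);  /* we only care that the lookup succeeded */
    return true;
}
```

A startup path could call host_resolves("localhost", ...) before launching
the collector and, on failure, report errbuf and refuse to start, rather
than merely logging the problem and running without statistics.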

If that's unacceptable, maybe it'd be a good idea to make the functions
backing the pg_stat_* views fail with an ERROR, i.e.

    if (pgStatSock == PGINVALID_SOCKET)
        elog(ERROR, "statistics collector is not running");

because right now they just wait until they hit the 'pgstat wait timeout'.

regards
Tomas