Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>
>> I see one occurrence in the 8.1 branch on hyena, but the failure
>> probability seems to have jumped way up in HEAD since we put in the
>> C-coded pg_regress. This lends weight to the idea that it's a
>> timing-related issue, because pg_regress.c is presumably much faster
>> at forking off a parallel gang of psqls than the shell script was;
>> and it's hard to see what else about the pg_regress change could be
>> affecting the psqls' ability to connect once forked.
>>
>> We probably need to get some Solaris experts involved in diagnosing
>> what's happening. Judging by the buildfarm results you should be able
>> to replicate it fairly easily by doing "make installcheck-parallel"
>> repeatedly.
>>
>
> I will refer this to those experts - my Solaris-fu is a tad rusty these
> days.

As Tom mentioned, the problem is the size of the TCP connection queue
(parameter tcp_conn_req_max_q). The default is 128 in Solaris 10. The
second limit is twice the number of backends. See ./backend/libpq/pqcomm.c:

    /*
     * Select appropriate accept-queue length limit.  PG_SOMAXCONN is only
     * intended to provide a clamp on the request on platforms where an
     * overly large request provokes a kernel error (are there any?).
     */
    maxconn = MaxBackends * 2;
    if (maxconn > PG_SOMAXCONN)
        maxconn = PG_SOMAXCONN;

    err = listen(fd, maxconn);

So what actually happened? I think the following scenario occurred.
The postmaster accepts connections in only one process, while many
clients run truly in parallel. The T2000 server has 32 hardware threads
(8 cores, each with 4 threads). These clients generate more TCP/IP
connection requests at one time than the postmaster is able to accept.

Zdenek