Re: Intermittent "make check" failures on hyena - Mailing list pgsql-hackers

From: Zdenek Kotala
Subject: Re: Intermittent "make check" failures on hyena
Date:
Msg-id: 44D72CE8.6020005@sun.com
In response to: Re: Intermittent "make check" failures on hyena (Andrew Dunstan <andrew@dunslane.net>)
Responses: Re: Intermittent "make check" failures on hyena
List: pgsql-hackers
Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> 
>> I see one occurrence in the 8.1 branch on hyena, but the failure
>> probability seems to have jumped way up in HEAD since we put in the
>> C-coded pg_regress.  This lends weight to the idea that it's a
>> timing-related issue, because pg_regress.c is presumably much faster
>> at forking off a parallel gang of psqls than the shell script was;
>> and it's hard to see what else about the pg_regress change could be
>> affecting the psqls' ability to connect once forked.
>>
>> We probably need to get some Solaris experts involved in diagnosing
>> what's happening.  Judging by the buildfarm results you should be able
>> to replicate it fairly easily by doing "make installcheck-parallel"
>> repeatedly.
>>
> 
> I will refer this to those experts - my Solaris-fu is a tad rusty these 
> days.

As Tom mentioned, the problem is the size of the TCP connection queue 
(the tcp_conn_req_max_q parameter), whose default is 128 on Solaris 10. 
The second limit is twice the number of backends; see 
./backend/libpq/pqcomm.c:
    /*
     * Select appropriate accept-queue length limit.  PG_SOMAXCONN is only
     * intended to provide a clamp on the request on platforms where an
     * overly large request provokes a kernel error (are there any?).
     */
    maxconn = MaxBackends * 2;
    if (maxconn > PG_SOMAXCONN)
        maxconn = PG_SOMAXCONN;

    err = listen(fd, maxconn);
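
To make the interaction of the two limits concrete, here is a minimal 
stand-alone sketch of the arithmetic (my own illustration, not code from 
the tree). It assumes the stock PG_SOMAXCONN of 10000 from 
pg_config_manual.h and a max_connections of 100; the point is only that 
the kernel applies its own ceiling, tcp_conn_req_max_q, on top of 
whatever listen() asks for, and the postmaster never sees that clamp.

#include <stdio.h>

/* Assumed values, for illustration only: the stock PG_SOMAXCONN from
 * pg_config_manual.h and a typical max_connections; tcp_conn_req_max_q
 * is the Solaris 10 default mentioned above. */
#define PG_SOMAXCONN         10000
#define MAX_BACKENDS         100
#define TCP_CONN_REQ_MAX_Q   128

int
main(void)
{
    int     maxconn = MAX_BACKENDS * 2;     /* what pqcomm.c computes */
    int     effective;

    if (maxconn > PG_SOMAXCONN)
        maxconn = PG_SOMAXCONN;

    /* listen() never reports the kernel's clamp, so the postmaster
     * has no way to notice that it got less than it asked for. */
    effective = (maxconn < TCP_CONN_REQ_MAX_Q) ? maxconn : TCP_CONN_REQ_MAX_Q;

    printf("backlog requested: %d, backlog actually used: %d\n",
           maxconn, effective);
    return 0;
}

On Solaris 10 the ceiling can be inspected with 
"ndd -get /dev/tcp tcp_conn_req_max_q" and raised with "ndd -set".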


So what happened? I think the following scenario occurred: the 
postmaster listens in only one process, while many clients run truly in 
parallel. The T2000 server has 32 hardware threads (8 cores with 4 
threads each). These clients generate more TCP connection requests at 
one time than the postmaster is able to accept.
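
To see the effect in isolation, the following stand-alone sketch (my own 
test program, not something from the tree or this thread) mimics that 
burst: one listener with a deliberately tiny backlog that never calls 
accept(), and a gang of non-blocking clients connecting at once. The 
port, backlog and client count are arbitrary, the exact symptom for the 
surplus clients (timeouts, resets, or silent SYN retransmits) is 
platform-dependent, and error checking is omitted for brevity.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <poll.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define BACKLOG   8      /* stand-in for a too-small tcp_conn_req_max_q */
#define NCLIENTS  64     /* burst of "psql" connection attempts */

int
main(void)
{
    struct sockaddr_in addr;
    int     lfd = socket(AF_INET, SOCK_STREAM, 0);
    int     one = 1,
            i,
            completed = 0;
    int     cfd[NCLIENTS];

    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(54321);          /* arbitrary test port */
    bind(lfd, (struct sockaddr *) &addr, sizeof(addr));
    listen(lfd, BACKLOG);                  /* nobody ever calls accept() */

    /* Fire off the whole gang of connection attempts at once. */
    for (i = 0; i < NCLIENTS; i++)
    {
        cfd[i] = socket(AF_INET, SOCK_STREAM, 0);
        fcntl(cfd[i], F_SETFL, O_NONBLOCK);
        connect(cfd[i], (struct sockaddr *) &addr, sizeof(addr));
    }

    /* Give each pending connect a short time to complete. */
    for (i = 0; i < NCLIENTS; i++)
    {
        struct pollfd pfd = {cfd[i], POLLOUT, 0};
        int     err = 1;
        socklen_t len = sizeof(err);

        if (poll(&pfd, 1, 200) > 0 &&
            getsockopt(cfd[i], SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
            err == 0)
            completed++;
    }

    printf("%d of %d connects completed quickly\n", completed, NCLIENTS);
    return 0;
}

Roughly the first BACKLOG connects complete immediately; the rest sit in 
(or are dropped from) the queue, which is the behaviour the parallel 
regression clients run into.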


Zdenek

