Re: SOMAXCONN (was Re: Solaris source code) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SOMAXCONN (was Re: Solaris source code)
Date
Msg-id 22152.995125131@sss.pgh.pa.us
In response to Re: SOMAXCONN (was Re: Solaris source code)  (mlw <markw@mohawksoft.com>)
Responses Re: Re: SOMAXCONN (was Re: Solaris source code)
List pgsql-hackers
mlw <markw@mohawksoft.com> writes:
> Tom Lane wrote:
>>> Passing listen(5) would probably be sufficient for Postgres.
>> 
>> It demonstrably is not sufficient.  Set it that way in pqcomm.c
>> and run the parallel regression tests.  Watch them fail.

> That's interesting, I would not have guessed that. I have written a number of
> server applications which can handle, literally, over a thousand
> connections/operations a second, and which only use listen(5).

The problem should be considerably reduced in latest sources, since
as of a week or three ago, the top postmaster process' outer loop is
basically just accept() and fork() --- client authentication is now
handled after the fork, instead of before.  Still, we now know that
(a) SOMAXCONN is a lie on many systems, and (b) values as small as 5
are pushing our luck, even though it might not fail so easily anymore.

The state of affairs in current sources is that the listen queue
parameter is MIN(MaxBackends * 2, PG_SOMAXCONN), where PG_SOMAXCONN
is a constant defined in config.h --- it's 10000, hence a non-factor,
by default, but could be reduced if you have a kernel that doesn't
cope well with large listen-queue requests.  We probably won't know
if there are any such systems until we get some field experience with
the new code, but we could have "configure" select a platform-dependent
value if we find such problems.
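
For illustration only, the computation described above could be sketched
as follows.  This is not the actual pqcomm.c code: the helper name
listen_backlog() is hypothetical, and PG_SOMAXCONN is redefined here with
the default value mentioned above so that the fragment stands alone.

```c
#include <assert.h>

/* Default quoted in the text; the real definition lives in config.h. */
#define PG_SOMAXCONN 10000

/*
 * Hypothetical helper: compute the backlog argument to pass to listen(),
 * i.e. MIN(MaxBackends * 2, PG_SOMAXCONN).
 */
static int
listen_backlog(int max_backends)
{
    int wanted;

    assert(max_backends > 0);
    wanted = max_backends * 2;
    return (wanted < PG_SOMAXCONN) ? wanted : PG_SOMAXCONN;
}
```

With MaxBackends = 32 this asks for a queue of 64; the PG_SOMAXCONN cap
only matters once MaxBackends exceeds 5000, or if the constant is lowered
for a kernel that dislikes large requests.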

I believe that this is fine and doesn't need any further tweaking,
pending field experience.  What's still open for discussion is Nathan's
thought that the postmaster ought to stop issuing accept() calls once
it has so many children that it will refuse to fork any more.  I was
initially against that, but on further reflection I think it might be
a good idea after all, because of another recent change related to the
authenticate-after-fork change.  Since the top postmaster doesn't really
know which children have become working backends and which are still
engaged in authentication dialogs, it cannot enforce the MaxBackends
limit directly.  Instead, MaxBackends is checked when the child process
is done with authentication and is trying to join the PROC pool in
shared memory.  The postmaster will spawn up to 2 * MaxBackends child
processes before refusing to spawn more --- this allows there to be
up to MaxBackends children engaged in auth dialog but not yet working
backends.  (It's reasonable to allow extra children since some may fail
the auth dialog, or an extant backend may have quit by the time they
finish auth dialog.  Whether 2*MaxBackends is the best choice is
debatable, but that's what we're using at the moment.)
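
The two separate limits described above can be sketched like this.  It is
a simplified illustration, not the postmaster's real code: both function
names and the plain integer counters are assumptions made for the example
(the real check against MaxBackends happens against the PROC pool in
shared memory).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical postmaster-side check, made before fork(): the postmaster
 * only counts children, so it allows up to 2 * MaxBackends of them.
 */
static bool
postmaster_may_fork(int live_children, int max_backends)
{
    assert(max_backends > 0 && live_children >= 0);
    return live_children < 2 * max_backends;
}

/*
 * Hypothetical child-side check, made after authentication succeeds:
 * this is where the MaxBackends limit itself is enforced.
 */
static bool
child_may_join(int working_backends, int max_backends)
{
    assert(max_backends > 0 && working_backends >= 0);
    return working_backends < max_backends;
}
```

A child that fails child_may_join() is the one that reports "too many
clients", even though the postmaster was willing to fork it.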

Furthermore, we intend to install a pretty tight timeout on the overall
time spent in auth phase (a few seconds I imagine, although we haven't
yet discussed that number either).

Given this setup, if the postmaster has reached its max-children limit
then it can be quite certain that at least some of those children will
quit within approximately the auth timeout interval.  Therefore, not
accept()ing is a state that will probably *not* persist for long enough
to cause the new clients to time out.  By not accept()ing at a time when
we wouldn't fork, we can convert the behavior clients see at peak load
from quick rejection into a short delay before authentication dialog.
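
One way to picture "not accept()ing" is to leave the listen socket out of
the select() read mask while the child count is at its limit.  The sketch
below assumes a select()-based outer loop; the helper names
at_child_limit() and build_read_mask() are hypothetical and do not come
from the postmaster source.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical: true once the postmaster would refuse to fork. */
static bool
at_child_limit(int live_children, int max_backends)
{
    return live_children >= 2 * max_backends;
}

/*
 * Hypothetical: build the read mask for the next select() call and return
 * the nfds argument to use.  At the child limit the listen socket is left
 * out, so pending connections simply wait in the kernel's listen queue
 * instead of being accepted and rejected.
 */
static int
build_read_mask(fd_set *rmask, int listen_fd,
                int live_children, int max_backends)
{
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE);
    FD_ZERO(rmask);
    if (at_child_limit(live_children, max_backends))
        return 0;               /* watch nothing; do not accept() */
    FD_SET(listen_fd, rmask);
    return listen_fd + 1;
}
```

A loop built this way would need select() to return when a child exits
(via a timeout or an interrupting SIGCHLD) so that the mask gets rebuilt
and accept() resumes.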

Of course, if you are at MaxBackends working backends, then the new
client is still going to get a "too many clients" error; all that holding
off accept() has accomplished is to expend a fork() and an authentication
cycle before issuing the error.  So if the intent is to reduce overall
system load then this isn't necessarily an improvement.

IIRC, the rationale for using 2*MaxBackends as the maximum child count
was to make it unlikely that the postmaster would refuse to fork; given
a short auth timeout it's unlikely that as many as MaxBackends clients
will be engaged in auth dialog at any instant.  So unless we tighten
that max child count considerably, holding off accept() at max child
count is unlikely to change the behavior under any but worst-case
scenarios anyway.  And in a worst-case scenario, shedding load by
rejecting connections quickly is probably just what you want to do.

So, having thought that through, I'm still of the opinion that holding
off accept is of little or no benefit to us.  But it's not as simple
as it looks at first glance.  Anyone have a different take on what the
behavior is likely to be?
        regards, tom lane

