Re: SOMAXCONN (was Re: Solaris source code) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: SOMAXCONN (was Re: Solaris source code) |
Date | |
Msg-id | 22152.995125131@sss.pgh.pa.us |
In response to | Re: SOMAXCONN (was Re: Solaris source code) (mlw <markw@mohawksoft.com>) |
Responses | Re: Re: SOMAXCONN (was Re: Solaris source code) |
List | pgsql-hackers |
mlw <markw@mohawksoft.com> writes:
> Tom Lane wrote:
>>> Passing listen(5) would probably be sufficient for Postgres.
>>
>> It demonstrably is not sufficient. Set it that way in pqcomm.c
>> and run the parallel regression tests. Watch them fail.

> That's interesting, I would not have guessed that. I have written a number of
> server applications which can handle, literally, over a thousand
> connection/operations a second, which only has a listen(5).

The problem should be considerably reduced in latest sources, since as of a week or three ago, the top postmaster process' outer loop is basically just accept() and fork() --- client authentication is now handled after the fork, instead of before. Still, we now know that (a) SOMAXCONN is a lie on many systems, and (b) values as small as 5 are pushing our luck, even though it might not fail so easily anymore.

The state of affairs in current sources is that the listen queue parameter is MIN(MaxBackends * 2, PG_SOMAXCONN), where PG_SOMAXCONN is a constant defined in config.h --- it's 10000, hence a non-factor, by default, but could be reduced if you have a kernel that doesn't cope well with large listen-queue requests. We probably won't know if there are any such systems until we get some field experience with the new code, but we could have "configure" select a platform-dependent value if we find such problems.

I believe that this is fine and doesn't need any further tweaking, pending field experience. What's still open for discussion is Nathan's thought that the postmaster ought to stop issuing accept() calls once it has so many children that it will refuse to fork any more. I was initially against that, but on further reflection I think it might be a good idea after all, because of another recent change related to the authenticate-after-fork change.
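(An illustrative aside on the listen-queue rule described above: the sketch below is not the actual pqcomm.c code. The helper name `listen_backlog` is hypothetical, and the value of PG_SOMAXCONN is simply the default quoted in this message.)

```c
/*
 * Illustrative sketch only, not the code in pqcomm.c.
 * PG_SOMAXCONN is the default cited in this message; the real
 * definition lives in config.h and may be lowered per platform.
 */
#ifndef PG_SOMAXCONN
#define PG_SOMAXCONN 10000
#endif

/*
 * Hypothetical helper: compute the backlog argument for listen()
 * as MIN(MaxBackends * 2, PG_SOMAXCONN).
 */
static int
listen_backlog(int max_backends)
{
    int wanted = max_backends * 2;

    return (wanted < PG_SOMAXCONN) ? wanted : PG_SOMAXCONN;
}
```

With the default cap, the result is just twice MaxBackends for any realistic setting; the cap only matters if PG_SOMAXCONN is reduced for a kernel that mishandles large requests.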
Since the top postmaster doesn't really know which children have become working backends and which are still engaged in authentication dialogs, it cannot enforce the MaxBackends limit directly. Instead, MaxBackends is checked when the child process is done with authentication and is trying to join the PROC pool in shared memory. The postmaster will spawn up to 2 * MaxBackends child processes before refusing to spawn more --- this allows there to be up to MaxBackends children engaged in auth dialog but not yet working backends. (It's reasonable to allow extra children since some may fail the auth dialog, or an extant backend may have quit by the time they finish auth dialog. Whether 2*MaxBackends is the best choice is debatable, but that's what we're using at the moment.) Furthermore, we intend to install a pretty tight timeout on the overall time spent in auth phase (a few seconds I imagine, although we haven't yet discussed that number either).

Given this setup, if the postmaster has reached its max-children limit then it can be quite certain that at least some of those children will quit within approximately the auth timeout interval. Therefore, not accept()ing is a state that will probably *not* persist for long enough to cause the new clients to time out. By not accept()ing at a time when we wouldn't fork, we can convert the behavior clients see at peak load from quick rejection into a short delay before authentication dialog.

Of course, if you are at MaxBackends working backends, then the new client is still going to get a "too many clients" error; all we have accomplished with the change is to expend a fork() and an authentication cycle before issuing the error. So if the intent is to reduce overall system load then this isn't necessarily an improvement.
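(To make the two alternatives concrete, here is a minimal sketch of the decision the postmaster faces for each pending connection. It is an illustration under stated assumptions, not postmaster code: the `ConnAction` enum, `max_children`, `next_action`, and the `hold_off_accept` flag are all hypothetical names; only the 2 * MaxBackends limit comes from the description above.)

```c
#include <stdbool.h>

/* Hypothetical outcomes for a pending connection; not real postmaster types. */
typedef enum
{
    CONN_FORK,      /* accept() and fork a child to run the auth dialog */
    CONN_REJECT,    /* accept() and refuse immediately (quick rejection) */
    CONN_HOLD_OFF   /* skip accept(); client waits in the listen queue */
} ConnAction;

/* Child-count ceiling described in the message: 2 * MaxBackends. */
static int
max_children(int max_backends)
{
    return 2 * max_backends;
}

/*
 * Decide what to do with the next pending connection.
 * hold_off_accept selects the proposed behavior (stop calling accept()
 * at the child limit) over the existing one (reject quickly).
 */
static ConnAction
next_action(int live_children, int max_backends, bool hold_off_accept)
{
    if (live_children < max_children(max_backends))
        return CONN_FORK;

    return hold_off_accept ? CONN_HOLD_OFF : CONN_REJECT;
}
```

Note that the two policies differ only once the child count reaches the ceiling; below it, both simply fork, and the MaxBackends check itself still happens later in the child.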
IIRC, the rationale for using 2*MaxBackends as the maximum child count was to make it unlikely that the postmaster would refuse to fork; given a short auth timeout it's unlikely that as many as MaxBackends clients will be engaged in auth dialog at any instant. So unless we tighten that max child count considerably, holding off accept() at max child count is unlikely to change the behavior under any but worst-case scenarios anyway. And in a worst-case scenario, shedding load by rejecting connections quickly is probably just what you want to do.

So, having thought that through, I'm still of the opinion that holding off accept is of little or no benefit to us. But it's not as simple as it looks at first glance. Anyone have a different take on what the behavior is likely to be?

regards, tom lane