Re: pgbench: could not connect to server: Resource temporarily unavailable - Mailing list pgsql-performance

From Thomas Munro
Subject Re: pgbench: could not connect to server: Resource temporarily unavailable
Date
Msg-id CA+hUKGKPyXKf2jrnSUMKc8XvRTYs+kkiZY9GA6nMdMUgLG6EaQ@mail.gmail.com
In response to Re: pgbench: could not connect to server: Resource temporarily unavailable  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Mon, Aug 22, 2022 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Yeah retrying doesn't seem that nice.  +1 for a bit of documentation,
> > which I guess belongs in the server tuning part where we talk about
> > sysctls, perhaps with a link somewhere near max_connections?  More
> > recent Linux kernels bumped it to 4096 by default so I doubt it'll
> > come up much in the future, though.
>
> Hmm.  It'll be awhile till the 128 default disappears entirely
> though, especially if assorted BSDen use that too.  Probably
> worth the trouble to document.

I could try to write a doc patch if you aren't already on it.

> > Note that we also call listen()
> > with a backlog value capped to our own PG_SOMAXCONN which is 1000.  I
> > doubt many people benchmark with higher numbers of connections but
> > it'd be nicer if it worked when you do...
>
> Actually it's 10000.  Still, I wonder if we couldn't just remove
> that limit now that we've desupported a bunch of stone-age kernels.
> It's hard to believe any modern kernel can't defend itself against
> silly listen-queue requests.

Oh, right.  Looks like that was just paranoia in commit 153f4006763,
back when you got away from using the (very conservative) SOMAXCONN
macro.  SOMAXCONN was 5 on ancient systems going back to the
original sockets stuff, and later 128 was a popular number.  Yeah, I'd
say +1 for removing our cap.  I'm pretty sure every system will
internally cap whatever value we pass in if it doesn't like it, as
POSIX explicitly says it can freely do with this "hint".
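
To illustrate the "hint" point, here is a minimal sketch (in Python
rather than our C, just to keep it short) that asks for an absurdly
large backlog and checks that listen() simply succeeds.  The function
name and the 1,000,000 figure are made up for illustration, and the
/proc path is Linux-specific.  That Linux silently clamps the value to
net.core.somaxconn is my understanding of the kernel, not something
this code proves.

```python
import os
import socket
import tempfile


def listen_with_huge_backlog(requested=1_000_000):
    """Call listen() with an oversized backlog and report what happened.

    Returns (listen_succeeded, kernel_cap), where kernel_cap is the value
    of net.core.somaxconn on Linux, or None where it cannot be read.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
            srv.bind(path)
            # The backlog is only a hint; the OS may clamp it as it sees fit.
            srv.listen(requested)
            listen_succeeded = True

    # Linux exposes its clamp as a sysctl; elsewhere we just report None.
    try:
        with open("/proc/sys/net/core/somaxconn") as f:
            kernel_cap = int(f.read())
    except (OSError, ValueError):
        kernel_cap = None
    return listen_succeeded, kernel_cap


if __name__ == "__main__":
    print(listen_with_huge_backlog())
```

If that holds on every platform we still support, passing the
requested value straight through should be harmless.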

The main thing I learned today is that Linux's connect(AF_UNIX)
implementation doesn't refuse connections when the listen backlog is
full, unlike other OSes.  Instead, for blocking sockets, it sleeps and
wakes with everyone else to fight over space.  I *guess* for
non-blocking sockets that introduced a small contradiction -- there
isn't the state space required to give you a working EINPROGRESS with
the same sort of behaviour (if you reified a secondary queue for that
you might as well make the primary one larger...), but they also
didn't want to give you ECONNREFUSED just because you're non-blocking,
so they went with EAGAIN, because you really do need to call connect()
again with the sockaddr.  The reason I wouldn't want to retry is that
I guess it'd be a CPU-burning busy loop until progress can be made,
which isn't nice, and failing with "Resource temporarily unavailable"
does in fact describe the problem to the user, if somewhat vaguely.
Hmm, maybe we could add a hint to the error, though?
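
For anyone who wants to see the behaviour, a rough sketch (Python
again, not libpq code): a server that listens but never accepts, plus
non-blocking clients that connect until one fails.  The names
fill_backlog and hint_for, the small backlog, the attempt limit and
the hint wording are all hypothetical.  I'd expect EAGAIN on Linux and
ECONNREFUSED on some other systems, but I haven't tested widely, so
the code only reports whatever errno it gets.

```python
import errno
import os
import socket
import tempfile


def fill_backlog(backlog=1, max_attempts=64):
    """Connect non-blocking clients to a server that never accepts.

    Returns (successful_connects, errno_of_first_failure or None).
    """
    clients = []
    failure = None
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
            srv.bind(path)
            srv.listen(backlog)
            try:
                for _ in range(max_attempts):
                    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                    clients.append(c)
                    c.setblocking(False)
                    # connect_ex() returns the errno instead of raising.
                    err = c.connect_ex(path)
                    if err != 0:
                        failure = err
                        break
            finally:
                for c in clients:
                    c.close()
    succeeded = len(clients) - (1 if failure is not None else 0)
    return succeeded, failure


def hint_for(err):
    """Hypothetical hint text for a failed Unix-socket connect."""
    if err in (errno.EAGAIN, errno.EWOULDBLOCK):
        return ("the server's listen queue may be full; "
                "consider raising net.core.somaxconn")
    return None


if __name__ == "__main__":
    count, err = fill_backlog()
    name = errno.errorcode.get(err, "none") if err is not None else "none"
    print(count, name, hint_for(err))
```

Note there is no retry loop here: the first failure is reported as is,
which matches the "fail, but explain" approach rather than spinning.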


