Hi,
On 2013-06-17 16:16:22 +0200, Andres Freund wrote:
> When postgres on linux receives connections at a high rate, client
> connections sometimes error out with:
> could not send data to server: Transport endpoint is not connected
> could not send startup packet: Transport endpoint is not connected
>
> To reproduce, run something like the following against a server with a
> sufficiently high max_connections:
> pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql
>
> Now that's strange since that error should happen at connect(2) time,
> not when sending the startup packet. Some investigation led me to
> fe-connect.c's PQconnectPoll:
> So, we're accepting EWOULDBLOCK as a valid errno for connect(2), which
> it isn't. EAGAIN, in contrast, is valid on some BSDs and on linux.
> Unfortunately POSIX allows those two to share the same value...
>
> My manpage tells me:
> EAGAIN No more free local ports or insufficient entries in the routing
>        cache. For AF_INET see the description of
>        /proc/sys/net/ipv4/ip_local_port_range ip(7) for information on
>        how to increase the number of local ports.
>
> So, the problem is that we treated a failed connection attempt as one
> that had started successfully and was still in progress.
>
> Not accepting EWOULDBLOCK in the above if() results in:
> could not connect to server: Resource temporarily unavailable
> Is the server running locally and accepting
> connections on Unix domain socket "/tmp/.s.PGSQL.5440"?
>
> which makes more sense.
>
> Trivial patch attached.
Could I convince a committer to NACK or commit & backpatch that patch?
It has come up before:
http://www.postgresql.org/message-id/CAMnJ+Beq0hCBuTY_=nz=ru0U-No543_RAEunLVSAYU8tugd6NA@mail.gmail.com
possibly also:
http://lists.pgfoundry.org/pipermail/pgpool-general/2007-March/000595.html
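
For anyone skimming the thread, here is a rough sketch of the distinction
the quoted mail is about. It is not the libpq code and not the attached
patch; connect_in_progress() is a made-up helper, and which errno values
it accepts is my assumption. The point is only that a failed non-blocking
connect(2) should count as "in progress" for EINPROGRESS or EINTR, and
that EAGAIN (which may have the same value as EWOULDBLOCK) should be
reported as a failure:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libpq: given the errno left behind by
 * a failed non-blocking connect(2), report whether the connection attempt
 * is still in progress (so the caller should wait for write-readiness) or
 * has really failed.
 */
static bool
connect_in_progress(int err)
{
	/*
	 * EINPROGRESS: the handshake continues in the background.
	 * EINTR: the call was interrupted, the attempt continues asynchronously.
	 *
	 * EWOULDBLOCK is deliberately not accepted here. Where it shares its
	 * value with EAGAIN, accepting it would hide the "no more free local
	 * ports" failure described in the man page excerpt.
	 */
	return err == EINPROGRESS || err == EINTR;
}
```

With a check along those lines, an EAGAIN from connect(2) falls through
to the normal error path and is reported at connect time rather than when
the startup packet is sent.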
Greetings,
Andres Freund
--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services