On 07/03/2012 04:26 PM, Pawel S. Veselov wrote:
> That's the thing, no segfaults (dmesg), nothing in the server logs.
>
> It may as well be some sort of an anti-fork-bomb measure, only judging
> by the fact that with enough attempts, things do clear out, though I
> wish there would be some indication of that, and I'm still confused
> about the error code being ENOTCONN.
>

I've managed to reproduce the "Transport endpoint is not connected"
errors with a little test I wrote here. Only once so far, though, and
only during an abnormal test run where I signalled the test workers as
they were starting up, so that's not really very helpful.

I have no problem creating 800 connections in about a second with a
little Python test program. It forks a number of workers (100 by
default), each of which opens its share of connections so that together
they reach the target connection count.
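
The test program itself isn't included here, so what follows is only a
rough Python sketch of the approach just described, under stated
assumptions. The names `split_target`, `worker` and `run` are
hypothetical. `connect` stands for any zero-argument callable returning
an object with a `close()` method (for example a psycopg2 connection).
The optional `interrupt_after` argument imitates the abnormal runs in
which the workers were signalled during startup. It relies on the
`fork` start method, so it is POSIX-only.

```python
import multiprocessing
import os
import signal
import time


def split_target(total, workers):
    """Divide `total` connections as evenly as possible among `workers`."""
    base, extra = divmod(total, workers)
    # The first `extra` workers each take one additional connection.
    return [base + (1 if i < extra else 0) for i in range(workers)]


def worker(n_conns, connect, hold_seconds):
    """Open `n_conns` connections via `connect()`, hold them, then close them."""
    conns = [connect() for _ in range(n_conns)]
    time.sleep(hold_seconds)
    for conn in conns:
        conn.close()


def run(connect, total=800, workers=100, hold_seconds=5.0, interrupt_after=None):
    """Fork workers that together open `total` connections via `connect()`.

    If `interrupt_after` is given, SIGTERM every still-running worker that
    many seconds after starting them, imitating a run signalled mid-startup.
    Returns the workers' exit codes; -signal.SIGTERM means killed by SIGTERM.
    """
    # Fork explicitly so `connect` (e.g. a lambda) never has to be pickled.
    ctx = multiprocessing.get_context("fork")
    procs = [ctx.Process(target=worker, args=(n, connect, hold_seconds))
             for n in split_target(total, workers) if n > 0]
    for p in procs:
        p.start()
    if interrupt_after is not None:
        time.sleep(interrupt_after)
        for p in procs:
            if p.is_alive():
                os.kill(p.pid, signal.SIGTERM)
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

Assuming psycopg2 is installed and a database named "test" exists (both
assumptions), a run that signals its workers half a second in might look
like `run(lambda: psycopg2.connect("dbname=test"), interrupt_after=0.5)`.
A worker killed by SIGTERM never sends a protocol-level goodbye; the
kernel just closes its sockets. That is presumably why such runs leave
"unexpected EOF on client connection" entries in the server log.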

Ooh, handy. I just triggered it again. The "Transport endpoint is not
connected" messages were intermixed with some "FATAL: sorry, too many
clients already" messages. The PostgreSQL log is full of "FATAL: sorry,
too many clients already" messages intermixed with "LOG: unexpected EOF
on client connection" messages. Again, it was an abnormal run where I
signalled my workers midway through startup.

Interesting, that. I've never seen it on a run where I don't send a
signal. You know what that makes me think? You're using a multithreaded
approach, and something is going wrong in your app's innards. Yes,
that's a lot of hot air and handwaving, but it fits: you're getting an
error saying that psql is trying to operate on a socket that isn't
connected. The fact that there's nothing in the system logs or the Pg
logs just adds weight to that. I'm guessing you have a threading bug,
possibly a signal-related one.
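
ENOTCONN specifically means "this descriptor is a socket, but it has no
connection"; using a descriptor that has already been closed gives EBADF
instead. A small Python sketch makes the difference visible. The helper
name `getpeername_errno` is made up for illustration. It asks for the
peer address of a TCP socket that was never connected, then repeats the
call after closing it.

```python
import errno
import socket


def getpeername_errno(sock):
    """Return the errno getpeername() fails with, or None if it succeeds."""
    try:
        sock.getpeername()
    except OSError as exc:
        return exc.errno
    return None


# A valid socket that was never connected: the kernel reports ENOTCONN,
# which glibc's strerror() renders as "Transport endpoint is not connected".
unconnected = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(errno.errorcode[getpeername_errno(unconnected)])  # ENOTCONN
unconnected.close()

# The same object after close(): the descriptor is gone, so it is EBADF.
print(errno.errorcode[getpeername_errno(unconnected)])  # EBADF
```

One way a threading bug could plausibly produce ENOTCONN, offered only as
a hypothesis: one thread closes a connection's descriptor, another thread
immediately creates a new socket that reuses the same descriptor number,
and the first thread then does I/O on that number and finds a fresh,
not-yet-connected socket.
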
--
Craig Ringer