Thread: Bug (and fix): leaks of TCP connections when connected to a <7.4 server
Hello,

I experienced some TCP connection leaks when using the PGSQL JDBC driver 7.4 (build 214) to connect to a 7.3.4 server. The symptom is that when performing a netstat on the client machine, many connections are in the CLOSE_WAIT state.

The problem is that the driver tries to connect using the v3 protocol, and when it sees that the server doesn't understand it, it opens a new connection (PGStream) to the server without closing the previous one. AbstractJdbc1Connection.java, line 304 and on:

    if (l_elen > 30000) {
        //if the error length is > than 30000 we assume this is really a v2 protocol
        //server so try again with a v2 connection
        //need to create a new connection and try again
        try
        {
            pgStream = new PGStream(p_host, p_port);
        }
        catch (ConnectException cex)

A quick fix is to do a pgStream.close(); before instantiating the new PGStream, but I don't know the sources very well and it might have other side effects.

To circumvent the problem while waiting for an eventual new version of the driver, I added "?compatible=7.3" to the connection URL...

I hope it'll improve the driver ;-)

Sylvain Laurent
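The quick fix amounts to releasing the old stream before replacing the reference to it. A minimal sketch of that pattern, with a stand-in Stream class instead of the driver's real PGStream (all names here are illustrative, not the driver's API):

```java
import java.io.Closeable;

// Sketch of the leak and the fix: if a field is reassigned to a new
// resource without closing the old one, the old OS-level connection is
// orphaned (and its socket lingers in CLOSE_WAIT). "Stream" is a
// stand-in for PGStream; openCount counts still-open resources.
public class Reconnect {
    public static int openCount = 0;

    public static class Stream implements Closeable {
        private boolean closed = false;
        public Stream() { openCount++; }
        public void close() {
            if (!closed) { closed = true; openCount--; }
        }
    }

    // Buggy variant: the old stream is simply dropped, never closed.
    public static Stream retryLeaky(Stream old) {
        return new Stream();
    }

    // Fixed variant: close the old stream before opening the new one,
    // as the suggested pgStream.close() does.
    public static Stream retryFixed(Stream old) {
        old.close();
        return new Stream();
    }
}
```

The fix is one line, but as the poster notes, whether close() is safe at that point depends on the surrounding driver state, which is why it needed review.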
On Mon, 21 Jun 2004, Laurent Sylvain wrote:

> I experienced some TCP connection leaks when using PGSQL JDBC driver 7.4
> (build 214) to connect to a 7.3.4 server.
> The symptoms are that when performing a netstat on the client machine, many
> connections were in the CLOSE_WAIT state.

I'm not much of an expert on TCP, so could you give me a little more background on this? I've duplicated the behavior you describe and I believe your fix is correct, but I'd like to understand this better. First, why is this a problem? The CLOSE_WAIT entries disappear rather quickly. Second, the number of CLOSE_WAIT items is nothing compared to the number of TIME_WAIT entries; why aren't those a problem?

Kris Jurka
Kris Jurka <books@ejurka.com> writes:
> I'm not much of an expert on TCP, so could you give me a little more
> background on this? I've duplicated the behavior you describe and I
> believe your fix is correct, but I'd like to understand this better.
> First why is this a problem, the CLOSE_WAIT entries disappear rather
> quickly. Second the number of CLOSE_WAIT items is nothing compared to the
> number of TIME_WAIT entries, why aren't they a problem?

I think CLOSE_WAIT means that the kernel knows the far end has closed the connection, but it's not yet been able to tell the local client so (the vice-versa case may be called this too, not sure). This state can persist indefinitely if the client is uncooperative.

TIME_WAIT is a short-lived state; the connection will be forgotten completely after the timeout, which I think is order-of-a-minute. (TIME_WAIT exists only so that the kernel remembers what to do with any delayed packets that may arrive from the far end.)

Bottom line is that CLOSE_WAIT means that a client bug is keeping the kernel from freeing resources. You don't want to see that.

			regards, tom lane
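The CLOSE_WAIT situation Tom describes is easy to reproduce from Java on the loopback interface: the peer closes, the local application does not, and until the application calls close() the local socket sits in CLOSE_WAIT (visible with netstat). A small sketch (hypothetical helper, loopback only, no real server involved):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Produces a socket whose peer has closed but which the local
// application has NOT closed -- exactly the half-closed state that
// shows up as CLOSE_WAIT in netstat until close() is finally called.
public class CloseWaitDemo {
    public static Socket makeHalfClosed() throws IOException {
        ServerSocket server = new ServerSocket(0); // ephemeral port
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();
        accepted.close();   // far end closes: client side enters CLOSE_WAIT
        server.close();
        return client;      // caller still owns an unclosed socket
    }
}
```

At the Java level the signature of this state is that reads return EOF (the peer's FIN was seen) while isClosed() is still false (we never closed our end); only the local close() lets the kernel free the connection.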
Laurent Sylvain wrote:
> Hello,
>
> I experienced some TCP connection leaks when using PGSQL JDBC driver 7.4
> (build 214) to connect to a 7.3.4 server.
> The symptoms are that when performing a netstat on the client machine, many
> connections were in the CLOSE_WAIT state.
>
> The problem is that the driver tries to connect using v3 protocol and when
> it sees that the server doesn't understand it, it opens a new connection
> (PGStream) to the server without closing the previous one:

In theory the discarded connections should eventually be garbage collected and closed, right? So at least the leak is bounded.

(I'll check that this is fixed in my patches; I restructured that area quite a bit.)

-O
<snip>
> In theory the discarded connections should eventually be garbage
> collected and closed, right? So at least the leak is bounded.

I don't think so. We had a similar problem here.

After many hours of debugging, we found that the Java garbage collector either does _not_ close open connections or does it after a long time (much longer than one would expect). This feature/bug caused a severe connection leak in our code, and we had to issue an explicit close() on our socket connections.
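The remedy described here is to close connections explicitly on every exit path rather than waiting for finalization. A minimal sketch of the try/finally shape that guarantees this (FakeConnection is a stand-in for a real socket so the pattern is visible without network I/O):

```java
import java.io.Closeable;
import java.io.IOException;

// Explicit cleanup instead of relying on the GC: the finally block
// runs on both normal and exceptional exit, so the connection is
// always closed. FakeConnection is illustrative, not a real socket.
public class ExplicitClose {
    public static class FakeConnection implements Closeable {
        public boolean closed = false;
        public void close() { closed = true; }
        void doWork(boolean fail) throws IOException {
            if (fail) throw new IOException("simulated failure");
        }
    }

    // Returns the connection so callers can verify it was closed.
    public static FakeConnection use(boolean fail) {
        FakeConnection conn = new FakeConnection();
        try {
            conn.doWork(fail);
        } catch (IOException e) {
            // swallowed for the demo; real code would propagate or log
        } finally {
            conn.close(); // explicit close on every exit path
        }
        return conn;
    }
}
```

In current Java the same guarantee is spelled try-with-resources, but the principle is the one in this message: the application, not the collector, is responsible for releasing the OS-level resource.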
Marcus Andree S. Magalhaes wrote:
> <snip>
>> In theory the discarded connections should eventually be garbage
>> collected and closed, right? So at least the leak is bounded.
>
> I don't think so. We had a similar problem here.
>
> After many hours of debugging, we came to the fact that either
> java garbage collector does _not_ close open connections

It's not the GC that directly closes the open connections; it just causes unreachable objects to be collected. It's the responsibility of the object that holds out-of-heap resources to do whatever cleanup (OS-level close() calls etc.) is needed when it becomes unreachable, traditionally via finalize() (although you can also use ReferenceQueue). I'd expect the socket implementation to do this.

> or do it after a long time (much longer than one would expect).

Typically, finalization and Reference clearing happen only on full collections (it's not guaranteed to happen this way, though), so I'd expect the connections to be collected then. It can take some time for a full GC to happen, depending on your heap settings. We see problems in NIO buffer allocation when System.gc() is disabled (see below) on a quiescent system that can have intervals of over a day between full GCs.

If you leak sockets or some other resource rapidly enough that you run out of the resources at the OS level (or indirectly hit a resource limitation on the remote side, e.g. a server connection limit) before the owning objects are collected, you will have a problem. NIO direct buffer allocation hits exactly this issue, and ends up issuing a System.gc() when it thinks it's out of direct buffer space to try to avoid the problem. There are some bugs in Sun's implementation, though, which make it hard to use (it forces a stop-the-world GC, which is bad if you are using the CMS collector, and there is a race between the allocating thread returning from System.gc() and buffer references being cleared in the reference-handler thread).
Assuming those problems can be fixed, there's an argument for doing the same thing in the socket allocation code (and anywhere else that allocates out-of-heap resources) too. It becomes a question of how far to take it: if the server refuses a connection with "too many connections", should you GC and try again if that collected any connections to the server in question?

See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025281

(apologies for rambling :)

-O
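The allocate-GC-retry dance described above can be sketched generically. This is an assumption-laden illustration of the strategy NIO's direct buffer allocator uses, not code from any driver; Allocator and OutOfResourcesException are hypothetical names:

```java
// Sketch of "GC and retry" on resource exhaustion: when allocation
// fails, request a collection in the hope that finalizers release
// leaked handles, then retry once. All names here are illustrative.
public class GcRetry {
    public static class OutOfResourcesException extends Exception {}

    public interface Allocator<T> {
        T allocate() throws OutOfResourcesException;
    }

    public static <T> T allocateWithRetry(Allocator<T> alloc)
            throws OutOfResourcesException {
        try {
            return alloc.allocate();
        } catch (OutOfResourcesException first) {
            System.gc(); // hope unreachable owners get finalized
            // Crude pause for the reference-handler thread -- the very
            // race with System.gc() that the previous message describes.
            try { Thread.sleep(100); }
            catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
            return alloc.allocate(); // one retry; rethrows if still exhausted
        }
    }
}
```

The sketch also makes the open question concrete: the retry only helps if the GC actually collected connections to the server in question, which the caller cannot observe directly.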
On Mon, 21 Jun 2004, Laurent Sylvain wrote:

> I experienced some TCP connection leaks when using PGSQL JDBC driver 7.4
> (build 214) to connect to a 7.3.4 server.
> The symptoms are that when performing a netstat on the client machine, many
> connections were in the CLOSE_WAIT state.
>
> The problem is that the driver tries to connect using v3 protocol and when
> it sees that the server doesn't understand it, it opens a new connection
> (PGStream) to the server without closing the previous one:

I've applied the attached patch to the 7.4 and 7.5 cvs trees. It adds the pgStream.close() you suggested, as well as closing the stream for other connection failures such as authentication failures or lack of SSL support.

Kris Jurka