Thread: Bug (and fix): leaks of TCP connections when connected to a <7.4 server

Bug (and fix): leaks of TCP connections when connected to a <7.4 server

From: Laurent Sylvain

Hello,

I experienced some TCP connection leaks when using the PostgreSQL JDBC
driver 7.4 (build 214) to connect to a 7.3.4 server.
The symptom is that, when running netstat on the client machine, many
connections are left in the CLOSE_WAIT state.

The problem is that the driver first tries to connect using the v3
protocol and, when it sees that the server doesn't understand it, opens
a new connection (PGStream) to the server without closing the previous
one:

AbstractJdbc1Connection.java line 304 and on:
    if (l_elen > 30000) {
        //if the error length is > than 30000 we assume this is really a v2 protocol
        //server so try again with a v2 connection
        //need to create a new connection and try again
        try
        {
            pgStream = new PGStream(p_host, p_port);
        }
        catch (ConnectException cex)

A quick fix is to call pgStream.close(); before instantiating a new
PGStream, but I don't know the sources very well and the change might
have other side effects.
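
Applied to the snippet above, the change would be roughly this (sketch
only; the real fix may also need to catch an IOException from close()):

        try
        {
            pgStream.close();   // close the abandoned v3 connection before retrying
            pgStream = new PGStream(p_host, p_port);
        }
        catch (ConnectException cex)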

To work around the problem until a new version of the driver is
available, I added "?compatible=7.3" to the connection URL...
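
For the record, the workaround in code form (host, database and
credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class CompatWorkaround {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver");
            // "compatible=7.3" asks the driver for 7.3-compatible behaviour,
            // so the failed v3 startup attempt never happens.
            String url = "jdbc:postgresql://dbhost:5432/mydb?compatible=7.3";
            Connection conn = DriverManager.getConnection(url, "user", "password");
            try {
                // use the connection as usual
            } finally {
                conn.close();
            }
        }
    }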

I hope it'll improve the driver ;-)

Sylvain Laurent

Re: Bug (and fix): leaks of TCP connections when connected

From: Kris Jurka

On Mon, 21 Jun 2004, Laurent Sylvain wrote:

> I experienced some TCP connection leaks when using the PostgreSQL JDBC
> driver 7.4 (build 214) to connect to a 7.3.4 server.
> The symptom is that, when running netstat on the client machine, many
> connections are left in the CLOSE_WAIT state.

I'm not much of an expert on TCP, so could you give me a little more
background on this?  I've duplicated the behavior you describe and I
believe your fix is correct, but I'd like to understand this better.
First, why is this a problem?  The CLOSE_WAIT entries disappear rather
quickly.  Second, the number of CLOSE_WAIT items is nothing compared to
the number of TIME_WAIT entries; why aren't those a problem?

Kris Jurka

Re: Bug (and fix): leaks of TCP connections when connected

From: Tom Lane

Kris Jurka <books@ejurka.com> writes:
> I'm not much of an expert on TCP, so could you give me a little more
> background on this?  I've duplicated the behavior you describe and I
> believe your fix is correct, but I'd like to understand this better.
> First, why is this a problem?  The CLOSE_WAIT entries disappear rather
> quickly.  Second, the number of CLOSE_WAIT items is nothing compared to
> the number of TIME_WAIT entries; why aren't those a problem?

I think CLOSE_WAIT means that the kernel knows the far end has closed
the connection, but the local application hasn't yet closed its own side
of the socket (when we are the ones who close first, the connection goes
through the FIN_WAIT states instead).  This state can persist
indefinitely if the client is uncooperative.  TIME_WAIT is a short-lived
state; the connection will be forgotten completely after the timeout,
which I think is on the order of a minute.  (TIME_WAIT exists only so
that the kernel remembers what to do with any delayed packets that may
arrive from the far end.)

Bottom line is that CLOSE_WAIT means that a client bug is keeping the
kernel from freeing resources.  You don't want to see that.
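
If you want to see the state for yourself, here's a tiny standalone demo
(my own illustration, not driver code): let the peer close first and
never close our own socket, then watch netstat.

    import java.net.ServerSocket;
    import java.net.Socket;

    public class CloseWaitDemo {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket(0);   // stands in for the remote end
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();

            accepted.close();   // the far end closes: a FIN arrives at "client"
            // "client" is never closed, so the kernel keeps its side in
            // CLOSE_WAIT.  Run "netstat -an | grep CLOSE_WAIT" while this
            // sleeps to see the entry.
            Thread.sleep(60000);
            client.close();
        }
    }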

            regards, tom lane

Re: Bug (and fix): leaks of TCP connections when connected

From: Oliver Jowett

Laurent Sylvain wrote:
> Hello,
>
> I experienced some TCP connection leaks when using the PostgreSQL JDBC
> driver 7.4 (build 214) to connect to a 7.3.4 server.
> The symptom is that, when running netstat on the client machine, many
> connections are left in the CLOSE_WAIT state.
>
> The problem is that the driver first tries to connect using the v3
> protocol and, when it sees that the server doesn't understand it, opens
> a new connection (PGStream) to the server without closing the previous
> one:

In theory the discarded connections should eventually be garbage
collected and closed, right? So at least the leak is bounded.

(I'll check that this is fixed in my patches; I restructured that area
quite a bit)

-O

Re: Bug (and fix): leaks of TCP connections when connected

From: "Marcus Andree S. Magalhaes"

<snip>

> In theory the discarded connections should eventually be garbage
> collected and closed, right? So at least the leak is bounded.
>

I don't think so. We had a similar problem here.

After many hours of debugging, we came to the conclusion that the Java
garbage collector either does _not_ close open connections or does so
only after a long time (much longer than one would expect).
This feature/bug caused a severe connection leak in our code and
we had to issue an explicit close() on our socket connections.




Re: Bug (and fix): leaks of TCP connections when connected

From: Oliver Jowett

Marcus Andree S. Magalhaes wrote:
> <snip>
>
>>In theory the discarded connections should eventually be garbage
>>collected and closed, right? So at least the leak is bounded.
>
> I don't think so. We had a similar problem here.
>
> After many hours of debugging, we came to the conclusion that the Java
> garbage collector either does _not_ close open connections

It's not the GC that directly closes the open connections; it just
causes unreachable objects to be collected. It's the responsibility of
the object that holds out-of-heap resources to do whatever cleanup
(OS-level close() calls, etc.) is needed when it becomes unreachable,
traditionally via finalize() (although you can also use a
ReferenceQueue). I'd expect the socket implementation to do this.
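
For illustration, the pattern I mean looks roughly like this (my sketch,
not the driver's actual code):

    import java.io.IOException;
    import java.net.Socket;

    // A holder of an out-of-heap resource that cleans up when it becomes
    // unreachable, via the traditional finalize() hook.
    class SocketHolder {
        private final Socket socket;

        SocketHolder(String host, int port) throws IOException {
            socket = new Socket(host, port);
        }

        protected void finalize() throws Throwable {
            try {
                socket.close();   // give the file descriptor back to the OS
            } finally {
                super.finalize();
            }
        }
    }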

> or does so
> only after a long time (much longer than one would expect).

Typically, finalization and Reference clearing happen only on full
collections (it's not guaranteed to happen this way, though), so I'd
expect the connections to be collected then. It can take some time for a
full GC to happen, depending on your heap settings. We see problems with
NIO buffer allocation (see below) when System.gc() is disabled on a
quiescent system, where the interval between full GCs can be over a day.

If you leak sockets or some other resource rapidly enough that you run
out of the resources at the OS level (or indirectly hit a resource
limitation on the remote side e.g. server connection limit) before the
owning objects are collected, you will have a problem.

NIO direct buffer allocation hits exactly this issue, and ends up
issuing a System.gc() when it thinks it's out of direct buffer space to
try to avoid the problem. There are some bugs in Sun's implementation,
though, which make it hard to use (it forces a stop-the-world GC, which
is bad if you are using the CMS collector, and there is a race between
the allocating thread returning from System.gc() and buffer references
being cleared in the reference-handler thread). Assuming those problems
can be fixed, there's an argument for doing the same thing in the socket
allocation code (and anywhere else that allocates out-of-heap resources)
too. It becomes a question of how far to take it: if the server refuses
a connection with "too many connections", should you GC and try again if
that collected any connections to the server in question?
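
To make that concrete, a connect-time version of the trick might look
like this (purely hypothetical; the driver does not do this today):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class GcRetryConnect {
        // If the first attempt is refused, nudge the collector so abandoned
        // but unreachable connections get finalized, then retry once.
        static Connection connect(String url, String user, String password)
                throws SQLException {
            try {
                return DriverManager.getConnection(url, user, password);
            } catch (SQLException refused) {
                System.gc();
                System.runFinalization();
                return DriverManager.getConnection(url, user, password);
            }
        }
    }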

See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025281

(apologies for rambling :)

-O

Re: Bug (and fix): leaks of TCP connections when connected

From: Kris Jurka

On Mon, 21 Jun 2004, Laurent Sylvain wrote:

> I experienced some TCP connection leaks when using the PostgreSQL JDBC
> driver 7.4 (build 214) to connect to a 7.3.4 server.
> The symptom is that, when running netstat on the client machine, many
> connections are left in the CLOSE_WAIT state.
>
> The problem is that the driver first tries to connect using the v3
> protocol and, when it sees that the server doesn't understand it, opens
> a new connection (PGStream) to the server without closing the previous
> one:
>

I've applied the attached patch to the 7.4 and 7.5 CVS trees.  It adds
the pgStream.close() you suggested, as well as closing the stream on
other connection failures such as authentication failures or lack of
SSL support.
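
The general shape of that cleanup, for anyone following along (a generic
sketch with a hypothetical closeQuietly() helper, not the literal patch):

    import java.io.Closeable;
    import java.io.IOException;

    final class IOUtil {
        // Close the startup stream no matter why the connection attempt
        // failed (v2 fallback, authentication error, missing SSL support).
        static void closeQuietly(Closeable resource) {
            if (resource == null) {
                return;
            }
            try {
                resource.close();
            } catch (IOException ignored) {
                // the connection attempt is already failing; nothing useful
                // to do with this error
            }
        }
    }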

Kris Jurka

Attachment