On Mon, 2005-05-02 at 18:47 +0300, Heikki Linnakangas wrote:
> On Mon, 2 May 2005, Hannu Krosing wrote:
> > It would be nice if I could set up some timeout using keepalives (like
> > ssh's "ProtocolKeepalives") and use similar timeouts on client and server.
>
> FWIW, I've been bitten by this problem twice with other applications.
>
> 1. We had a DB2 database with clients running in other computers in the
> network. A faulty switch caused random network outages. If the connection
> timed out and the client was unable to send its request to the server,
> the client would notice that the connection was down, and open a new one.
> But the server never noticed that the connection was dead. Eventually,
> the maximum number of connections was reached, and the administrator had
> to kill all the connections manually.
>
> 2. We had a custom client-server application using TCP across a network.
> There was a stateful firewall between the server and the clients that
> dropped the connection at night when there was no activity. After a
> couple of days, the server reached the maximum number of threads on the
> platform and stopped accepting new connections.
>
> In case 1, the switch was fixed. If another switch fails, the same will
> happen again. In case 2, we added an application-level heartbeat that
> sends a dummy message from server to client every 10 minutes.
>
> TCP keep-alive with a small interval would have saved the day in both
> cases. Unfortunately the default interval must be >= 2 hours, according
> to RFC1122.
>
> On most platforms, including Windows and Linux, the TCP keep-alive
> interval can't be set on a per-connection basis. The ideal solution would
> be to modify the operating system to support it.
Yep. I think this could be done for (our instance of) Linux, but getting
it into the mainstream kernel, and then into all popular distros, is a
lot of effort.
Going the ssh way (protocol-level keepalives) might be way simpler.
> What we can do in PostgreSQL is to introduce an application-level
> heartbeat. A simple "Hello world" message sent from server to client that
> the client would ignore would do the trick.
Actually we would need a round-trip indicator (some there-and-back
message: A: do you copy 42 --> B: yes I copy 42), not just a one-way
send. The difficult part is what to do when one side happens to send the
keepalive in the middle of an actual data transfer?
Move to packet-oriented connections (UDP) and make the different packet
types independent of each other?
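
To make the round-trip idea concrete, here is a rough sketch of one way it
could look while staying on a single TCP stream. Everything in it is an
assumption for illustration, not anything from the PostgreSQL protocol: the
frame layout (1-byte type, 4-byte length, payload), the type codes D/P/O and
the Peer class are all hypothetical. The point is only that if keepalives
are written between whole frames, they never land inside a data message, and
the echoed number confirms the round trip.

```python
import struct

# Hypothetical frame layout (an assumption, not a real protocol):
# 1-byte type, 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!cI")
DATA, PING, PONG = b"D", b"P", b"O"


def encode_frame(kind, payload=b""):
    """Build one complete frame; senders only ever write whole frames."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buf):
    """Split buf into complete (kind, payload) frames plus leftover bytes."""
    frames = []
    pos = 0
    while len(buf) - pos >= HEADER.size:
        kind, length = HEADER.unpack_from(buf, pos)
        end = pos + HEADER.size + length
        if end > len(buf):
            break  # incomplete frame, wait for more bytes
        frames.append((kind, buf[pos + HEADER.size:end]))
        pos = end
    return frames, buf[pos:]


class Peer:
    """One side of the connection (hypothetical helper for this sketch)."""

    def __init__(self):
        self.pending = set()  # numbers we sent and have not seen echoed yet
        self.received = []    # application payloads, keepalives filtered out

    def ping(self, nonce):
        """'Do you copy <nonce>': remember it until the echo arrives."""
        self.pending.add(nonce)
        return encode_frame(PING, struct.pack("!I", nonce))

    def handle(self, buf):
        """Process incoming bytes; return (bytes to send back, leftover)."""
        frames, rest = decode_frames(buf)
        reply = b""
        for kind, payload in frames:
            if kind == PING:
                reply += encode_frame(PONG, payload)  # 'yes I copy <nonce>'
            elif kind == PONG:
                self.pending.discard(struct.unpack("!I", payload)[0])
            else:
                self.received.append(payload)
        return reply, rest


# A keepalive sent between two data frames does not disturb the data.
a, b = Peer(), Peer()
wire = encode_frame(DATA, b"row 1") + a.ping(42) + encode_frame(DATA, b"row 2")
reply, leftover = b.handle(wire)
a.handle(reply)
```

A side that still has an entry in pending after its chosen timeout would
treat the connection as dead. The obvious limitation is that a keepalive
queued behind one very large frame has to wait for that frame to finish, so
the timeout would need to allow for that.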
--
Hannu Krosing <hannu@skype.net>