Re: Feature freeze date for 8.1 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Feature freeze date for 8.1
Date
Msg-id Pine.OSF.4.61.0505021800510.109089@kosh.hut.fi
In response to Re: Feature freeze date for 8.1  (Hannu Krosing <hannu@skype.net>)
Responses Re: Feature freeze date for 8.1  (<adnandursun@asrinbilisim.com.tr>)
Re: Feature freeze date for 8.1  (Dawid Kuroczko <qnex42@gmail.com>)
Re: Feature freeze date for 8.1  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
On Mon, 2 May 2005, Hannu Krosing wrote:

> Well, I've had problems with clients which resolve DB timeouts by
> closing the current connection and establishing a new one.
>
> If it is actual DB timeout, then it all is ok, the server soon notices
> that the client connection is closed and kills itself.
>
> Problems happen when the timeout is caused by actual network problems -
> when I have 300 clients (server's max_connections=500) which try to
> reconnect after network outage, only 200 of them can do so as the server
> is holding on to 300 old connections.
>
> In my case this has nothing to do with locks or transactions.
>
> It would be nice if I could set up some timeout using keepalives (like
> ssh's "ProtocolKeepAlives") and use similar timeouts on client and server.

FWIW, I've been bitten by this problem twice with other applications.

1. We had a DB2 database with clients running on other computers on the
network. A faulty switch caused random network outages. If the connection
timed out and the client was unable to send its request to the server,
the client would notice that the connection was down, and open a new one. 
But the server never noticed that the connection was dead. Eventually, 
the maximum number of connections was reached, and the administrator had 
to kill all the connections manually.

2. We had a custom client-server application using TCP across a network.
There was a stateful firewall between the server and the clients that
dropped the connection at night when there was no activity. After a 
couple of days, the server reached the maximum number of threads on the 
platform and stopped accepting new connections.

In case 1, the switch was fixed, but if another switch fails, the same
thing will happen again. In case 2, we added an application-level
heartbeat that sends a dummy message from server to client every 10
minutes.

TCP keep-alive with a small interval would have saved the day in both
cases. Unfortunately, the default interval must be >= 2 hours, according
to RFC 1122.

On some platforms the TCP keep-alive interval can't be set on a
per-connection basis, and the per-socket options that do exist
(TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT on Linux, for example) are
not portable. The ideal solution would be for every operating system to
support it.
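
Where a platform does expose per-socket keep-alive timers, a short
interval can be requested for a single connection. A minimal sketch in
Python, assuming the socket module exposes TCP_KEEPIDLE, TCP_KEEPINTVL
and TCP_KEEPCNT (it does not everywhere, so the helper reports whether it
could apply them). The function name and the 60/10/5 values are
illustrative choices, not anything PostgreSQL does:

```python
import socket


def enable_short_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keep-alive for one socket and, where possible, shorten its timers.

    Returns True if the per-socket timers were applied, False if only the
    system-wide defaults are in effect.
    """
    # SO_KEEPALIVE itself is widely available; it only switches probing on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The timer options are platform-specific, so check before using them.
    names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if not all(hasattr(socket, name) for name in names):
        return False

    # Seconds of idle time before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is declared dead.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return True
```

With these values a dead peer would be noticed after roughly
idle + interval * probes seconds of silence, instead of the two-hour
default.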

What we can do in PostgreSQL is to introduce an application-level
heartbeat. A simple "Hello world" message, sent from server to client and
ignored by the client, would do the trick.
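
To make the heartbeat idea concrete, here is a rough sketch in Python. It
is not PostgreSQL code and does not use the PostgreSQL wire protocol: the
HEARTBEAT frame and the three helper names are made up for illustration.
The server periodically sends a dummy frame on every connection and
releases the ones where the send fails; the client drops the frame.

```python
import socket

# Hypothetical dummy frame; not a real PostgreSQL protocol message.
HEARTBEAT = b"\x00HB\n"


def send_heartbeat(conn):
    """Send one dummy message; return False if the peer is known to be gone."""
    try:
        conn.sendall(HEARTBEAT)
        return True
    except OSError:
        # Broken pipe, connection reset or timeout: treat the connection as dead.
        return False


def reap_dead(conns):
    """Heartbeat every connection, close the dead ones, return the live ones."""
    alive = []
    for conn in conns:
        if send_heartbeat(conn):
            alive.append(conn)
        else:
            conn.close()  # frees the slot that would otherwise stay occupied
    return alive


def strip_heartbeats(data):
    """Client side: discard heartbeat frames and keep the real payload.

    Naive on purpose: it assumes the frame never occurs inside real data.
    """
    return data.replace(HEARTBEAT, b"")
```

The server would call reap_dead() from a timer, say every few minutes.
Note that over a real TCP connection to a peer that vanished silently,
the first send is usually just buffered; the error only surfaces on a
later send, once retransmission has given up. It is the repetition that
detects the dead connection, not any single message.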

- Heikki

