On Tue, 2008-01-08 at 12:57 -0500, Scot Loach wrote:
> This may be true, but I still think PostgreSQL should be more defensive
> and actively terminate the connection when this happens (like ssh does)
I think PostgreSQL's behavior is well within reason. Let me explain:
What is happening is that FreeBSD *actually sends the data* before
returning EHOSTDOWN as an error, and it leaves the TCP connection open! At
the time I was tracking this problem down, I wrote a C program to
demonstrate that fact. This is the core reason why the failure shows up as
a protocol violation in PostgreSQL (or an SSL error) rather than as a
disconnection.
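The C program mentioned above isn't shown, so as a rough illustration only, here is a minimal user-space model of the reported behavior. `flaky_send()` is a hypothetical stand-in for send(), and the `wire` buffer is an assumption standing in for what the peer would receive; no real socket is involved. It assumes a platform whose <errno.h> defines EHOSTDOWN (FreeBSD, Linux and macOS do).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical "wire": the bytes the peer would eventually receive. */
#define WIRE_CAP 256
static char wire[WIRE_CAP];
static size_t wire_len = 0;

/*
 * Hypothetical stand-in for send() that mimics the behavior described
 * for FreeBSD 6.1: the payload really is transmitted, yet the call
 * reports failure with EHOSTDOWN and the connection stays open.
 */
static ssize_t flaky_send(const void *buf, size_t len)
{
    assert(wire_len + len <= WIRE_CAP);   /* keep the model in bounds */
    memcpy(wire + wire_len, buf, len);    /* the data actually goes out... */
    wire_len += len;
    errno = EHOSTDOWN;                    /* ...but the caller is told it failed */
    return -1;
}
```

Calling `flaky_send("Q", 1)` returns -1 with errno set to EHOSTDOWN, yet `wire_len` is 1: the caller is told nothing went out when the byte actually did.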
I think PostgreSQL is assuming here that an unrecognized error code from
send() that leaves the connection in a good state is a temporary error
that may resolve itself. Thus, PostgreSQL assumes that because of the
error no data was written, and re-sends the data, succeeding this time. I
reason that the OpenSSL library makes similar assumptions (i.e. it assumes
an error means the data wasn't sent, and resets some internal SSL protocol
state); otherwise I wouldn't get SSL errors afterward, and the problem
would instead manifest itself as a PostgreSQL protocol violation
regardless of whether you're using SSL or not.
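A minimal sketch of why that retry assumption duplicates data, under a hypothetical user-space model: `flaky_send()` transmits on every call but reports EHOSTDOWN on the first one, and `send_all_retrying()` treats any errno that `is_fatal()` doesn't recognize as transient and re-sends the same bytes. These names and the fatal/transient split are illustrative assumptions, not PostgreSQL's or OpenSSL's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical "wire": the bytes the peer would eventually receive. */
#define WIRE_CAP 256
static char wire[WIRE_CAP];
static size_t wire_len = 0;
static int send_calls = 0;

/* Hypothetical send(): transmits on every call, but reports EHOSTDOWN
 * on the first call only (modelling the FreeBSD 6.1 behavior). */
static ssize_t flaky_send(const void *buf, size_t len)
{
    assert(wire_len + len <= WIRE_CAP);
    memcpy(wire + wire_len, buf, len);    /* data always reaches the wire */
    wire_len += len;
    if (send_calls++ == 0) {
        errno = EHOSTDOWN;                /* first call lies about failing */
        return -1;
    }
    return (ssize_t)len;
}

/* Illustrative classification: only these errors end the connection;
 * anything else is treated as transient. */
static bool is_fatal(int err)
{
    return err == EPIPE || err == ECONNRESET;
}

/* Send the whole buffer, assuming a failed call wrote nothing. */
static int send_all_retrying(const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = flaky_send(buf + off, len - off);
        if (n < 0) {
            if (is_fatal(errno))
                return -1;
            continue;   /* assume nothing was sent: resend the same bytes */
        }
        off += (size_t)n;
    }
    return 0;
}
```

Sending "ABC" this way reports success but leaves "ABCABC" on the wire; a peer parsing a framed protocol (PostgreSQL's, or an SSL record layer) would likely reject the duplicated bytes as a protocol error.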
If the OS leaves a TCP connection open, I think it is perfectly
reasonable for an application to assume that the OS has sent exactly as
many bytes as it said it sent: no more, no less.
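Trusting send()'s return value exactly is what an ordinary send loop relies on. A minimal POSIX sketch (the helper name `send_all` is hypothetical): it advances by precisely the byte count the kernel reports, so partial writes are resumed rather than repeated. Retrying only on EINTR and passing every other error up to the caller is one design choice among several, not a description of PostgreSQL's code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send all of buf on fd, trusting send()'s return value exactly:
 * advance by precisely the number of bytes the kernel reports.
 * EINTR is retried; any other error is returned to the caller.
 */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was transferred */
            return -1;      /* let the caller decide whether to close */
        }
        off += (size_t)n;   /* partial write: resume after the reported bytes */
    }
    return 0;
}
```

With a `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)` pair, `send_all(sv[0], "hello", 5)` delivers exactly five bytes to `sv[1]`. A send() that fails with EINTR has transferred no data (an interrupted call that did transfer data returns the partial count instead), so retrying that case cannot duplicate bytes.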
I would lean toward the opinion that PostgreSQL works just fine now, that
TCP is explicitly designed to prevent these kinds of problems, and that we
only see this problem because the FreeBSD 6.1 TCP stack is broken.
Regards,
Jeff Davis