Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error - Mailing list pgsql-bugs

From Scot Loach
Subject Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error
Date
Msg-id F489AB573A749146B33461ECE080913A049D37B6@EXCHANGE-1.sandvine.com
In response to Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error
List pgsql-bugs
I agree this would be fine if PostgreSQL works the way you say below.

However, PostgreSQL does not look at the # of bytes written and continue
sending after that many bytes.  PostgreSQL actually simply clears its
buffer of bytes to send on this error, in this code:

pqcomm.c:1075
        /*
         * We drop the buffered data anyway so that processing can
         * continue, even though we'll probably quit soon.
         */
        PqSendPointer = 0;
        return EOF;


The result, as I saw on a system where this was occurring, was that when
PostgreSQL was sending back a large result set, a fragment of it was
simply missing.
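
To make the effect concrete, here is a minimal sketch of that behavior.
It is not the pqcomm.c code: FakeSocket, SendBuffer, fake_send,
flush_buffer and send_result_set are hypothetical names, and the sketch
assumes the failing send delivers nothing to the client.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WIRE_MAX 256

/* Hypothetical stand-in for the client side of the connection. */
typedef struct {
    char   wire[WIRE_MAX];  /* bytes the client actually received */
    size_t wire_len;
    int    fail_on_call;    /* 1-based call that reports an error; 0 = never */
    int    calls;
} FakeSocket;

/* Hypothetical stand-in for the backend's outgoing buffer. */
typedef struct {
    char   data[64];
    size_t send_pointer;    /* number of buffered bytes waiting to be sent */
} SendBuffer;

/* Simulated send(): returns bytes accepted, or -1 on the failing call.
 * Assumption: the failing call delivers no data at all. */
int fake_send(FakeSocket *s, const char *buf, size_t len)
{
    s->calls++;
    if (s->calls == s->fail_on_call)
        return -1;
    assert(s->wire_len + len <= WIRE_MAX);
    memcpy(s->wire + s->wire_len, buf, len);
    s->wire_len += len;
    return (int) len;
}

/* Flush the buffer. On error the buffered bytes are dropped, which is the
 * behavior the quoted comment describes. */
int flush_buffer(FakeSocket *s, SendBuffer *b)
{
    int rc = fake_send(s, b->data, b->send_pointer);
    b->send_pointer = 0;    /* cleared on success and on error alike */
    return rc < 0 ? -1 : 0;
}

/* Send three chunks of a "result set", carrying on after any error, and
 * copy what the client saw into out as a NUL-terminated string. */
void send_result_set(char *out, size_t out_size)
{
    FakeSocket  s = { .fail_on_call = 2 };
    SendBuffer  b = { .send_pointer = 0 };
    const char *chunks[] = { "row1;", "row2;", "row3;" };

    for (size_t i = 0; i < 3; i++) {
        size_t len = strlen(chunks[i]);
        assert(len <= sizeof b.data);
        memcpy(b.data, chunks[i], len);
        b.send_pointer = len;
        (void) flush_buffer(&s, &b);    /* return value ignored, processing continues */
    }
    assert(s.wire_len < out_size);
    memcpy(out, s.wire, s.wire_len);
    out[s.wire_len] = '\0';
}
```

With the second send failing, the client in this sketch receives the
first and third chunks back to back, with nothing on the wire to mark
the gap.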

scot.

-----Original Message-----
From: Jeff Davis [mailto:pgsql@j-davis.com]
Sent: Tuesday, January 08, 2008 2:02 PM
To: Scot Loach
Cc: pgsql-bugs@postgresql.org
Subject: RE: [BUGS] BUG #3855: backend sends corrupted data on EHOSTDOWN error

On Tue, 2008-01-08 at 12:57 -0500, Scot Loach wrote:
> This may be true, but I still think PostgreSQL should be more
> defensive and actively terminate the connection when this happens
> (like ssh does)

I think PostgreSQL's behavior is well within reason. Let me explain:

What is happening is that FreeBSD *actually sends the data* before
returning EHOSTDOWN as an error, and leaving the TCP connection open! At
the time I was tracking this problem down, I wrote a C program to
demonstrate that fact. This is the core of the reason why it's a
protocol violation in PostgreSQL (or SSL error) rather than a
disconnection.

I think PostgreSQL is making the assumption here that an unrecognized
error code from send() that leaves the connection in a good state is a
temporary error that may be resolved. Thus, PostgreSQL assumes that due
to the error, no data was written, and re-sends the data, succeeding
this time. I reason that the openssl library makes similar assumptions
(i.e. assuming an error means the data wasn't sent, and resets some
internal SSL protocol state), otherwise I wouldn't get SSL errors
afterward, but it would manifest itself as a PostgreSQL protocol
violation regardless of whether you're using SSL or not.
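
The scenario described above can be sketched roughly as follows. This
is only an illustration under the stated assumption (the data goes out,
yet the call reports an error, and the caller retries); LyingSocket,
lying_send and send_with_retry are hypothetical names, not PostgreSQL
or FreeBSD code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WIRE_MAX 256

/* Hypothetical socket that records every byte that actually left the host. */
typedef struct {
    char   wire[WIRE_MAX];
    size_t wire_len;
    int    lie_once;    /* nonzero: the next call delivers data but returns -1 */
} LyingSocket;

/* Simulated send() that always transmits, but may misreport one call as failed. */
int lying_send(LyingSocket *s, const char *buf, size_t len)
{
    assert(s->wire_len + len <= WIRE_MAX);
    memcpy(s->wire + s->wire_len, buf, len);    /* the data always goes out */
    s->wire_len += len;
    if (s->lie_once) {
        s->lie_once = 0;
        return -1;                              /* but the caller is told it failed */
    }
    return (int) len;
}

/* Retry policy that treats any error as "nothing was written". */
int send_with_retry(LyingSocket *s, const char *buf, size_t len, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (lying_send(s, buf, len) >= 0)
            return 0;
    }
    return -1;
}
```

Sending one message through send_with_retry with lie_once set puts the
message on the wire twice, which the receiving side would see as a
malformed stream whether or not SSL is layered on top.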

If the OS leaves a TCP connection open, I think it is perfectly
reasonable for an application to assume that the OS has sent exactly as
many bytes as it said it sent; no more, no less.
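
As a sketch of what relying on the reported byte count looks like, the
loop below advances by exactly the number of bytes each call says it
accepted. ShortWriteSocket, short_send and send_all are hypothetical
names for illustration; short_send stands in for a send() that reports
honestly but may accept only part of the request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WIRE_MAX 256

/* Hypothetical socket that accepts at most max_per_call bytes per call. */
typedef struct {
    char   wire[WIRE_MAX];
    size_t wire_len;
    size_t max_per_call;
} ShortWriteSocket;

/* Simulated send() that reports exactly how many bytes it took. */
long short_send(ShortWriteSocket *s, const char *buf, size_t len)
{
    size_t n = len < s->max_per_call ? len : s->max_per_call;
    assert(s->wire_len + n <= WIRE_MAX);
    memcpy(s->wire + s->wire_len, buf, n);
    s->wire_len += n;
    return (long) n;
}

/* Keep sending from where the last call stopped until everything is out.
 * Returns 0 on success, -1 if a call reports an error or makes no progress. */
int send_all(ShortWriteSocket *s, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        long n = short_send(s, buf + sent, len - sent);
        if (n <= 0)
            return -1;
        sent += (size_t) n;     /* trust the count: no more, no less */
    }
    return 0;
}
```

This only produces an intact stream if the count is accurate, which is
the assumption being defended here.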

I would lean toward the opinion that PostgreSQL works just fine now, and
that TCP is explicitly designed to prevent these kinds of problems, and
we only see this problem because FreeBSD 6.1 TCP is broken.

Regards,
    Jeff Davis
