Thread: BUG #3855: backend sends corrupted data on EHOSTDOWN error

BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
"Scot Loach"
Date:
The following bug has been logged online:

Bug reference:      3855
Logged by:          Scot Loach
Email address:      sloach@sandvine.com
PostgreSQL version: 8.2.4
Operating system:   freebsd 6.1
Description:        backend sends corrupted data on EHOSTDOWN error
Details:

On FreeBSD, it is possible for a send() call on the backend socket to return
an error code of EHOSTDOWN.  This error can happen, for example, if a host
on the local LAN is temporarily unreachable.  In this case, the socket is
not closed, and it may recover from this state.  If it recovers, it is
possible that the backend will continue sending results from a query, but it
will have dropped some data from the reply.  This causes the client to be
out of sync with the server, which usually causes it to read an invalid
length byte.  This can cause various issues, such as clients crashing or,
more commonly, blocking forever while trying to read a large response the
server will never send.
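
To make the failure mode concrete, here is a minimal sketch in C of a
reader losing framing after a fragment goes missing.  It assumes a
simplified framing modeled on the PostgreSQL wire protocol (one type byte
followed by a 4-byte big-endian length that counts itself).  The payloads
and the helper names put_msg, drop_range and count_msgs are made up for
illustration and are not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one message to out: type byte, 4-byte big-endian length
 * (counting itself), then the payload.  Returns the bytes written. */
static size_t put_msg(unsigned char *out, char type, const char *payload)
{
    assert(out != NULL && payload != NULL);
    size_t   n = strlen(payload);
    uint32_t len = (uint32_t) n + 4;

    out[0] = (unsigned char) type;
    out[1] = (unsigned char) (len >> 24);
    out[2] = (unsigned char) (len >> 16);
    out[3] = (unsigned char) (len >> 8);
    out[4] = (unsigned char) len;
    memcpy(out + 5, payload, n);
    return 5 + n;
}

/* Simulate the dropped send buffer: remove count bytes starting at from. */
static size_t drop_range(unsigned char *buf, size_t n, size_t from, size_t count)
{
    assert(from <= n && count <= n - from);
    memmove(buf + from, buf + from + count, n - from - count);
    return n - count;
}

/* Count the well-formed messages at the start of the stream.  If a header
 * announces more bytes than are available, store that length in *bad_len:
 * a real client would now block waiting for them. */
static int count_msgs(const unsigned char *buf, size_t n, uint32_t *bad_len)
{
    size_t pos = 0;
    int    count = 0;

    *bad_len = 0;
    while (n - pos >= 5) {
        uint32_t len = ((uint32_t) buf[pos + 1] << 24) |
                       ((uint32_t) buf[pos + 2] << 16) |
                       ((uint32_t) buf[pos + 3] << 8) |
                        (uint32_t) buf[pos + 4];

        if (len < 4 || len - 4 > n - pos - 5) {
            *bad_len = len;
            break;
        }
        pos += 5 + (size_t) (len - 4);
        count++;
    }
    return count;
}
```

With three messages carrying "row-one", "row-two" and "row-three", dropping
8 bytes at offset 12 makes the reader take the bytes "twoD" as a length of
0x74776F44, close to 2 GB, after the first message.  That matches the
"blocking forever" symptom described above.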

This is due to the way the backend handles errors.  The following code
(pqcomm.c:1075) is what happens when an error occurs on the write:

        /*
         * We drop the buffered data anyway so that processing can
         * continue, even though we'll probably quit soon.
         */
        PqSendPointer = 0;
        return EOF;

This sets PqSendPointer to 0, which effectively clears any data that was
waiting to be sent.  This EOF error propagates up the stack to pqformat.c:

void
pq_endmessage(StringInfo buf)
{
        /* msgtype was saved in cursor field */
        (void) pq_putmessage(buf->cursor, buf->data, buf->len);
        /* no need to complain about any failure, since pqcomm.c already did */
        pfree(buf->data);
        buf->data = NULL;
}

In other words, postgres seems to expect that the connection will somehow
be closed.  For most errors that does happen: the stack closes the TCP
connection and no harm is done.  But with this particular error, the
connection stays open, the client waits forever for bytes the server will
never send, and the server sits idle in its transaction, holding locks and
waiting for a command from the client that will never come.

The backend should either close the connection itself in this case, or
handle the error better by not clearing the send buffer.
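
As an illustration of the second option, here is a minimal sketch of a send
buffer that keeps its unsent bytes when a write fails, so a later flush can
pick up where it left off.  Everything in it (SendBuf, sendbuf_put,
sendbuf_flush, and the FlakyPipe test double) is hypothetical and is not
the pqcomm.c code.  It also assumes that a failed write means no bytes from
that call reached the peer; if that assumption does not hold, retrying
would duplicate data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical transport callback: bytes written, or -1 on error. */
typedef ssize_t (*write_fn)(void *ctx, const char *buf, size_t len);

typedef struct {
    char   data[8192];
    size_t start;   /* first byte not yet accepted by the transport */
    size_t end;     /* one past the last buffered byte */
} SendBuf;

/* Queue n bytes; returns -1 if they do not fit. */
static int sendbuf_put(SendBuf *sb, const char *p, size_t n)
{
    assert(sb != NULL && p != NULL);
    if (n > sizeof sb->data - sb->end)
        return -1;
    memcpy(sb->data + sb->end, p, n);
    sb->end += n;
    return 0;
}

/* Flush the buffer.  On failure the unsent bytes stay queued instead of
 * being discarded, so the caller can retry or close the connection. */
static int sendbuf_flush(SendBuf *sb, write_fn w, void *ctx)
{
    assert(sb != NULL && w != NULL);
    while (sb->start < sb->end) {
        ssize_t n = w(ctx, sb->data + sb->start, sb->end - sb->start);

        if (n <= 0)
            return -1;
        sb->start += (size_t) n;
    }
    sb->start = sb->end = 0;    /* reset only once everything was accepted */
    return 0;
}

/* Test double: fails the first `failures` calls, then records the bytes. */
typedef struct {
    int    failures;
    char   out[8192];
    size_t len;
} FlakyPipe;

static ssize_t flaky_write(void *ctx, const char *buf, size_t len)
{
    FlakyPipe *p = ctx;

    if (p->failures > 0) {
        p->failures--;
        return -1;
    }
    if (len > sizeof p->out - p->len)
        len = sizeof p->out - p->len;
    memcpy(p->out + p->len, buf, len);
    p->len += len;
    return (ssize_t) len;
}
```

With a FlakyPipe set to fail once, the first sendbuf_flush call returns -1
and leaves the data queued; the second call delivers it in full.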

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
Tom Lane
Date:
"Scot Loach" <sloach@sandvine.com> writes:
> On FreeBSD, it is possible for a send() call on the backend socket to return
> an error code of EHOSTDOWN.

That's fine as long as the error condition is reasonably persistent.
I think what you are describing is a bug in FreeBSD's TCP stack: it
obviously isn't making adequately good-faith efforts to deliver the data
it's been handed.

            regards, tom lane

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
Jeff Davis
Date:
On Tue, 2008-01-08 at 01:50 +0000, Scot Loach wrote:
> The following bug has been logged online:
>
> Bug reference:      3855
> Logged by:          Scot Loach
> Email address:      sloach@sandvine.com
> PostgreSQL version: 8.2.4
> Operating system:   freebsd 6.1
> Description:        backend sends corrupted data on EHOSTDOWN error
> Details:
>

This is a FreeBSD bug.

http://www.freebsd.org/cgi/query-pr.cgi?pr=100172

It has been fixed here:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c

in revision 1.112.2.1.

I ran into this bug too, and it was very frustrating! For me, it
manifested itself as SSL errors.

You can demonstrate the problem with SSH as well (inducing an ARP
failure will terminate the SSH session, when TCP should protect you
against that), so it is clearly not a PostgreSQL bug.

Thanks to "Andrew - Supernews" (a PostgreSQL user) for tracking this bug
down.

Regards,
    Jeff Davis

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
"Scot Loach"
Date:
This may be true, but I still think PostgreSQL should be more defensive
and actively terminate the connection when this happens (like ssh does)

scot.

-----Original Message-----
From: Jeff Davis [mailto:pgsql@j-davis.com]
Sent: Tuesday, January 08, 2008 12:52 PM
To: Scot Loach
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3855: backend sends corrupted data on EHOSTDOWN error

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
Jeff Davis
Date:
On Tue, 2008-01-08 at 12:57 -0500, Scot Loach wrote:
> This may be true, but I still think PostgreSQL should be more defensive
> and actively terminate the connection when this happens (like ssh does)

I think postgresql's behavior is well within reason. Let me explain:

What is happening is that FreeBSD *actually sends the data* before
returning EHOSTDOWN as an error, and leaving the TCP connection open! At
the time I was tracking this problem down, I wrote a C program to
demonstrate that fact. This is the core of the reason why it's a
protocol violation in PostgreSQL (or SSL error) rather than a
disconnection.

I think PostgreSQL is making the assumption here that an unrecognized
error code from send() that leaves the connection in a good state, is a
temporary error that may be resolved. Thus, PostgreSQL assumes that due
to the error, no data was written, and re-sends the data, succeeding
this time. I reason that the openssl library makes similar assumptions
(i.e. assuming an error means the data wasn't sent, and resets some
internal SSL protocol state), otherwise I wouldn't get SSL errors
afterward, but it would manifest itself as a PostgreSQL protocol
violation regardless of whether you're using SSL or not.

If the OS leaves a TCP connection open, I think it is perfectly
reasonable for an application to assume that the OS has sent exactly as
many bytes as it said it sent; no more, no less.
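
The contract described here can be sketched as a send loop that trusts the
byte count returned by send() and nothing else.  This is a minimal
illustration, not PostgreSQL or OpenSSL code, and the name send_all is made
up.  Note that it cannot work around the behavior described above: if the
kernel transmits data and then reports an error for the same call, the
count seen by the application is already wrong.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Send exactly len bytes on fd.  Returns 0 on success, or -1 on a hard
 * error with errno set by send().  In both cases *sent holds the number
 * of bytes the OS reported as accepted.
 */
static int send_all(int fd, const char *buf, size_t len, size_t *sent)
{
    assert(buf != NULL && sent != NULL);
    *sent = 0;
    while (*sent < len) {
        ssize_t n = send(fd, buf + *sent, len - *sent, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* caller decides: retry later or close */
        }
        *sent += (size_t) n;    /* advance by what the OS says it took */
    }
    return 0;
}
```

A caller that gets -1 knows from *sent how much of the buffer is still
outstanding, so it can resume from that offset or give up on the
connection.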

I would lean toward the opinion that postgresql works just fine now, and
that TCP is explicitly designed to prevent these kinds of problems, and
we only see this problem because FreeBSD 6.1 TCP is broken.

Regards,
    Jeff Davis

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
"Scot Loach"
Date:
I agree this would be fine if PostgreSQL works the way you say below.

However, PostgreSQL does not look at the # of bytes written and continue
sending after that many bytes.  PostgreSQL actually simply clears its
buffer of bytes to send on this error, in this code:

pqcomm.c:1075
        /*
         * We drop the buffered data anyway so that processing can
         * continue, even though we'll probably quit soon.
         */
        PqSendPointer = 0;
        return EOF;


The result as I saw on a system where this was occurring, was that when
PostgreSQL was sending back a large result set, there was simply a
fragment of it missing.

scot.

-----Original Message-----
From: Jeff Davis [mailto:pgsql@j-davis.com]
Sent: Tuesday, January 08, 2008 2:02 PM
To: Scot Loach
Cc: pgsql-bugs@postgresql.org
Subject: RE: [BUGS] BUG #3855: backend sends corrupted data on EHOSTDOWN error

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
Jeff Davis
Date:
On Tue, 2008-01-08 at 14:06 -0500, Scot Loach wrote:
> I agree this would be fine if PostgreSQL works the way you say below.
>
> However, PostgreSQL does not look at the # of bytes written and continue
> sending after that many bytes.  PostgreSQL actually simply clears its
> buffer of bytes to send on this error, in this code:
>
> pqcomm.c:1075
>         /*
>          * We drop the buffered data anyway so that processing can
>          * continue, even though we'll probably quit soon.
>          */
>         PqSendPointer = 0;
>         return EOF;
>
>
> The result as I saw on a system where this was occurring, was that when
> PostgreSQL was sending back a large result set, there was simply a
> fragment of it missing.

I think I see what you are saying. I was thinking about fe-misc.c, where
it explicitly says (in the default case of a switch statement of the
return value):

/* We don't assume it's a fatal error... */
conn->outCount = 0;
return -1;

(but that's on the frontend, obviously)

I think the problem you're talking about comes from the callers of
pq_putmessage, which simply ignore any return value at all (and thus do
not retransmit the message). I agree that is a problem (assuming I
understand what's going on).
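
The difference between ignoring and checking the return value can be shown
with a small simulation.  The Conn type, put_message, and the two send_rows
functions below are hypothetical stand-ins, not the backend's code; the
simulated failure drops one message and then lets later calls succeed, as
in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection with a simulated one-off send failure. */
typedef struct {
    int    fail_on_call;    /* 1-based call number that fails; 0 = never */
    int    calls;
    int    lost;            /* set once a failure has been noticed */
    char   wire[256];       /* bytes the client would actually receive */
    size_t wire_len;
} Conn;

/* Stand-in for a message-sending routine: 0 on success, -1 on failure.
 * A failed call delivers nothing, like the dropped send buffer. */
static int put_message(Conn *c, const char *msg)
{
    assert(c != NULL && msg != NULL);
    size_t n = strlen(msg);

    c->calls++;
    if (c->calls == c->fail_on_call)
        return -1;
    if (n > sizeof c->wire - c->wire_len)
        return -1;
    memcpy(c->wire + c->wire_len, msg, n);
    c->wire_len += n;
    return 0;
}

/* Caller that discards the result: later rows still go out, leaving a
 * hole in the stream. */
static void send_rows_unchecked(Conn *c, const char *const *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void) put_message(c, rows[i]);
}

/* Caller that stops at the first failure and marks the connection lost,
 * so nothing is sent after the hole and the connection can be closed. */
static int send_rows_checked(Conn *c, const char *const *rows, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (c->lost || put_message(c, rows[i]) != 0) {
            c->lost = 1;
            return -1;
        }
    }
    return 0;
}
```

Sending rows "A", "B", "C" with the second call failing puts "AC" on the
wire in the unchecked version, while the checked version sends only "A" and
reports the failure to its caller.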

Regards,
    Jeff Davis

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
"Scot Loach"
Date:
Yes that is what I am trying to explain.
So I think this is still a bug that should be fixed in the backend code.

scot.

-----Original Message-----
From: Jeff Davis [mailto:pgsql@j-davis.com]
Sent: Tuesday, January 08, 2008 2:40 PM
To: Scot Loach
Cc: pgsql-bugs@postgresql.org
Subject: RE: [BUGS] BUG #3855: backend sends corrupted data on EHOSTDOWN error

Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error

From
Bruce Momjian
Date:
Email removed from patch queue --- Tom indicates this is an operating
system bug.  Perhaps if we get more bug reports we will have to address
it.

---------------------------------------------------------------------------

Scot Loach wrote:
> Yes that is what I am trying to explain.
> So I think this is still a bug that should be fixed in the backend code.
>
> scot.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +