BUG #3855: backend sends corrupted data on EHOSTDOWN error - Mailing list pgsql-bugs

From Scot Loach
Subject BUG #3855: backend sends corrupted data on EHOSTDOWN error
Date
Msg-id 200801080150.m081opPX034245@wwwmaster.postgresql.org
Responses Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error
Re: BUG #3855: backend sends corrupted data on EHOSTDOWN error
List pgsql-bugs

The following bug has been logged online:

Bug reference:      3855
Logged by:          Scot Loach
Email address:      sloach@sandvine.com
PostgreSQL version: 8.2.4
Operating system:   FreeBSD 6.1
Description:        backend sends corrupted data on EHOSTDOWN error
Details:

On FreeBSD, it is possible for a send() call on the backend socket to return
an error code of EHOSTDOWN.  This error can happen, for example, if a host
on the local LAN is temporarily unreachable.  In this case, the socket is
not closed, and it may recover from this state.  If it recovers, it is
possible that the backend will continue sending results from a query, but it
will have dropped some data from the reply.  The client then falls out of
sync with the server and usually reads garbage as a message length.  The
result can be a client crash or, more commonly, a client that blocks
forever while trying to read a large response the server will never send.
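
To illustrate the loss of sync, here is a minimal stand-alone sketch.  It is
not PostgreSQL code: it models the reply stream as messages made of one type
byte, a 4-byte big-endian length that counts itself, and a payload, and then
cuts a run of bytes out of the middle as a dropped send buffer would.  All
names (put_message, parse_stream, simulate_dropped_flush) and the message
sizes are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum parse_result { PARSE_OK, PARSE_NEED_MORE, PARSE_BAD_LENGTH };

/* Append one framed message to out; returns the number of bytes written. */
size_t put_message(unsigned char *out, char type, const char *payload, size_t len)
{
    uint32_t n = (uint32_t) (len + 4);      /* length field counts itself */

    out[0] = (unsigned char) type;
    out[1] = (unsigned char) (n >> 24);
    out[2] = (unsigned char) (n >> 16);
    out[3] = (unsigned char) (n >> 8);
    out[4] = (unsigned char) n;
    memcpy(out + 5, payload, len);
    return len + 5;
}

/* Client side: walk the stream and count the complete messages seen. */
enum parse_result parse_stream(const unsigned char *buf, size_t n, size_t *messages)
{
    size_t pos = 0;

    *messages = 0;
    while (pos < n)
    {
        uint32_t len;

        if (n - pos < 5)
            return PARSE_NEED_MORE;         /* header incomplete */
        len = ((uint32_t) buf[pos + 1] << 24) | ((uint32_t) buf[pos + 2] << 16) |
              ((uint32_t) buf[pos + 3] << 8) | (uint32_t) buf[pos + 4];
        if (len < 4)
            return PARSE_BAD_LENGTH;
        if ((size_t) len - 4 > n - pos - 5)
            return PARSE_NEED_MORE;         /* client would block here */
        pos += 1 + (size_t) len;
        (*messages)++;
    }
    return PARSE_OK;
}

/*
 * Build three 15-byte messages, remove drop_len bytes starting at drop_from
 * (standing in for a discarded send buffer), and parse what is left.
 */
enum parse_result simulate_dropped_flush(size_t drop_from, size_t drop_len, size_t *messages)
{
    unsigned char stream[64];
    size_t total = 0;
    int i;

    for (i = 0; i < 3; i++)
        total += put_message(stream + total, 'D', "xxxxxxxxxx", 10);

    assert(drop_from <= total && drop_len <= total - drop_from);
    memmove(stream + drop_from, stream + drop_from + drop_len,
            total - drop_from - drop_len);
    total -= drop_len;

    return parse_stream(stream, total, messages);
}
```

With nothing dropped, all three messages parse.  With 8 bytes cut out of the
second message's payload, the parser still accepts two messages, then takes
payload bytes for a header and computes a length of 0x78787878 (four 'x'
characters), so it reports PARSE_NEED_MORE: a real client would sit waiting
for roughly 2 GB that never arrives.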

This is due to the way the backend handles errors.  The following code
(pqcomm.c:1075) is what happens when an error occurs on the write:

        /*
         * We drop the buffered data anyway so that processing can
         * continue, even though we'll probably quit soon.
         */
        PqSendPointer = 0;
        return EOF;

This sets PqSendPointer to 0, which effectively clears any data that was
waiting to be sent.  This EOF error propagates up the stack to pqformat.c:

void
pq_endmessage(StringInfo buf)
{
        /* msgtype was saved in cursor field */
        (void) pq_putmessage(buf->cursor, buf->data, buf->len);
        /* no need to complain about any failure, since pqcomm.c already did */
        pfree(buf->data);
        buf->data = NULL;
}

In other words, postgres seems to expect that the connection will be closed
after a failed write.  For most errors that does happen: the stack closes
the TCP connection and no harm is done.  But with this particular error the
connection stays open, the client waits forever for bytes the server will
never send, and the server sits idle in its transaction, holding locks and
waiting for a command from the client that will never come.

The backend should either close the connection itself in this case, or
handle the error better by not clearing the send buffer.
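
As a rough sketch of the first option, the output layer could remember that
a hard send error occurred and refuse all further output on that connection,
so that nothing can follow the dropped bytes.  The code below is a
stand-alone model, not a patch against pqcomm.c: struct out_conn, out_init,
out_put, out_flush, the send callback type, and fake_send are all
hypothetical names, and the fake sender exists only so the behavior can be
exercised without a real socket.  A real fix would presumably also close the
socket or terminate the backend once the flag is set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>              /* for EOF */
#include <string.h>

#define SEND_BUF_SIZE 8192

/* Stand-in for send(): returns bytes written, or -1 with errno set. */
typedef long (*send_fn) (const void *data, size_t len, void *arg);

struct out_conn
{
    char    buf[SEND_BUF_SIZE];
    size_t  used;               /* bytes waiting in buf */
    int     broken;             /* set once a hard send error is seen */
    send_fn send;
    void   *send_arg;
};

void out_init(struct out_conn *c, send_fn send, void *send_arg)
{
    assert(send != NULL);
    c->used = 0;
    c->broken = 0;
    c->send = send;
    c->send_arg = send_arg;
}

/* Flush the buffer; after a hard error the connection stays broken. */
int out_flush(struct out_conn *c)
{
    size_t sent = 0;

    if (c->broken)
    {
        c->used = 0;
        return EOF;
    }
    while (sent < c->used)
    {
        long r = c->send(c->buf + sent, c->used - sent, c->send_arg);

        if (r > 0)
        {
            sent += (size_t) r;
            continue;
        }
        if (r < 0 && errno == EINTR)
            continue;           /* interrupted, retry */

        /* Hard error: data is lost, so never send anything after it. */
        c->broken = 1;
        c->used = 0;
        return EOF;
    }
    c->used = 0;
    return 0;
}

/* Queue bytes for sending; fails immediately on a broken connection. */
int out_put(struct out_conn *c, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0)
    {
        size_t room, n;

        if (c->broken)
            return EOF;
        room = SEND_BUF_SIZE - c->used;
        if (room == 0)
        {
            if (out_flush(c) != 0)
                return EOF;
            continue;
        }
        n = len < room ? len : room;
        memcpy(c->buf + c->used, p, n);
        c->used += n;
        p += n;
        len -= n;
    }
    return 0;
}

/* Test double: fails fail_next times, then accepts everything. */
struct fake_peer
{
    int     fail_next;
    size_t  bytes_received;
};

long fake_send(const void *data, size_t len, void *arg)
{
    struct fake_peer *peer = arg;

    (void) data;
    if (peer->fail_next > 0)
    {
        peer->fail_next--;
#ifdef EHOSTDOWN
        errno = EHOSTDOWN;
#else
        errno = EIO;            /* fallback where EHOSTDOWN is not defined */
#endif
        return -1;
    }
    peer->bytes_received += len;
    return (long) len;
}
```

With a peer set to fail once, the first out_flush returns EOF and sets
broken; every later out_put and out_flush also returns EOF, so the peer
never receives a partial reply even after the network path recovers.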
