Re: Unhappy with error handling in psql's handleCopyOut() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Unhappy with error handling in psql's handleCopyOut()
Date
Msg-id 524.1392159921@sss.pgh.pa.us
Whole thread Raw
In response to Re: Unhappy with error handling in psql's handleCopyOut()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Unhappy with error handling in psql's handleCopyOut()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> I've not gotten back to it yet, but I ran into a related-seeming issue
>> where psql would happily chew up 2G of memory trying to send "COPY
>> failed" notices when it gets disconnected from a server that it's trying
>> to send data to mid-COPY.  conn->sock was -1, connection was
>> 'CONNECTION_BAD', but the loop towards the end of handleCopyIn doesn't
>> care and nothing in libpq is changing PQresultStatus():

> [ scratches head... ]  Offhand I'd have expected PQgetResult to start
> returning error indications.

After some study of the code I have a theory about this.  Note that
we don't close the socket on send failures anymore; if the status is
CONNECTION_BAD, that pretty much has to have been done by pqReadData.
Which wouldn't be called in a PQputCopyData loop, as long as everything
was going normally.

Ordinarily, while we're pumping data to the backend, we'll flush libpq's
output buffer every 8K (see pqPutMsgEnd).  Suppose that, in one of those
calls to pqSendSome, we fail to send everything in the buffer but
send() doesn't report a hard error --- maybe it says EINTR, or maybe
it just returns zero.  Since there's still data to send, now pqSendSome
will call pqReadData.  Assume that the read attempt *does* give a hard
error; whereupon pqReadData closes the socket, sets CONNECTION_BAD,
and returns -1.  pqSendSome falls out, leaving data still in the
output buffer, and then we'll return -1 to handleCopyIn, which abandons
its data sending loop and calls PQputCopyEnd.  But there's still >= 8K
in the buffer, so when PQputCopyEnd calls pqPutMsgEnd, the latter
tries to send it.  And fails.  (What's more, pqSendSome doesn't clear
the output buffer when it fails at the top because the socket's gone.)
This results in PQputCopyEnd exiting without having changed the
asyncStatus back to ASYNC_BUSY from ASYNC_COPY_IN.  Now PQgetResult
will happily return a COPY_IN result, so we loop, and the whole thing
repeats, with no state change except we keep adding data to the buffer.

I think we need to change pqSendSome to reset outCount to zero if it exits
at the top due to no socket.  It would do that if it got a hard error from
pqsecure_write, so why's it not doing so if there's not even a socket?
This would avoid the problem of bloating the output buffer while we
continue to queue data that can never be sent.  (It might be that some of
the other error situations in pqSendSome should also abandon unwritten
data, but that's less clear.)

Another thought is that maybe PQputCopyEnd should change asyncStatus even
if it fails to queue the copy-end message.  I think the reason why it's
coded like it is is the idea that if we've not queued the message, then
the backend still thinks the copy is active, so we should not change
state.  However, if we've already closed the socket then this is pretty
much moot, so perhaps there needs to be an exception allowing the state
change in that case.  Also, if the output buffer is already over 8K, we're
failing to distinguish "couldn't queue the message" from "couldn't send
the message"; but I'm not sure it's worth refactoring to detect that case.

Alternatively, we could do what I suggested earlier and change PQgetResult
to not return an ordinary COPY_IN or COPY_OUT status if the connection is
known dead.  This seems probably more robust than tinkering at the margins
in PQputCopyEnd.
        regards, tom lane



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: GSOC13 proposal - extend RETURNING syntax
Next
From: Tom Lane
Date:
Subject: Re: narwhal and PGDLLIMPORT