Re: Unhappy with error handling in psql's handleCopyOut() - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Unhappy with error handling in psql's handleCopyOut()
Date:
Msg-id: 524.1392159921@sss.pgh.pa.us
In response to: Re: Unhappy with error handling in psql's handleCopyOut() (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Unhappy with error handling in psql's handleCopyOut()
List: pgsql-hackers
I wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> I've not gotten back to it yet, but I ran into a related-seeming issue
>> where psql would happily chew up 2G of memory trying to send "COPY
>> failed" notices when it gets disconnected from a server that it's trying
>> to send data to mid-COPY.  conn->sock was -1, connection was
>> 'CONNECTION_BAD', but the loop towards the end of handleCopyIn doesn't
>> care and nothing in libpq is changing PQresultStatus():

> [ scratches head... ]  Offhand I'd have expected PQgetResult to start
> returning error indications.

After some study of the code, I have a theory about this. Note that we don't close the socket on send failures anymore; if the status is CONNECTION_BAD, that pretty much has to have been done by pqReadData. Which wouldn't be called in a PQputCopyData loop, as long as everything was going normally.

Ordinarily, while we're pumping data to the backend, we'll flush libpq's output buffer every 8K (see pqPutMsgEnd). Suppose that, in one of those calls to pqSendSome, we fail to send everything in the buffer but send() doesn't report a hard error --- maybe it says EINTR, or maybe it just returns zero. Since there's still data to send, now pqSendSome will call pqReadData. Assume that the read attempt *does* give a hard error; whereupon pqReadData closes the socket, sets CONNECTION_BAD, and returns -1. pqSendSome falls out, leaving data still in the output buffer, and then we'll return -1 to handleCopyIn, which abandons its data-sending loop and calls PQputCopyEnd. But there's still >= 8K in the buffer, so when PQputCopyEnd calls pqPutMsgEnd, the latter tries to send it. And fails. (What's more, pqSendSome doesn't clear the output buffer when it fails at the top because the socket's gone.) This results in PQputCopyEnd exiting without having changed the asyncStatus back to ASYNC_BUSY from ASYNC_COPY_IN.
Now PQgetResult will happily return a COPY_IN result, so we loop, and the whole thing repeats, with no state change except that we keep adding data to the buffer.

I think we need to change pqSendSome to reset outCount to zero if it exits at the top due to there being no socket. It would do that if it got a hard error from pqsecure_write, so why is it not doing so when there's not even a socket? This would avoid the problem of bloating the output buffer while we continue to queue data that can never be sent. (It might be that some of the other error situations in pqSendSome should also abandon unwritten data, but that's less clear.)

Another thought is that maybe PQputCopyEnd should change asyncStatus even if it fails to queue the copy-end message. I think the reason it's coded the way it is is the idea that if we've not queued the message, then the backend still thinks the copy is active, so we should not change state. However, if we've already closed the socket then this is pretty much moot, so perhaps there needs to be an exception allowing the state change in that case. Also, if the output buffer is already over 8K, we're failing to distinguish "couldn't queue the message" from "couldn't send the message"; but I'm not sure it's worth refactoring to detect that case.

Alternatively, we could do what I suggested earlier and change PQgetResult to not return an ordinary COPY_IN or COPY_OUT status if the connection is known dead. This seems probably more robust than tinkering at the margins in PQputCopyEnd.

			regards, tom lane