Thread: [bug fix] psql \copy doesn't end if backend is killed
Hello,

I've encountered a bug on PG 9.2 and fixed it for 9.4. Please find the patch attached. I'd like it to be backported to at least 9.2.

[Problem]
If the backend is terminated with SIGKILL while psql is running "\copy table_name from file_name", the \copy never ends. I expected \copy to be cancelled because the corresponding server process had vanished.

[Cause]
psql cannot get out of the loop below in handleCopyIn():

    while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
    {
        OK = false;
        PQclear(res);

        PQputCopyEnd(pset.db, _("trying to exit copy mode"));
    }

This situation is reached as follows:

1. handleCopyIn() calls PQputCopyData().
2. PQputCopyData() calls pqPutMsgEnd().
3. pqPutMsgEnd() calls pqFlush(), which calls pqSendSome().
4. pqSendSome() calls pqReadData().
5. At this moment, the backend is killed with SIGKILL.
6. pqReadData() fails to read the socket, receiving ECONNRESET. It closes the socket.
7. As a result, PQputCopyData() fails in step 2.
8. handleCopyIn() then calls PQputCopyEnd().
9. PQputCopyEnd() calls pqPutMsgEnd(), which calls pqFlush(), which in turn calls pqSendSome().
10. pqSendSome() fails because the socket is not open.
11. As a result, PQputCopyEnd() returns an error, leaving conn->asyncStatus as PGASYNC_COPY_IN.
12. Because conn->asyncStatus remains PGASYNC_COPY_IN, PQgetResult() continues to return a PGresult whose status is PGRES_COPY_IN.

[Fix]
If the message transmission fails in PQputCopyEnd(), switch conn->asyncStatus back to PGASYNC_BUSY. That causes PQgetResult() to try to read data with pqReadData(). pqReadData() fails and PQgetResult() returns NULL. As a consequence, the loop in question terminates.

Regards
MauMau
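The loop described under [Cause] and the effect of the proposed [Fix] can be sketched with a toy model. This is not libpq code: MockConn, mock_get_result(), mock_put_copy_end() and run_exit_loop() are hypothetical stand-ins invented for illustration, and the iteration cap exists only so that the non-terminating case can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for libpq's internal state. */
typedef enum { ASYNC_BUSY, ASYNC_COPY_IN } AsyncStatus;
typedef enum { RES_NONE, RES_COPY_IN } ResultStatus;

typedef struct
{
    AsyncStatus async_status;
    bool        socket_open;    /* false once the read saw ECONNRESET */
} MockConn;

/* Models PQgetResult(): keeps reporting COPY_IN while the state says so. */
static ResultStatus
mock_get_result(MockConn *conn)
{
    if (conn->async_status == ASYNC_COPY_IN)
        return RES_COPY_IN;
    /* BUSY: the read on the closed socket fails, so no result is returned. */
    return RES_NONE;
}

/* Models PQputCopyEnd(): returns 1 on success, -1 if the send fails. */
static int
mock_put_copy_end(MockConn *conn, bool apply_fix)
{
    if (!conn->socket_open)
    {
        if (apply_fix)
            conn->async_status = ASYNC_BUSY;    /* the proposed change */
        return -1;
    }
    conn->async_status = ASYNC_BUSY;
    return 1;
}

/*
 * Runs the exit loop against a connection whose socket is already closed.
 * Returns the number of iterations used, or -1 if max_iters was reached
 * without leaving the loop.
 */
static int
run_exit_loop(bool apply_fix, int max_iters)
{
    MockConn    conn = {ASYNC_COPY_IN, false};
    int         iters = 0;

    assert(max_iters > 0);
    while (mock_get_result(&conn) == RES_COPY_IN)
    {
        if (iters >= max_iters)
            return -1;
        mock_put_copy_end(&conn, apply_fix);
        iters++;
    }
    return iters;
}
```

In this model, run_exit_loop(false, 1000) hits the cap because nothing ever changes the state, while run_exit_loop(true, 1000) leaves the loop after a single iteration.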
On 20 December 2013 19:43, MauMau wrote:

> [Problem]
> If the backend is terminated with SIGKILL while psql is running "\copy
> table_name from file_name", the \copy never ends. I expected \copy
> to be cancelled because the corresponding server process vanished.
>
> [Cause]
> psql could not get out of the loop below in handleCopyIn():
>
> while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
> {
>     OK = false;
>     PQclear(res);
>
>     PQputCopyEnd(pset.db, _("trying to exit copy mode"));
> }

1. The patch applied to git head successfully.
2. The problem can occur in some scenarios, and the fix looks fine to me.
3. For testing, I don't think it is possible to generate such a scenario directly, so I verified it by debugging through the steps explained:
   1. Make pqsecure_write return fewer bytes (by updating the result in gdb while in pqSendSome). Also make sure that the remaining bytes are >= 8192, i.e. conn->outCount-sent > 8192, so that in the next step pqPutMsgEnd, called from PQputCopyEnd, goes on to flush the data.
   2. Then kill the backend process before it calls pqReadData.

The scenario reproduces without the patch, and after applying the patch the issue is resolved.

Is there any direct scenario by which it can be reproduced?

Regards,
Dilip
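The buffer condition in the reproduction steps can be illustrated with a small helper. This is a sketch only: it assumes, as the review implies, that the message-end routine flushes only once at least 8192 bytes remain buffered, and it further assumes that whole 8192-byte multiples are sent. FLUSH_THRESHOLD and bytes_to_flush() are hypothetical names, not libpq symbols.

```c
#include <assert.h>

#define FLUSH_THRESHOLD 8192    /* figure quoted in the review above */

/*
 * Given the number of bytes still sitting in the output buffer, return how
 * many bytes a message-end call would try to send under the assumptions
 * stated above, or 0 if it would not flush at all.
 */
static int
bytes_to_flush(int out_count)
{
    assert(out_count >= 0);
    if (out_count < FLUSH_THRESHOLD)
        return 0;
    /* Send only whole multiples of the threshold; keep the remainder. */
    return out_count - (out_count % FLUSH_THRESHOLD);
}
```

Under these assumptions, a short write that leaves only a few hundred bytes buffered would not trigger a flush on the next call, which is why the reproduction needs a large remainder before the backend is killed.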
From: "Dilip kumar" <dilip.kumar@huawei.com>
> Is there any direct scenario by which it can be reproduced?

Thank you for reviewing and testing the patch. There is no other direct scenario. I reproduced the failure exactly as you suggested, because it was very difficult to reproduce the problem without using the debugger.

Regards
MauMau
"MauMau" <maumau307@gmail.com> writes:
> If the backend is terminated with SIGKILL while psql is running "\copy
> table_name from file_name", the \copy never ends. I expected \copy
> to be cancelled because the corresponding server process vanished.

I just noticed this CF entry pertaining to the same problem that Stephen Frost reported a couple days ago:
http://www.postgresql.org/message-id/20140211205336.GU2921@tamriel.snowman.net

I believe it's been adequately fixed as of commits fa4440f516 and b8f00a46bc, but if you'd test that those handle your problem cases, I'd appreciate it.

> [Fix]
> If the message transmission fails in PQputCopyEnd(), switch
> conn->asyncStatus back to PGASYNC_BUSY.

This patch seems inappropriate to me, because it will allow libpq to exit the COPY IN state whether or not it still has a live connection. If it does, the backend will be in an inconsistent state and we'll have a mess.

regards, tom lane
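The objection above can be pictured with another toy model. It is purely illustrative and does not describe what the referenced commits do: MockSession and its helper functions are hypothetical names, and the model only tracks whether the client and the server agree about being in copy mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of both ends of one session. */
typedef struct
{
    bool        client_in_copy;     /* client believes COPY IN is active */
    bool        server_in_copy;     /* server is still expecting COPY data */
    bool        connection_alive;   /* the server process still exists */
} MockSession;

/* The proposed change: leave COPY IN on any send failure. */
static void
leave_copy_unconditionally(MockSession *s)
{
    assert(s != NULL);
    s->client_in_copy = false;
}

/* A more cautious variant: leave COPY IN only when the connection is dead. */
static void
leave_copy_if_dead(MockSession *s)
{
    assert(s != NULL);
    if (!s->connection_alive)
        s->client_in_copy = false;
}

/* With a dead connection there is no server side left to disagree with. */
static bool
in_sync(const MockSession *s)
{
    assert(s != NULL);
    if (!s->connection_alive)
        return true;
    return s->client_in_copy == s->server_in_copy;
}
```

If a send fails while the connection is still alive, the unconditional variant leaves the client out of copy mode while the server remains in it, which is the inconsistent state described above. The cautious variant changes state only once the connection is known to be gone.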
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> I just noticed this CF entry pertaining to the same problem that Stephen
> Frost reported a couple days ago:
> http://www.postgresql.org/message-id/20140211205336.GU2921@tamriel.snowman.net
>
> I believe it's been adequately fixed as of commits fa4440f516 and
> b8f00a46bc, but if you'd test that those handle your problem cases,
> I'd appreciate it.

I confirmed that the problem disappeared. I'll delete my CommitFest entry in several days.

Regards
MauMau