Thread: [bug fix] psql \copy doesn't end if backend is killed

[bug fix] psql \copy doesn't end if backend is killed

From
"MauMau"
Date:
Hello,

I've encountered a bug on PG 9.2 and fixed it for 9.4.  Please find attached
the patch.  I'd like it to be backported to at least 9.2.


[Problem]
If the backend is terminated with SIGKILL while psql is running "\copy
table_name from file_name", the \copy never ends.  I expected \copy
to be cancelled because the corresponding server process had vanished.


[Cause]
psql could not get out of the loop below in handleCopyIn():

while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
{
    OK = false;
    PQclear(res);

    PQputCopyEnd(pset.db, _("trying to exit copy mode"));
}

This situation is reached as follows:

1. handleCopyIn() calls PQputCopyData().
2. PQputCopyData() calls pqPutMsgEnd().
3. pqPutMsgEnd() calls pqFlush(), which calls pqSendSome().
4. pqSendSome() calls pqReadData().
5. At this moment, the backend is killed with SIGKILL.
6. pqReadData() fails to read from the socket, receiving ECONNRESET.  It
closes the socket.
7. As a result, PQputCopyData() fails in step 2.
8. handleCopyIn() then calls PQputCopyEnd().
9. PQputCopyEnd() calls pqPutMsgEnd(), which calls pqFlush(), which in turn
calls pqSendSome().
10. pqSendSome() fails because the socket is not open.
11. As a result, PQputCopyEnd() returns an error, leaving conn->asyncStatus
as PGASYNC_COPY_IN.
12. Because conn->asyncStatus remains PGASYNC_COPY_IN, PQgetResult()
continues to return a PGresult whose status is PGRES_COPY_IN.
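
The sequence above can be illustrated with a small stand-alone model.  This
is only a sketch: the Toy* types and toy_* functions are hypothetical
stand-ins written for this illustration and are not libpq code.  The model
assumes that, in the COPY_IN state, the result function reports COPY_IN
without touching the socket, as described in step 12.  An iteration cap is
added so the demonstration itself terminates.

```c
#include <stdbool.h>

/* Toy stand-ins for libpq state; these are NOT the real libpq types. */
typedef enum { TOY_ASYNC_BUSY, TOY_ASYNC_COPY_IN } ToyAsyncStatus;
typedef enum { TOY_RES_NONE, TOY_RES_COPY_IN } ToyResultStatus;

typedef struct
{
    bool           sock_open;     /* false once the read path hit ECONNRESET */
    ToyAsyncStatus async_status;
} ToyConn;

/* Models PQputCopyEnd() before the fix: a send failure leaves COPY_IN set. */
static int
toy_put_copy_end(ToyConn *conn)
{
    if (!conn->sock_open)
        return -1;                /* send fails; async_status is untouched */
    conn->async_status = TOY_ASYNC_BUSY;
    return 1;
}

/* Models PQgetResult(): in COPY_IN state it reports COPY_IN without reading. */
static ToyResultStatus
toy_get_result(const ToyConn *conn)
{
    if (conn->async_status == TOY_ASYNC_COPY_IN)
        return TOY_RES_COPY_IN;
    return TOY_RES_NONE;          /* stands in for a NULL PGresult */
}

/*
 * Runs a handleCopyIn()-style exit loop with an iteration cap.
 * Returns the number of iterations executed; reaching max_iters means
 * the loop would not have ended on its own.
 */
static int
toy_exit_copy_loop(ToyConn *conn, int max_iters)
{
    int iters = 0;

    while (iters < max_iters && toy_get_result(conn) == TOY_RES_COPY_IN)
    {
        (void) toy_put_copy_end(conn);
        iters++;
    }
    return iters;
}
```

With sock_open set to false and async_status set to TOY_ASYNC_COPY_IN, the
loop in this model runs until it hits the cap; with an open socket it exits
after one iteration.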


[Fix]
If the message transmission fails in PQputCopyEnd(), switch
conn->asyncStatus back to PGASYNC_BUSY.  That causes PQgetResult() to try to
read data with pqReadData().  pqReadData() fails and PQgetResult() returns
NULL.  As a consequence, the loop in question terminates.
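
The effect of the proposed change can be sketched with a stand-alone model.
The Sk* types and sk_* functions below are hypothetical names invented for
this illustration; they are neither the attached patch nor libpq code.  The
model follows the description above: the copy-end function moves to BUSY
even when the send fails, and the result function in BUSY state attempts a
read and reports "no result" (standing in for NULL) when that read fails.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for libpq state, for illustration only. */
typedef enum { SK_ASYNC_IDLE, SK_ASYNC_BUSY, SK_ASYNC_COPY_IN } SkAsyncStatus;
typedef enum { SK_RES_NONE, SK_RES_COPY_IN } SkResultStatus;

typedef struct
{
    bool          sock_open;
    SkAsyncStatus async_status;
} SkConn;

/* Models pqReadData(): reading fails once the socket has been closed. */
static int
sk_read_data(const SkConn *conn)
{
    return conn->sock_open ? 1 : -1;
}

/* Models PQputCopyEnd() with the proposed fix applied. */
static int
sk_put_copy_end(SkConn *conn)
{
    int rc = conn->sock_open ? 1 : -1;

    /* The fix: leave COPY_IN for BUSY even when the send failed. */
    conn->async_status = SK_ASYNC_BUSY;
    return rc;
}

/* Models PQgetResult(): BUSY state tries to read before giving a result. */
static SkResultStatus
sk_get_result(SkConn *conn)
{
    if (conn->async_status == SK_ASYNC_COPY_IN)
        return SK_RES_COPY_IN;
    if (conn->async_status == SK_ASYNC_BUSY)
    {
        (void) sk_read_data(conn);        /* fails on a closed socket */
        conn->async_status = SK_ASYNC_IDLE;
    }
    return SK_RES_NONE;                   /* stands in for a NULL PGresult */
}

/* The handleCopyIn()-style exit loop; returns the iteration count. */
static int
sk_exit_copy_loop(SkConn *conn, int max_iters)
{
    int iters = 0;

    while (iters < max_iters && sk_get_result(conn) == SK_RES_COPY_IN)
    {
        (void) sk_put_copy_end(conn);
        iters++;
    }
    return iters;
}
```

In this model a connection with a closed socket leaves the loop after a
single iteration, because the second call to sk_get_result() no longer sees
the COPY_IN state.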


Regards
MauMau


Re: [bug fix] psql \copy doesn't end if backend is killed

From
Dilip kumar
Date:
On 20 December 2013 19:43, MauMau wrote:
> [Problem]
> If the backend is terminated with SIGKILL while psql is running "\copy
> table_name from file_name", the \copy never ends.  I expected \copy
> to be cancelled because the corresponding server process had vanished.
>
>
> [Cause]
> psql could not get out of the loop below in handleCopyIn():
>
> while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
> {
>     OK = false;
>     PQclear(res);
>
>     PQputCopyEnd(pset.db, _("trying to exit copy mode"));
> }


1. The patch applied to git head successfully.
2. The problem can occur in some scenarios, and the fix looks fine to me.
3. For testing, I think it's not possible to generate such a scenario
directly, so I verified it by debugging, following the steps explained:
   1. Make pqsecure_write() return fewer bytes than requested, by updating
   the result in gdb while inside pqSendSome().  Also make sure that the
   remaining byte count is >= 8192, i.e. conn->outCount - sent > 8192, so
   that in the next step pqPutMsgEnd(), called from PQputCopyEnd(), goes on
   to flush the data.
   2. Then kill the backend process before it calls pqReadData().
The scenario reproduced without the patch, and the issue is resolved after
applying the patch.
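
The 8192-byte condition in the first debugging step can be pictured with a
small sketch.  It assumes, as the step implies, that the message-end routine
only attempts a flush once at least 8192 bytes are pending in the output
buffer.  The function name, the macro, and the rounding down to whole
8192-byte blocks are assumptions of this sketch rather than a quotation of
libpq.

```c
/* Assumed flush granularity, taken from the 8192 figure in the step above. */
#define SKETCH_FLUSH_CHUNK 8192

/*
 * Hypothetical helper: given the number of pending output bytes, return
 * how many bytes a threshold-based flush would try to send.  Below the
 * threshold nothing is sent, so no send (and no read) is attempted.
 */
static int
sketch_bytes_to_flush(int out_count)
{
    if (out_count < SKETCH_FLUSH_CHUNK)
        return 0;
    return out_count - (out_count % SKETCH_FLUSH_CHUNK);
}
```

Under this assumption, if fewer than 8192 bytes remain after the short
write, the later PQputCopyEnd() call would not reach the send path at all,
which is why the reproduction needs the remaining count to be that large.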

Is there any direct scenario by which it can be reproduced?

Regards,
Dilip





Re: [bug fix] psql \copy doesn't end if backend is killed

From
"MauMau"
Date:
From: "Dilip kumar" <dilip.kumar@huawei.com>
> Is there any direct scenario by which it can be reproduced?

Thank you for reviewing and testing the patch.  There is no other direct 
scenario.
I reproduced the failure exactly like you suggested, because it was very 
difficult to reproduce the problem without using the debugger.

Regards
MauMau




Re: [bug fix] psql \copy doesn't end if backend is killed

From
Tom Lane
Date:
"MauMau" <maumau307@gmail.com> writes:
> If the backend is terminated with SIGKILL while psql is running "\copy
> table_name from file_name", the \copy never ends.  I expected \copy
> to be cancelled because the corresponding server process had vanished.

I just noticed this CF entry pertaining to the same problem that Stephen
Frost reported a couple days ago:
http://www.postgresql.org/message-id/20140211205336.GU2921@tamriel.snowman.net

I believe it's been adequately fixed as of commits fa4440f516 and
b8f00a46bc, but if you'd test that those handle your problem cases,
I'd appreciate it.

> [Fix]
> If the message transmission fails in PQputCopyEnd(), switch 
> conn->asyncStatus back to PGASYNC_BUSY.

This patch seems inappropriate to me, because it will allow libpq to exit
the COPY IN state whether or not it still has a live connection.  If it
does, the backend will be in an inconsistent state and we'll have a mess.
        regards, tom lane



Re: [bug fix] psql \copy doesn't end if backend is killed

From
"MauMau"
Date:
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> I just noticed this CF entry pertaining to the same problem that Stephen
> Frost reported a couple days ago:
> http://www.postgresql.org/message-id/20140211205336.GU2921@tamriel.snowman.net
>
> I believe it's been adequately fixed as of commits fa4440f516 and
> b8f00a46bc, but if you'd test that those handle your problem cases,
> I'd appreciate it.

I confirmed that the problem disappeared.  I'll delete my CommitFest entry 
in several days.

Regards
MauMau