Thread: Full socket send buffer prevents cancel, timeout

Full socket send buffer prevents cancel, timeout

From: Michael Fuhr

I've recently been investigating long-running statements that,
despite statement_timeout settings and pg_cancel_backend() attempts,
remain visible in pg_stat_activity and continue to hold locks.  When
this happens, a process trace and a debugger show that the backend
is blocked at the send() in secure_write(), netstat shows that the
backend's send buffer is full, and a packet sniffer shows that the
client TCP stack is sending "win 0", suggesting that the client has
a full receive buffer because the application has stopped reading
data.  Unfortunately we have limited ability to continue the
investigation at the client.

Here's an excerpt from internal_flush():
    while (bufptr < bufend)
    {
        int         r;

        r = secure_write(MyProcPort, bufptr, bufend - bufptr);

        if (r <= 0)
        {
            if (errno == EINTR)
                continue;       /* Ok if we were interrupted */

If the write is interrupted by a timeout or cancel, can anything
be done here or elsewhere to abort the statement and release its
locks?  I realize that the full send buffer complicates the matter
because of the inability to send any more data to the client, but
I'm wondering if the backend can do anything to get rid of statements
from such misbehaving applications.  We've reluctantly tried SIGTERM
but that doesn't work either.  SIGQUIT and SIGABRT would kill the
entire backend.

Since these statements won't go away, they hold locks that can block
other transactions and they cause vacuum to leave behind dead rows
that it could otherwise clean up.  I noticed this problem in 8.1.14
but I can reproduce it in later versions as well.  I can provide a
test case if anybody's interested (it's easy: use PQsendQuery() to
execute a query that returns enough data to fill both the client's
and the server's socket buffers, then go to sleep without reading
the response).
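
The stall itself can be illustrated without a database. The following is a minimal sketch, not the test case offered above: it assumes a POSIX system and uses a local socket pair as a stand-in for the client connection. The names fill_send_buffer and demo_stalled_peer are hypothetical. One end writes in non-blocking mode while the other end never reads; the point at which send() reports EAGAIN is where a blocking send(), like the one in secure_write(), would hang.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write to a non-blocking socket until the kernel refuses more data.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
static long fill_send_buffer(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof(chunk));
    for (;;)
    {
        ssize_t r = send(fd, chunk, sizeof(chunk), 0);

        if (r > 0)
            total += (long) r;
        else if (r < 0 && errno == EINTR)
            continue;           /* interrupted: retry, as internal_flush() does */
        else if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;       /* buffers full: a blocking send() would hang here */
        else
            return -1;
    }
}

/*
 * Create a socket pair, never read from one end, and fill the other.
 * Returns the number of bytes queued before the stall, or -1 on failure.
 */
static long demo_stalled_peer(void)
{
    int  sv[2];
    int  flags;
    long queued;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    flags = fcntl(sv[0], F_GETFL);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* sv[1] plays the application that has stopped reading */
    queued = fill_send_buffer(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return queued;
}
```

Calling demo_stalled_peer() from a main() and printing the result shows how many bytes the kernel accepts before the writer can make no further progress; the exact number depends on the platform's socket buffer sizes.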

-- 
Michael Fuhr


Re: Full socket send buffer prevents cancel, timeout

From: Tom Lane

Michael Fuhr <mike@fuhr.org> writes:
> I've recently been investigating long-running statements that,
> despite statement_timeout settings and pg_cancel_backend() attempts,
> remain visible in pg_stat_activity and continue to hold locks.  When
> this happens, a process trace and a debugger show that the backend
> is blocked at the send() in secure_write(), netstat shows that the
> backend's send buffer is full, and a packet sniffer shows that the
> client TCP stack is sending "win 0", suggesting that the client has
> a full receive buffer because the application has stopped reading
> data.

> If the write is interrupted by a timeout or cancel, can anything
> be done here or elsewhere to abort the statement and release its
> locks?

The best thing would really be to kill the client.  The backend can't
take it upon itself to interrupt the send, because that would result in
loss of protocol message sync, and without knowing how many bytes got
sent there's really no way to recover.  The only escape from the backend
side would be to abort the session --- and even that's a bit problematic
since we'd probably try to issue an error message somewhere on the way
out, which isn't going to work either if the send buffer is full.
        regards, tom lane
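
The loss of message sync can be sketched with a toy framing model. In the version 3 frontend/backend protocol each message is a type byte followed by a four-byte big-endian length that counts itself and then the payload. The code below is only an illustration under that framing rule: the payload contents are placeholders rather than real DataRow or ErrorResponse bodies, and put_message, scan_types, and demo_interrupted_send are hypothetical names. It shows that if the sender abandons a message partway through and then writes an error message, the receiver consumes the error bytes as the tail of the unfinished message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame one message: type byte, 4-byte big-endian length (counting itself), payload. */
static size_t put_message(unsigned char *out, char type,
                          const unsigned char *payload, uint32_t payload_len)
{
    uint32_t len = payload_len + 4;

    out[0] = (unsigned char) type;
    out[1] = (unsigned char) (len >> 24);
    out[2] = (unsigned char) (len >> 16);
    out[3] = (unsigned char) (len >> 8);
    out[4] = (unsigned char) len;
    memcpy(out + 5, payload, payload_len);
    return 5 + (size_t) payload_len;
}

/* Parse a byte stream as a receiver would; record the type of each complete message. */
static size_t scan_types(const unsigned char *buf, size_t n,
                         char *types, size_t max_types)
{
    size_t pos = 0;
    size_t count = 0;

    while (n - pos >= 5 && count < max_types)
    {
        uint32_t len = ((uint32_t) buf[pos + 1] << 24) |
                       ((uint32_t) buf[pos + 2] << 16) |
                       ((uint32_t) buf[pos + 3] << 8) |
                       (uint32_t) buf[pos + 4];

        if (len < 4 || len - 4 > n - pos - 5)
            break;              /* malformed or incomplete message */
        types[count++] = (char) buf[pos];
        pos += 5 + (size_t) (len - 4);
    }
    return count;
}

/*
 * Send only the first `sent` bytes of a 25-byte 'D' message, then a
 * complete 15-byte 'E' message.  Returns how many messages the receiver
 * recognizes; their types are stored in `types`.
 */
static size_t demo_interrupted_send(size_t sent, char *types, size_t max_types)
{
    unsigned char row_payload[20];
    unsigned char err_payload[10];
    unsigned char row[32], err[32], stream[64];
    size_t        row_len, err_len;

    memset(row_payload, 'r', sizeof(row_payload));   /* placeholder contents */
    memset(err_payload, 'e', sizeof(err_payload));   /* placeholder contents */
    row_len = put_message(row, 'D', row_payload, sizeof(row_payload));
    err_len = put_message(err, 'E', err_payload, sizeof(err_payload));

    if (sent > row_len)
        sent = row_len;
    memcpy(stream, row, sent);
    memcpy(stream + sent, err, err_len);
    return scan_types(stream, sent + err_len, types, max_types);
}
```

With sent equal to the full 25 bytes the receiver sees a 'D' message followed by an 'E' message. With sent equal to 10 it sees a single 'D' message whose payload silently contains the error message, and everything after that point is misaligned.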


Re: Full socket send buffer prevents cancel, timeout

From: Michael Fuhr

On Sat, Oct 25, 2008 at 12:36:24PM -0400, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > If the write is interrupted by a timeout or cancel, can anything
> > be done here or elsewhere to abort the statement and release its
> > locks?
> 
> The best thing would really be to kill the client.

Unfortunately the people running the database don't have control
over the client.  They'd like to kill the connection at the database
end but we haven't yet found a reliable way to do that short of an
immediate shutdown, which interrupts service and can lead to a long
recovery.  Are we missing any other possibilities?

> The backend can't take it upon itself to interrupt the send, because
> that would result in loss of protocol message sync, and without
> knowing how many bytes got sent there's really no way to recover.
> The only escape from the backend side would be to abort the session ---
> and even that's a bit problematic since we'd probably try to issue an
> error message somewhere on the way out, which isn't going to work
> either if the send buffer is full.

Yeah, I've already explained those difficulties.  I was hoping that
discussion might generate ideas on how to deal with them.

-- 
Michael Fuhr


Re: Full socket send buffer prevents cancel, timeout

From: "Stephen R. van den Berg"

Michael Fuhr wrote:
>On Sat, Oct 25, 2008 at 12:36:24PM -0400, Tom Lane wrote:
>> The backend can't take it upon itself to interrupt the send, because
>> that would result in loss of protocol message sync, and without
>> knowing how many bytes got sent there's really no way to recover.
>> The only escape from the backend side would be to abort the session ---
>> and even that's a bit problematic since we'd probably try to issue an
>> error message somewhere on the way out, which isn't going to work
>> either if the send buffer is full.

>Yeah, I've already explained those difficulties.  I was hoping that
>discussion might generate ideas on how to deal with them.

What about simply closing the filedescriptor upon discovering a
non-empty sendbuffer upon timeout/querycancel?
-- 
Sincerely,          Stephen R. van den Berg.

Teamwork is essential -- it allows you to blame someone else.


Re: Full socket send buffer prevents cancel, timeout

From: Tom Lane

"Stephen R. van den Berg" <srb@cuci.nl> writes:
> What about simply closing the filedescriptor upon discovering a
> non-empty sendbuffer upon timeout/querycancel?

So in other words, convert any network glitch, no matter how small,
into an instant fatal error?
        regards, tom lane


Re: Full socket send buffer prevents cancel, timeout

From: "Stephen R. van den Berg"

Tom Lane wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> What about simply closing the filedescriptor upon discovering a
>> non-empty sendbuffer upon timeout/querycancel?

>So in other words, convert any network glitch, no matter how small,
>into an instant fatal error?

The fact that a timeout or querycancel has taken place, indicates that
this does *not* act on just any network glitch.
The preferred exact logic would look something like:

a. Take note of the time the last write returned as Tlastwritten.
b. Whenever a timeout occurs, or a querycancel is being requested:
c. Check if the sendbuffer is empty.
d. If the sendbuffer is non-empty *and* Tlastwritten is longer ago
   than some Tkill (say 128 seconds), then close the filedescriptor.

In all other cases, just hang on tight.
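
Steps a through d can be sketched as a small decision function. This is only an illustration of the proposed rule, not PostgreSQL code: SendWatch, note_write_progress, and should_close_socket are hypothetical names, and the caller is assumed to know how many bytes are still unsent (for example from its own output buffer).

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define TKILL_SECONDS 128       /* "say 128 seconds" from the proposal */

/* Hypothetical per-connection bookkeeping; not a PostgreSQL structure. */
typedef struct SendWatch
{
    time_t last_written;        /* Tlastwritten: when a write last returned */
} SendWatch;

/* Step a: record the time whenever a write returns. */
static void note_write_progress(SendWatch *w, time_t now)
{
    w->last_written = now;
}

/*
 * Steps b to d: call when a timeout fires or a query cancel is requested.
 * Returns true only if data is still unsent and no write has returned
 * for more than TKILL_SECONDS; in all other cases the caller keeps waiting.
 */
static bool should_close_socket(const SendWatch *w, size_t unsent_bytes,
                                time_t now)
{
    if (unsent_bytes == 0)
        return false;           /* step c: send buffer is empty */
    return difftime(now, w->last_written) > TKILL_SECONDS;   /* step d */
}
```

Keeping the decision separate from the socket code makes the rule easy to check in isolation; how the unsent byte count is obtained is left open here.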
-- 
Sincerely,          Stephen R. van den Berg.

Teamwork is essential -- it allows you to blame someone else.


Re: Full socket send buffer prevents cancel, timeout

From: Tom Lane

"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Tom Lane wrote:
>> "Stephen R. van den Berg" <srb@cuci.nl> writes:
>>> What about simply closing the filedescriptor upon discovering a
>>> non-empty sendbuffer upon timeout/querycancel?

>> So in other words, convert any network glitch, no matter how small,
>> into an instant fatal error?

> The fact that a timeout or querycancel has taken place, indicates that
> this does *not* act on just any network glitch.

No, just any one that is unfortunate enough to occur at the moment of a
query timeout or cancel.
        regards, tom lane