Thread: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Denis Perchine
Date:
Hello all,

This is the result of my 6 days "war" with Linux kernel people... Any useful comments on this????

---------- Forwarded Message ----------
Subject: Re: Fwd: Problem with recv syscall on socket when other side closed connection
Date: Tue, 27 Jun 2000 16:21:55 +0400 (MSK DST)
From: kuznet@ms2.inr.ac.ru

Hello!

> Sorry... But seems that you did not understand the problem.
> I talk about recv... Not write... write SHOULD give EPIPE on connection reset...
> But not recv/read.

I did understand. This error was for write(), but it became known _after_ you exited write(). So that it is delivered to read(). It is usual problem of all full-duplex pipes.

We could translate this EPIPE to ECONNRESET, when it is delivered to read(), but it does not change its sense. Solaris does not translate.

> Usual way of handling connection reset when you do only read is to give
> all data available and then return 0, indicating EOF.

Sorry? Think a bit. You wrote to dead socket, right? It is the hardest error. If the transport were local, you would get SIGPIPE and died painful death. If an OS ignores such events, it is simply impossible to use, you will get silently truncated data all the time.

> Or some OSes (HPUX if I'm not mistaken) gives you all data available and then
> ECONNRESET. But not other way around...

This approach has its merits, and it is acceptable in principle. But Linux approach is evidently better, because errors are expedited. Each protocol, where out of band events are inlined to data is inclined to deadlocks. In Linux scheme you know forward that stream is aborted. Depending on protocol you may choose to abort protocol or to continue to operate, parsing already received messages.

> But not other way around...

You have just seen a new way around. The correct one. 8)

Alexey

-------------------------------------------------------

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
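The "write to a dead socket" case discussed above can be reproduced in a few lines. This is a minimal sketch, not the reporter's test case: it assumes a Unix-domain `socket.socketpair()` as a stand-in for a TCP connection, and the function name `write_after_peer_close` is made up for illustration. Python ignores SIGPIPE by default, so the failure shows up as an `OSError` rather than a fatal signal. The exact errno may vary by platform.

```python
import errno
import socket


def write_after_peer_close():
    """Return the errno name reported when sending to a peer that has gone away.

    Assumption: a Unix-domain socketpair stands in for a TCP connection.
    """
    ours, theirs = socket.socketpair()
    theirs.close()  # the other side goes away
    try:
        ours.sendall(b"more data")  # write to the dead socket
    except OSError as exc:
        # Translate the numeric errno into its symbolic name.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        ours.close()
    return None  # the send unexpectedly succeeded


if __name__ == "__main__":
    print(write_after_peer_close())
```

With TCP the error may only become known after write() has already returned, which is the situation described in the forwarded reply.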
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
Denis Perchine <dyp@perchine.com> writes:
> This is the result of my 6 days "war" with Linux kernel people...
> Any useful comments on this????

The Linux people seem to be assuming (erroneously) that application protocols are strictly I-send-and-then-you-send. That's too restrictive, and in fact it falls down in exactly the case that we're seeing in libpq: one side of the connection may send an error message "out of turn" and then close the connection. If the other side of the connection was busy sending, the first thing it will get is an EPIPE error on its send. By closing the connection *AND DISCARDING VALID USER DATA* at this point, the Linux kernel makes it impossible to retrieve the error message --- which might have contained essential information.

> In Linux scheme you know forward that stream is aborted.
> Depending on protocol you may choose to abort protocol
> or to continue to operate, parsing already received messages.

But what about the messages you didn't get yet, but the other end sent in good faith? There's nothing in the TCP specs that says a program can't close its end of the connection as soon as it has sent the last data it intends to send.

>> But not other way around...

> You have just seen a new way around. The correct one. 8)

No, just a new half-baked excuse for doing things wrong. The kernel at the other side of the connection accepted the data for delivery. That means that both sides of the connection are going to make their best efforts to deliver it. By willfully failing to deliver that data, the Linux kernel is violating the fundamental premise of TCP (or any other reliable byte-stream protocol). This is not "correct", it is broken.

Do I need to quote RFC chapter and verse at you?

regards, tom lane
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
kuznet@ms2.inr.ac.ru
Date:
Hello!

> > This is the result of my 6 days "war" with Linux kernel people...

Alas, I did not understand that he was in the state of war. I believed he asked for an advice and/or reported a bug. 8)

> *AND DISCARDING VALID USER DATA* at this point, the Linux kernel
> makes it impossible to retrieve the error message --- which might
> have contained essential information.

Denis was explained that Linux _never_ discards any data... Ough, I beg pardon: Denis noticed this _himself_. Please, do not misinform people.

BTW look at this. It is RFC1122, 4.2.2.13.

    If a TCP connection is closed by the remote site, the local application MUST be informed whether it closed normally or was aborted.

See? 8)

Also, David Miller explained (and Denis understood and accepted this) that TCP does not guarantee data delivery, when an RST is issued. It is solely due to intrinsic network unreliability and segment reordering. The application depending on these aspects is broken.

> But what about the messages you didn't get yet, but the other end
> sent in good faith? There's nothing in the TCP specs that says
> a program can't close its end of the connection as soon as it has
> sent the last data it intends to send.

TCP specs say directly and unambiguously, that sending data to half-closed pipe is followed by immediate abort (RFC1122, a bit below the cite above). More detailed explanation can be found in current draft-ietf-tcpimpl.

> Do I need to quote RFC chapter and verse at you?

Of course.

Alexey
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
[ Sorry for delay in response, I had other things to do over the weekend. ]

kuznet@ms2.inr.ac.ru writes:
> BTW look at this. It is RFC1122, 4.2.2.13.
> If a TCP
> connection is closed by the remote site, the local
> application MUST be informed whether it closed normally or
> was aborted.

So? This is not relevant, because the connection was not aborted. The sentence immediately preceding that one defines an abort as an event in which RST segment(s) are sent, but closure of a connection is defined to send FIN, not RST. (More about that below.)

The more relevant quote is the next paragraph,

    The normal TCP close sequence delivers buffered data reliably in both directions. Since the two directions of a TCP connection are closed independently, it is possible for a connection to be "half closed," i.e., closed in only one direction, and a host is permitted to continue sending data in the open direction on a half-closed connection.

I do not see how you can read the first sentence of that paragraph in any way but to say that data once sent must be delivered if at all possible.

Another example is from RFC-793 (STD-7), section 3.8, definition of CLOSE:

    Closing connections is intended to be a graceful operation in the sense that outstanding SENDs will be transmitted (and retransmitted), as flow control permits, until all have been serviced. Thus, it should be acceptable to make several SEND calls, followed by a CLOSE, and expect all the data to be sent to the destination.

In our situation, the server sends (queues) some data and then closes its side of the connection. The server-side TCP stack should send the data along with FIN and then go to FIN-WAIT-1 state. In this state the server side may receive more data from the client side (since the client isn't yet aware the server has quit). RFC-793 is perfectly clear that the server side must send a dummy ACK but *no* RST in this case --- see section 3.4, almost the end of the section:

    3. If the connection is in a synchronized state (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), any unacceptable segment (out of window sequence number or unacceptible acknowledgment number) must elicit only an empty acknowledgment segment containing the current send-sequence number and an acknowledgment indicating the next sequence number expected to be received, and the connection remains in the same state.

Therefore, sending data to a no-longer-present receiver does not cause a connection reset (at least not in a spec-conforming TCP stack), and there is no justification for discarding data that is coming the other way.

The Linux kernel's present behavior is contrary to the standard, unable to support an essential user capability (ie, delivery of last-gasp error messages), and contrary to the behavior of all other TCP implementations that I have worked with. There is a reason why you are in the minority here...

regards, tom lane
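The "half closed" state quoted from RFC 1122 corresponds to `shutdown(SHUT_WR)` in the sockets API: one side declares it has finished sending but keeps reading. The sketch below only shows that API-level sequence. It assumes a Unix-domain `socket.socketpair()` in place of a real TCP connection, and the helper names are invented for illustration. It does not exercise the contested case where data arrives at a socket that has been fully closed.

```python
import socket


def read_until_eof(sock):
    """Collect everything the peer sends until recv() reports EOF (b'')."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the peer has closed its sending side
            return b"".join(chunks)
        chunks.append(chunk)


def half_close_exchange():
    """Client half-closes after its request; the server can still reply."""
    client, server = socket.socketpair()  # assumption: stand-in for TCP
    client.sendall(b"QUERY")
    client.shutdown(socket.SHUT_WR)    # half-close: done sending, still reading
    request = read_until_eof(server)   # server sees the data, then EOF
    server.sendall(b"RESULT for " + request)  # the open direction still works
    server.close()                     # queued reply is delivered before EOF
    reply = read_until_eof(client)
    client.close()
    return request, reply


if __name__ == "__main__":
    print(half_close_exchange())
```

The server's final `sendall()` followed by `close()` is the "several SEND calls, followed by a CLOSE" pattern from the RFC 793 quote.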
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>> 3. If the connection is in a synchronized state (ESTABLISHED,
>> FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
>> any unacceptable segment (out of window sequence number or
>> unacceptible acknowledgment number) must elicit only an empty
>> acknowledgment segment containing the current send-sequence number
>> and an acknowledgment indicating the next sequence number expected
>> to be received, and the connection remains in the same state.

> Reread the 3. above. What it actually requires if you think about it is that
> the receive window is shrunk to zero and the connection hangs for all
> eternity the way you are arguing it.

No, it doesn't "hang for all eternity", it sits in the same state until (a) the client side closes its sending side of the connection (ie, sends FIN), or (b) the FIN-WAIT-1 state times out. But given a normally responsive client and no loss of physical connectivity or crash of either TCP stack, there is no connection reset and no failure to deliver sent data.

There would be no need for all the half-open-connection verbiage if the spec were meant to be read the way you are reading it.

regards, tom lane
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>> No, it doesn't "hang for all eternity", it sits in the same state until
>> (a) the client side closes its sending side of the connection (ie, sends
>> FIN), or (b) the FIN-WAIT-1 state times out. But given a normally
>> responsive client and no loss of physical connectivity or crash of
>> either TCP stack, there is no connection reset and no failure to deliver
>> sent data.

> I cannot ack the data since it has not been read, so how can I ack the fin ?

ACK does not mean that you've delivered the data to the end user. RFC 793, 2.6:

    An acknowledgment by TCP does not guarantee that the data has been delivered to the end user, but only that the receiving TCP has taken the responsibility to do so.

Bit-bucketing the data because the end user app is no longer present to accept it (due to having already closed its input socket) is implicitly within the receiving TCP's authority here.

I think this is the core of our disagreement, but I can see no justification for your position that ACK implies the data has been delivered to the end user. Every TCP implementation I've ever heard of sends ACK as soon as it's collected data into kernel buffers, *not* after the application has executed recv() to extract the data from the kernel. (Who's to say that completion of recv() represents final delivery of the data anyway? Sending ACK cannot be considered a report of end-to-end delivery; that has to be an application-level concept.)

Also observe that the discussion of segment-arrival processing in section 3.9 explicitly says that the behavior in FIN-WAIT-1 and later states is not different from the behavior in ESTABLISHED state. In particular, if you do not like the segment:

    If an incoming segment is not acceptable, an acknowledgment should be sent in reply (unless the RST bit is set, if so drop the segment and return):

        <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>

    After sending the acknowledgment, drop the unacceptable segment and return.

There is no room here for the TCP to decide to send RST instead.

regards, tom lane
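The point that end-to-end delivery "has to be an application-level concept" can be sketched as an explicit acknowledgment message that the receiver sends only after it has processed the payload. Everything here is hypothetical and not taken from libpq or any real protocol: the 4-byte length prefix, the one-byte `ACK` value and the function names are all assumptions. A `socket.socketpair()` stands in for a TCP connection.

```python
import socket
import struct
import threading

ACK = b"\x06"  # hypothetical one-byte application-level acknowledgment


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_and_wait_for_ack(sock, payload):
    """Send a length-prefixed message; True once the application acks it."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    return recv_exact(sock, 1) == ACK


def receive_and_ack(sock, handler):
    """Read one message, hand it to the application, then acknowledge it."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    handler(recv_exact(sock, length))  # the application consumes the data
    sock.sendall(ACK)                  # only now is delivery confirmed


def demo():
    a, b = socket.socketpair()  # assumption: stand-in for a TCP connection
    seen = []
    receiver = threading.Thread(target=receive_and_ack, args=(b, seen.append))
    receiver.start()
    acked = send_and_wait_for_ack(a, b"COMMIT")
    receiver.join()
    a.close()
    b.close()
    return acked, seen


if __name__ == "__main__":
    print(demo())
```

A TCP-level ACK would have been sent as soon as the kernel buffered the bytes. The application-level `ACK` here is sent only after `handler` has run.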
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
kuznet@ms2.inr.ac.ru
Date:
Hello!

> <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
>
> After sending the acknowledgment, drop the unacceptable segment
> and return.
>
> There is no room here for the TCP to decide to send RST instead.

I apologize, but RFC793 is sort of incomplete. Please, look at errata in RFC1122 and to bug alerts described in documents published by tcp-impl (draft-tcpimpl-*).

I cited you corresponding paragraph of the RFC in previous mail.

Shortly:

1. When new data arrive after half-duplex close, we must reset.
2. When close occurs on connection, which has unread data, we must reset.

It is required from the viewpoint of TCP protocol. Any OS, which forgets to make this is buggy. By the way, I do not know about OSes, which do not make this.

From the viewpoint of application, the behaviour is also correct. Data arrived, when nobody plans to read it, unambiguously means either connection abort or hard bug in application protocol.

Alexey
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
kuznet@ms2.inr.ac.ru writes:
> I apologize, but RFC793 is sort of incomplete. Please, look at
> errata in RFC1122 and to bug alerts described in documents published
> by tcp-impl (draft-tcpimpl-*).

The errata in RFC 1122 do not recommend any changes in connection closure behavior. You appear to be hanging your hat on this paragraph from 1122 4.2.2.13:

    A host MAY implement a "half-duplex" TCP close sequence, so that an application that has called CLOSE cannot continue to read data from the connection. If such a host issues a CLOSE call while received data is still pending in TCP, or if new data is received after CLOSE is called, its TCP SHOULD send a RST to show that data was lost.

However I read this as a requirement pertaining only to half-duplex close sequences. There is nothing half-duplex about closing a socket completely.

In any case, it can hardly be a good idea to abort the flow of data in the outbound direction in order to report that data is being dropped in the inbound direction. If an application has done a half-close to close its inbound side only, but wants to keep sending outbound data, it presumably has a good reason for doing so. Behaving as you suggest would render this mode of operation useless.

As for the drafts, I assume you are referring to sections 2.16 and 2.17 of RFC 2525 --- I couldn't find anything about connection resets in the other files ftp://ftp.isi.edu/internet-drafts/draft-ietf-tcpimpl-*. May I remind you that 2525 is an informational RFC, not a standards-track RFC, and accordingly it has not been reviewed to the extent that a proposed standards change would be? I shall be writing to the authors of 2525 to object to sections 2.16 and 2.17 on the grounds that an RST causes data loss in the other direction. We'll see what they have to say.

> From the viewpoint of application, the behaviour is also correct.
> Data arrived, when nobody plans to read it, unambiguously means
> either connection abort or hard bug in application protocol.

Sure, it's a connection abort. My point is that RST is an unacceptably blunt instrument for reporting it, because it causes loss of data going in the other direction.

regards, tom lane
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
kuznet@ms2.inr.ac.ru
Date:
Hello!

> blunt instrument for reporting it, because it causes loss of data going
> in the other direction.

First. Data, which reached the host are not lost.

Second. TCP may lose this data because this data did not reach host before reset arrived, indeed.

After the second we arrive at the next: if you send to dead pipe, or do not read some remnant of data before closing, it is _HARD_ bug in your application or in protocol.

Do you understand what hard bug is? It is when further behaviour is unpredictable and the state cannot be recovered. Essentially, it is thing which exceptions and fatal signals are invented for. 8)

Alexey
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
kuznet@ms2.inr.ac.ru writes:
>> blunt instrument for reporting it, because it causes loss of data going
>> in the other direction.

> First. Data, which reached the host are not lost.

As I recall, the original complaint was precisely that Linux discards the server->client data instead of allowing the client to read it. This was on a single machine, so there's no issue of whether it got lost in the network.

regards, tom lane
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Alan Cox
Date:
> 3. If the connection is in a synchronized state (ESTABLISHED,
> FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
> any unacceptable segment (out of window sequence number or
> unacceptible acknowledgment number) must elicit only an empty
> acknowledgment segment containing the current send-sequence number
> and an acknowledgment indicating the next sequence number expected
> to be received, and the connection remains in the same state.
>
> Therefore, sending data to a no-longer-present receiver does not cause
> a connection reset (at least not in a spec-conforming TCP stack), and
> there is no justification for discarding data that is coming the other
> way.
>
> The Linux kernel's present behavior is contrary to the standard, unable
> to support an essential user capability (ie, delivery of last-gasp error
> messages), and contrary to the behavior of all other TCP implementations
> that I have worked with. There is a reason why you are in the minority
> here...

Reread the 3. above. What it actually requires if you think about it is that the receive window is shrunk to zero and the connection hangs for all eternity the way you are arguing it.

Alan
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Alan Cox
Date:
> No, it doesn't "hang for all eternity", it sits in the same state until
> (a) the client side closes its sending side of the connection (ie, sends
> FIN), or (b) the FIN-WAIT-1 state times out. But given a normally
> responsive client and no loss of physical connectivity or crash of
> either TCP stack, there is no connection reset and no failure to deliver
> sent data.

I cannot ack the data since it has not been read, so how can I ack the fin ?
Re: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
kuznet@ms2.inr.ac.ru
Date:
Hello!

> As I recall, the original complaint was precisely that Linux discards
> the server->client data instead of allowing the client to read it. This
> was on a single machine, so there's no issue of whether it got lost in
> the network.

I am sorry. I have already said: it is not truth.

Original reporter (Denis) blamed particularly on the fact, that Linux allows to read all queued data until EOF.

Try yourself, if you do not believe.

Unfortunately, I deleted that his mail, but you can find it in mail archives I think, it was to netdev or to linux-kernel.

Alexey
Re[2]: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Denis Perchine
Date:
Hello kuznet,

Wednesday, July 05, 2000, 7:06:06 PM, you wrote:

kmiar> Hello!

>> As I recall, the original complaint was precisely that Linux discards
>> the server->client data instead of allowing the client to read it. This
>> was on a single machine, so there's no issue of whether it got lost in
>> the network.

kmiar> I am sorry. I have already said: it is not truth.
kmiar> Original reporter (Denis) blamed particularly on the fact,
kmiar> that Linux allows to read all queued data until EOF.
kmiar> Try yourself, if you do not believe.
kmiar> Unfortunately, I deleted that his mail, but you can find it
kmiar> in mail archives I think, it was to netdev or to linux-kernel.

I blamed that: Linux gives you EPIPE when you call recv before all data available is retrieved. If you will try to read AFTER error you will get all data. Problem is that it makes handling very complicated. In the case of EPIPE you should try to read again. The problem is that you should always try only once.

--
Best regards,
Denis
mailto:dyp@perchine.com
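The "read again, but only once" handling described above might look roughly like this. It is a sketch under assumptions, not code from libpq: `recv_retry_once` and `FakeSocket` are hypothetical names, and `FakeSocket` only imitates the reported sequence (one EPIPE, then the queued data, then EOF) so the logic can be run without a kernel that behaves this way.

```python
import errno

# Errors after which one further recv() is attempted (assumed set).
RETRYABLE = {errno.EPIPE, errno.ECONNRESET}


def recv_retry_once(sock, bufsize):
    """Call sock.recv(); on EPIPE/ECONNRESET, retry exactly once.

    Returns (data, pending_error). pending_error is the swallowed
    exception, or None if the first call succeeded.
    """
    try:
        return sock.recv(bufsize), None
    except OSError as first:
        if first.errno not in RETRYABLE:
            raise
        # Second attempt: any error raised here propagates to the caller.
        return sock.recv(bufsize), first


class FakeSocket:
    """Hypothetical stand-in: one error, then queued data, then EOF."""

    def __init__(self, queued):
        self._error_pending = True
        self._queued = queued

    def recv(self, bufsize):
        if self._error_pending:
            self._error_pending = False
            raise OSError(errno.EPIPE, "Broken pipe")
        data, self._queued = self._queued[:bufsize], self._queued[bufsize:]
        return data


if __name__ == "__main__":
    sock = FakeSocket(b"ERROR: shutting down\n")
    print(recv_retry_once(sock, 4096))
```

The caller has to remember the returned error and act on it once the data has been consumed, which is the extra bookkeeping being complained about.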
Re: Re[2]: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
Tom Lane
Date:
Denis Perchine <dyp@perchine.com> writes:
> I blamed that: Linux gives you EPIPE when you call recv before all
> data available is retrieved. If you will try to read AFTER error you
> will get all data. Problem is that it makes handling very complicated.
> In the case of EPIPE you should try to read again. The problem is that
> you should always try only once.

Ah, thanks for the correction.

Now, should we really program around this behavior of the Linux kernel? I cannot think of any other OS where it is appropriate to retry on an EPIPE error condition, nor does it make any sense to do so given the normal meaning of that error code. "Retry, but only once" is even sillier.

I still think this behavior is a bug, not a feature. If you want to issue EPIPE (or more likely ECONNRESET) *after* all available data has been read, that's fine, and EPIPE for subsequent send attempts makes sense too. But EPIPE on read when there is more data available is just plain bizarre.

regards, tom lane
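The ordering argued for here, all available data first and the error afterwards, can be emulated in user space by a small wrapper. This is only an illustrative sketch: `DeferredErrorReader` and `ScriptedSocket` are hypothetical names, and the wrapper assumes the kernel behaves as reported in this thread (one error, after which the queued data becomes readable).

```python
import errno


class DeferredErrorReader:
    """Report a reset only after the queued data has been drained."""

    def __init__(self, sock):
        self._sock = sock
        self._deferred = None  # error held back until EOF

    def recv(self, bufsize):
        try:
            data = self._sock.recv(bufsize)
        except OSError as exc:
            is_reset = exc.errno in (errno.EPIPE, errno.ECONNRESET)
            if not is_reset or self._deferred is not None:
                raise  # unrelated error, or a second reset: give up
            self._deferred = exc
            data = self._sock.recv(bufsize)  # one more try for queued data
        if data == b"" and self._deferred is not None:
            # Data exhausted: now surface the error that was held back.
            exc, self._deferred = self._deferred, None
            raise exc
        return data


class ScriptedSocket:
    """Hypothetical test double: replays a fixed list of recv() outcomes."""

    def __init__(self, outcomes):
        self._outcomes = list(outcomes)

    def recv(self, bufsize):
        if not self._outcomes:
            return b""
        item = self._outcomes.pop(0)
        if isinstance(item, Exception):
            raise item
        return item
```

Used this way, the caller sees the last message and then an exception, never the exception in the middle of readable data.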
Re: Re[2]: Fwd: Re: Fwd: Problem with recv syscall on socket when other side closed connection
From
kuznet@ms2.inr.ac.ru
Date:
Hello!

> I blamed that: Linux gives you EPIPE when you call recv before all
> data available is retrieved. If you will try to read AFTER error you
> will get all data. Problem is that it makes handling very complicated.
> In the case of EPIPE you should try to read again. The problem is that
> you should always try only once.

Well, to me it does not look very essential, when asynchronous error returned. Remember about EAGAIN and EINTR yet. You are not confused with such errors, right? Why? 8)

Seems, this order of issuing errors etc. at read() is specified in posix. I do not know really. I have said, error reporting only if no data are pending, looks legal and has its merits. Main thing is not to forget to report error at all. 8)

[ Alan, seems, all the comments about order of checks while read() are your ones. Can you comment? Maybe, it is really worth to change. ]

Side note: TLI really does not _allow_ any operations on endpoint in any direction until asynchronous error condition is cleared. In fact, Linux does this on BSD sockets as well. This is really natural, but I agree, it is inconvenient yet.

Alexey