Thread: BUG #1467: fe_connect doesn't handle EINTR right

BUG #1467: fe_connect doesn't handle EINTR right

From
AgentM
Date:
Attached is a patch which corrects the behavior. I verified that the
patch does not interfere with normal operation (using psql) but
unfortunately the code path is virtually impossible to test without a
really slow connection to a postgresql server [which I thankfully
don't have]. To test the patch, you would need to send an interrupt
at the exact time that the kernel is connect()ing in blocking mode-
good luck.

Also, I recommend removing a (sarcastic?) comment left by a previous
developer- I wrote a note about in my patch.

The patch is against 8.0.3 [because HEAD requires access to a
specific version of bison] but I imagine that the code hasn't changed
in fe-connect.c since then.

Patch against src/interfaces/libpq/fe-connect.c (v 8.0.3)

Bonus link: http://maps.google.com/maps?q=13+Roberts+Road,+Newtown
+Square,+Pennsylvania&spn=0.007639,0.010654&t=k&hl=en

Begin forwarded message:

> From: Bruce Momjian <pgman@candle.pha.pa.us>
> Date: June 24, 2005 9:27:09 PM CDT
> To: AgentM <agentm@themactionfaction.com>
> Subject: Re: [HACKERS] [BUGS] BUG #1467: fe_connect doesn't handle
> EINTR right
>
>
>
> Would you develop a patch in the next few days for this?
>
> ----------------------------------------------------------------------
> -----
>
> AgentM wrote:
>
>> It seems pretty cut-and-dry. The reporter is correct:
>> "If connect() is interrupted by a signal that is caught while blocked
>> waiting to establish a connection, connect() shall fail and set errno
>> to [EINTR], but the connection request shall not be aborted, and the
>> connection shall be established asynchronously."
>> http://www.opengroup.org/onlinepubs/009695399/functions/connect.html
>>
>> This must obviously be the case because the kernel cannot "rollback"
>> a TCP/IP connection. This is equivalent to the general case of
>> establishing an asynchronous connection (whose description follows in
>> the next paragraph of the linked text).
>>
>> So the correct code should be:
>>    if (connect(conn->sock, addr_cur->ai_addr,
>>                        addr_cur->ai_addrlen) < 0)
>>    {
>>      fd_set writefd;
>>      if (SOCK_ERRNO == EINTR) //pause until connection established
>>      {
>>          do
>>      {
>>          int ret;
>>              FD_ZERO(&writefd);
>>              FD_SET(conn->sock,&writefd);
>>              errno=0;
>>          ret=select(sock->conn+1,NULL,&writefd,NULL,NULL); //wait
>> forever
>>      }while((ret<0 && SOCK_ERRNO==EINTR) || ret<1)
>>      }
>>    } //now ready to write to socket
>>
>> Disclaimer: typed in Mail.app.
>>
>>
>> On Jun 6, 2005, at 8:19 PM, Bruce Momjian wrote:
>>
>>
>>>
>>> Would someone comment on this bug report from February?  I can
>>> confirm
>>> the code is unchanged and is in function fe-connect.c::PQconnectPoll
>>> ().
>>>
>>> --------------------------------------------------------------------
>>> --
>>> -----
>>>
>>> Florian Hars wrote:
>>>
>>>
>>>>
>>>> The following bug has been logged online:
>>>>
>>>> Bug reference:      1467
>>>> Logged by:          Florian Hars
>>>> Email address:      hars@bik-gmbh.de
>>>> PostgreSQL version: 8.0.1
>>>> Operating system:   All
>>>> Description:        fe_connect doesn't handle EINTR right
>>>> Details:
>>>>
>>>> The file pgsql/src/interfaces/libpq/fe-connect.c contains the code
>>>> fragment
>>>>
>>>> retry_connect:
>>>>   if (connect(conn->sock, addr_cur->ai_addr,
>>>>                       addr_cur->ai_addrlen) < 0)
>>>>   {
>>>>     if (SOCK_ERRNO == EINTR)
>>>>     /* Interrupted system call - just try again */
>>>>         goto retry_connect;
>>>>   }
>>>>
>>>> This is not in accordance with a strict legalistic reading of the
>>>> POSIX
>>>> spec, according to which connect is not restartable so that you
>>>> have to use
>>>> select or poll after connect returned with EINTR.
>>>>
>>>> See
>>>> http://www.eleves.ens.fr:8080/home/madore/computers/connect-
>>>> intr.html
>>>> for the ugly details, your code should work on Linux, but not on
>>>> Solaris or
>>>> (Free|Open)BSD.
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 4: Don't 'kill -9' the postmaster
>>>>
>>>>
>>>>
>>>
>>> --
>>>   Bruce Momjian                        |  http://candle.pha.pa.us
>>>   pgman@candle.pha.pa.us               |  (610) 359-1001
>>>   +  If your life is a hard drive,     |  13 Roberts Road
>>>   +  Christ can be your backup.        |  Newtown Square,
>>> Pennsylvania 19073
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 8: explain analyze is your friend
>>>
>>>
>>
>> |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
>> AgentM
>> agentm@themactionfaction.com
>> |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
>>
>>
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square,
> Pennsylvania 19073
>

|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
AgentM
agentm@themactionfaction.com
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-


Attachment

Re: BUG #1467: fe_connect doesn't handle EINTR right

From
Tom Lane
Date:
AgentM <agentm@themactionfaction.com> writes:
> Attached is a patch which corrects the behavior. I verified that the
> patch does not interfere with normal operation (using psql) but
> unfortunately the code path is virtually impossible to test without a
> really slow connection to a postgresql server [which I thankfully
> don't have].

I'm still looking for some demonstration (not an unsupported assertion)
that there's an issue here.  A patch you cannot test doesn't impress
me at all --- what are the odds that it makes things worse not better?

            regards, tom lane

Re: BUG #1467: fe_connect doesn't handle EINTR right

From
"Andrew Dunstan"
Date:
AgentM said:
> Attached is a patch which corrects the behavior. I verified that the
> patch does not interfere with normal operation (using psql) but
> unfortunately the code path is virtually impossible to test without a
> really slow connection to a postgresql server [which I thankfully
> don't have]. To test the patch, you would need to send an interrupt
> at the exact time that the kernel is connect()ing in blocking mode-
> good luck.
>
> Also, I recommend removing a (sarcastic?) comment left by a previous
> developer- I wrote a note about in my patch.
>
> The patch is against 8.0.3 [because HEAD requires access to a
> specific version of bison] but I imagine that the code hasn't changed
> in fe-connect.c since then.
>
> Patch against src/interfaces/libpq/fe-connect.c (v 8.0.3)

It's not even legal C89 C - please don't use // style comments.

cheers

andrew



Re: BUG #1467: fe_connect doesn't handle EINTR right

From
Bruce Momjian
Date:
Tom Lane wrote:
> AgentM <agentm@themactionfaction.com> writes:
> > Attached is a patch which corrects the behavior. I verified that the
> > patch does not interfere with normal operation (using psql) but
> > unfortunately the code path is virtually impossible to test without a
> > really slow connection to a postgresql server [which I thankfully
> > don't have].
>
> I'm still looking for some demonstration (not an unsupported assertion)
> that there's an issue here.  A patch you cannot test doesn't impress
> me at all --- what are the odds that it makes things worse not better?

Well, we have a documented case that we are not following the API.  In
that case, I don't consider it necessary for someone to provide a
reproducable failure (it might be quite rare and therefore hard to
demostrate).  It is enough we are not following the API and need to fix
our code.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: BUG #1467: fe_connect doesn't handle EINTR right

From
Tom Lane
Date:
AgentM <agentm@themactionfaction.com> writes:
> Attached is a patch which corrects the behavior. I verified that the
> patch does not interfere with normal operation (using psql) but
> unfortunately the code path is virtually impossible to test without a
> really slow connection to a postgresql server [which I thankfully
> don't have]. To test the patch, you would need to send an interrupt
> at the exact time that the kernel is connect()ing in blocking mode-
> good luck.

I've applied a simplified version of this patch --- there is no need to
duplicate the functionality that the calling level must have anyway.

            regards, tom lane

Index: fe-connect.c
===================================================================
RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.316
retrieving revision 1.317
diff -c -r1.316 -r1.317
*** fe-connect.c    9 Aug 2005 05:14:26 -0000    1.316
--- fe-connect.c    11 Aug 2005 22:53:41 -0000    1.317
***************
*** 1082,1096 ****
                       * since we are in nonblock mode.  If it does, well,
                       * too bad.
                       */
-             retry_connect:
                      if (connect(conn->sock, addr_cur->ai_addr,
                                  addr_cur->ai_addrlen) < 0)
                      {
-                         if (SOCK_ERRNO == EINTR)
-                             /* Interrupted system call - just try again */
-                             goto retry_connect;
                          if (SOCK_ERRNO == EINPROGRESS ||
                              SOCK_ERRNO == EWOULDBLOCK ||
                              SOCK_ERRNO == 0)
                          {
                              /*
--- 1082,1093 ----
                       * since we are in nonblock mode.  If it does, well,
                       * too bad.
                       */
                      if (connect(conn->sock, addr_cur->ai_addr,
                                  addr_cur->ai_addrlen) < 0)
                      {
                          if (SOCK_ERRNO == EINPROGRESS ||
                              SOCK_ERRNO == EWOULDBLOCK ||
+                             SOCK_ERRNO == EINTR ||
                              SOCK_ERRNO == 0)
                          {
                              /*