Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly
Date
Msg-id 200204140454.g3E4skw02153@candle.pha.pa.us
Whole thread Raw
In response to [patch] fe-connect.c doesn't handle EINTR correctly  (David Ford <david+cert@blue-labs.org>)
Responses Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly
List pgsql-hackers
David, sorry you patch didn't make it into 7.2.X.  That whole EINTR
discussion was quite complicated so I am not surprised we missed it.

The attached patch implements your ENITR test in cases that seems to
need it.  I have followed the method we used for ENITRY in fe-misc.c.


---------------------------------------------------------------------------

David Ford wrote:
> Last year we had a drawn out discussion about this and I created a patch
> for it.  I never noticed that the patch didn't go in until I installed
> 7.2 the other day and realised that fe-connect.c never was fixed.
>
> Here is the patch again.  It is against CVS 3/16/2002.  This time I only
> rewrote the connect procedure at line 912, I leave it up to the regular
> hackers to copy it's functionality to the SSL procedure just below it.
>
> In summary, if a software writer implements timer events or other events
> which generate a signal with a timing fast enough to occur while libpq
> is inside connect(), then connect returns -EINTR.  The code following
> the connect call does not handle this and generates an error message.
>  The sum result is that the pg_connect() fails.  If the timer or other
> event is right on the window of the connect() completion time, the
> pg_connect() may appear to work sporadically.  If the event is too slow,
> pg_connect() will appear to always work and if the event is too fast,
> pg_connect() will always fail.
>
> David
>

> Index: src/interfaces/libpq/fe-connect.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
> retrieving revision 1.181
> diff -u -r1.181 fe-connect.c
> --- src/interfaces/libpq/fe-connect.c    2001/11/11 02:09:05    1.181
> +++ src/interfaces/libpq/fe-connect.c    2002/03/16 05:17:47
> @@ -909,29 +909,48 @@
>       * Thus, we have to make arrangements for all eventualities.
>       * ----------
>       */
> -    if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
> -    {
> -        if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
> -        {
> -            /*
> -             * This is fine - we're in non-blocking mode, and the
> -             * connection is in progress.
> -             */
> -            conn->status = CONNECTION_STARTED;
> -        }
> -        else
> -        {
> -            /* Something's gone wrong */
> -            connectFailureMessage(conn, SOCK_ERRNO);
> -            goto connect_errReturn;
> +    do {
> +        int e;
> +        e=connect(conn->sock, &conn->raddr.sa, conn->raddr_len)
> +
> +        if(e < 0) {
> +            switch (e) {
> +                case EINTR:
> +                    /*
> +                     * Interrupted by a signal, keep trying.  This handling is
> +                     * required because the user may have turned on signals in
> +                     * his program.  Previously, libpq would erronously fail to
> +                     * connect if the user's timer event fired and interrupted
> +                     * this syscall.  It is important that we don't try to sleep
> +                     * here because this may cause havoc with the user program.
> +                     */
> +                    continue;
> +                    break;
> +                case 0:
> +                case EINPROGRESS:
> +                case EWOULDBLOCK:
> +                    /*
> +                     * This is fine - we're in non-blocking mode, and the
> +                     * connection is in progress.
> +                     */
> +                    conn->status = CONNECTION_STARTED;
> +                    break;
> +                default:
> +                    /* Something's gone wrong */
> +                    connectFailureMessage(conn, SOCK_ERRNO);
> +                    goto connect_errReturn;
> +                    break;
> +            }
> +        } else {
> +            /* We're connected now */
> +            conn->status = CONNECTION_MADE;
>          }
> -    }
> -    else
> -    {
> -        /* We're connected already */
> -        conn->status = CONNECTION_MADE;
> -    }
> +
> +        if(conn->status == CONNECTION_STARTED || conn->status == CONNECTION_MADE)
> +            break;
>
> +    } while(1);
> +
>  #ifdef USE_SSL
>      /* Attempt to negotiate SSL usage */
>      if (conn->allow_ssl_try)

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.182
diff -c -r1.182 fe-connect.c
*** src/interfaces/libpq/fe-connect.c    2 Mar 2002 00:49:22 -0000    1.182
--- src/interfaces/libpq/fe-connect.c    14 Apr 2002 04:40:24 -0000
***************
*** 913,920 ****
--- 913,925 ----
       * Thus, we have to make arrangements for all eventualities.
       * ----------
       */
+ retry1:
      if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
      {
+         if (SOCK_ERRNO == EINTR)
+             /* Interrupted system call - we'll just try again */
+             goto retry1;
+
          if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
          {
              /*
***************
*** 949,957 ****
--- 954,967 ----
                                SOCK_STRERROR(SOCK_ERRNO));
              goto connect_errReturn;
          }
+ retry2:
          /* Now receive the postmasters response */
          if (recv(conn->sock, &SSLok, 1, 0) != 1)
          {
+             if (SOCK_ERRNO == EINTR)
+                 /* Interrupted system call - we'll just try again */
+                 goto retry2;
+
              printfPQExpBuffer(&conn->errorMessage,
                                libpq_gettext("could not receive server response to SSL negotiation packet: %s\n"),
                                SOCK_STRERROR(SOCK_ERRNO));
***************
*** 2132,2139 ****
--- 2142,2153 ----
                 "PQrequestCancel() -- socket() failed: ");
          goto cancel_errReturn;
      }
+ retry3:
      if (connect(tmpsock, &conn->raddr.sa, conn->raddr_len) < 0)
      {
+         if (SOCK_ERRNO == EINTR)
+             /* Interrupted system call - we'll just try again */
+             goto retry3;
          strcpy(conn->errorMessage.data,
                 "PQrequestCancel() -- connect() failed: ");
          goto cancel_errReturn;
***************
*** 2150,2157 ****
--- 2164,2175 ----
      crp.cp.backendPID = htonl(conn->be_pid);
      crp.cp.cancelAuthCode = htonl(conn->be_key);

+ retry4:
      if (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp))
      {
+         if (SOCK_ERRNO == EINTR)
+             /* Interrupted system call - we'll just try again */
+             goto retry4;
          strcpy(conn->errorMessage.data,
                 "PQrequestCancel() -- send() failed: ");
          goto cancel_errReturn;
Index: src/interfaces/libpq/fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.68
diff -c -r1.68 fe-misc.c
*** src/interfaces/libpq/fe-misc.c    6 Mar 2002 06:10:42 -0000    1.68
--- src/interfaces/libpq/fe-misc.c    14 Apr 2002 04:40:25 -0000
***************
*** 361,367 ****
      if (!conn || conn->sock < 0)
          return -1;

! retry:
      FD_ZERO(&input_mask);
      FD_SET(conn->sock, &input_mask);
      timeout.tv_sec = 0;
--- 361,367 ----
      if (!conn || conn->sock < 0)
          return -1;

! retry1:
      FD_ZERO(&input_mask);
      FD_SET(conn->sock, &input_mask);
      timeout.tv_sec = 0;
***************
*** 371,377 ****
      {
          if (SOCK_ERRNO == EINTR)
              /* Interrupted system call - we'll just try again */
!             goto retry;

          printfPQExpBuffer(&conn->errorMessage,
                            libpq_gettext("select() failed: %s\n"),
--- 371,377 ----
      {
          if (SOCK_ERRNO == EINTR)
              /* Interrupted system call - we'll just try again */
!             goto retry1;

          printfPQExpBuffer(&conn->errorMessage,
                            libpq_gettext("select() failed: %s\n"),
***************
*** 395,401 ****
      if (!conn || conn->sock < 0)
          return -1;

! retry:
      FD_ZERO(&input_mask);
      FD_SET(conn->sock, &input_mask);
      timeout.tv_sec = 0;
--- 395,401 ----
      if (!conn || conn->sock < 0)
          return -1;

! retry2:
      FD_ZERO(&input_mask);
      FD_SET(conn->sock, &input_mask);
      timeout.tv_sec = 0;
***************
*** 405,411 ****
      {
          if (SOCK_ERRNO == EINTR)
              /* Interrupted system call - we'll just try again */
!             goto retry;

          printfPQExpBuffer(&conn->errorMessage,
                            libpq_gettext("select() failed: %s\n"),
--- 405,411 ----
      {
          if (SOCK_ERRNO == EINTR)
              /* Interrupted system call - we'll just try again */
!             goto retry2;

          printfPQExpBuffer(&conn->errorMessage,
                            libpq_gettext("select() failed: %s\n"),
***************
*** 478,484 ****
      }

      /* OK, try to read some data */
! tryAgain:
  #ifdef USE_SSL
      if (conn->ssl)
          nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
--- 478,484 ----
      }

      /* OK, try to read some data */
! retry3:
  #ifdef USE_SSL
      if (conn->ssl)
          nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
***************
*** 490,496 ****
      if (nread < 0)
      {
          if (SOCK_ERRNO == EINTR)
!             goto tryAgain;
          /* Some systems return EAGAIN/EWOULDBLOCK for no data */
  #ifdef EAGAIN
          if (SOCK_ERRNO == EAGAIN)
--- 490,496 ----
      if (nread < 0)
      {
          if (SOCK_ERRNO == EINTR)
!             goto retry3;
          /* Some systems return EAGAIN/EWOULDBLOCK for no data */
  #ifdef EAGAIN
          if (SOCK_ERRNO == EAGAIN)
***************
*** 531,537 ****
              (conn->inBufSize - conn->inEnd) >= 8192)
          {
              someread = 1;
!             goto tryAgain;
          }
          return 1;
      }
--- 531,537 ----
              (conn->inBufSize - conn->inEnd) >= 8192)
          {
              someread = 1;
!             goto retry3;
          }
          return 1;
      }
***************
*** 564,570 ****
       * Still not sure that it's EOF, because some data could have just
       * arrived.
       */
! tryAgain2:
  #ifdef USE_SSL
      if (conn->ssl)
          nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
--- 564,570 ----
       * Still not sure that it's EOF, because some data could have just
       * arrived.
       */
! retry4:
  #ifdef USE_SSL
      if (conn->ssl)
          nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
***************
*** 576,582 ****
      if (nread < 0)
      {
          if (SOCK_ERRNO == EINTR)
!             goto tryAgain2;
          /* Some systems return EAGAIN/EWOULDBLOCK for no data */
  #ifdef EAGAIN
          if (SOCK_ERRNO == EAGAIN)
--- 576,582 ----
      if (nread < 0)
      {
          if (SOCK_ERRNO == EINTR)
!             goto retry4;
          /* Some systems return EAGAIN/EWOULDBLOCK for no data */
  #ifdef EAGAIN
          if (SOCK_ERRNO == EAGAIN)
***************
*** 804,810 ****

      if (forRead || forWrite)
      {
! retry:
          FD_ZERO(&input_mask);
          FD_ZERO(&output_mask);
          FD_ZERO(&except_mask);
--- 804,810 ----

      if (forRead || forWrite)
      {
! retry5:
          FD_ZERO(&input_mask);
          FD_ZERO(&output_mask);
          FD_ZERO(&except_mask);
***************
*** 817,823 ****
                     (struct timeval *) NULL) < 0)
          {
              if (SOCK_ERRNO == EINTR)
!                 goto retry;
              printfPQExpBuffer(&conn->errorMessage,
                                libpq_gettext("select() failed: %s\n"),
                                SOCK_STRERROR(SOCK_ERRNO));
--- 817,823 ----
                     (struct timeval *) NULL) < 0)
          {
              if (SOCK_ERRNO == EINTR)
!                 goto retry5;
              printfPQExpBuffer(&conn->errorMessage,
                                libpq_gettext("select() failed: %s\n"),
                                SOCK_STRERROR(SOCK_ERRNO));

pgsql-hackers by date:

Previous
From: Neil Conway
Date:
Subject: Re: experimental pg_qcache patch
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCHES] WITH DELIMITERS in COPY