Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly
Date
Msg-id 200204152334.g3FNYEO05629@candle.pha.pa.us
Whole thread Raw
In response to Re: [PATCHES] [patch] fe-connect.c doesn't handle EINTR correctly  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Fix applied.  Thanks.

---------------------------------------------------------------------------

Bruce Momjian wrote:
>
> David, sorry you patch didn't make it into 7.2.X.  That whole EINTR
> discussion was quite complicated so I am not surprised we missed it.
>
> The attached patch implements your ENITR test in cases that seems to
> need it.  I have followed the method we used for ENITRY in fe-misc.c.
>
>
> ---------------------------------------------------------------------------
>
> David Ford wrote:
> > Last year we had a drawn out discussion about this and I created a patch
> > for it.  I never noticed that the patch didn't go in until I installed
> > 7.2 the other day and realised that fe-connect.c never was fixed.
> >
> > Here is the patch again.  It is against CVS 3/16/2002.  This time I only
> > rewrote the connect procedure at line 912, I leave it up to the regular
> > hackers to copy it's functionality to the SSL procedure just below it.
> >
> > In summary, if a software writer implements timer events or other events
> > which generate a signal with a timing fast enough to occur while libpq
> > is inside connect(), then connect returns -EINTR.  The code following
> > the connect call does not handle this and generates an error message.
> >  The sum result is that the pg_connect() fails.  If the timer or other
> > event is right on the window of the connect() completion time, the
> > pg_connect() may appear to work sporadically.  If the event is too slow,
> > pg_connect() will appear to always work and if the event is too fast,
> > pg_connect() will always fail.
> >
> > David
> >
>
> > Index: src/interfaces/libpq/fe-connect.c
> > ===================================================================
> > RCS file: /projects/cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
> > retrieving revision 1.181
> > diff -u -r1.181 fe-connect.c
> > --- src/interfaces/libpq/fe-connect.c    2001/11/11 02:09:05    1.181
> > +++ src/interfaces/libpq/fe-connect.c    2002/03/16 05:17:47
> > @@ -909,29 +909,48 @@
> >       * Thus, we have to make arrangements for all eventualities.
> >       * ----------
> >       */
> > -    if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
> > -    {
> > -        if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
> > -        {
> > -            /*
> > -             * This is fine - we're in non-blocking mode, and the
> > -             * connection is in progress.
> > -             */
> > -            conn->status = CONNECTION_STARTED;
> > -        }
> > -        else
> > -        {
> > -            /* Something's gone wrong */
> > -            connectFailureMessage(conn, SOCK_ERRNO);
> > -            goto connect_errReturn;
> > +    do {
> > +        int e;
> > +        e=connect(conn->sock, &conn->raddr.sa, conn->raddr_len)
> > +
> > +        if(e < 0) {
> > +            switch (e) {
> > +                case EINTR:
> > +                    /*
> > +                     * Interrupted by a signal, keep trying.  This handling is
> > +                     * required because the user may have turned on signals in
> > +                     * his program.  Previously, libpq would erronously fail to
> > +                     * connect if the user's timer event fired and interrupted
> > +                     * this syscall.  It is important that we don't try to sleep
> > +                     * here because this may cause havoc with the user program.
> > +                     */
> > +                    continue;
> > +                    break;
> > +                case 0:
> > +                case EINPROGRESS:
> > +                case EWOULDBLOCK:
> > +                    /*
> > +                     * This is fine - we're in non-blocking mode, and the
> > +                     * connection is in progress.
> > +                     */
> > +                    conn->status = CONNECTION_STARTED;
> > +                    break;
> > +                default:
> > +                    /* Something's gone wrong */
> > +                    connectFailureMessage(conn, SOCK_ERRNO);
> > +                    goto connect_errReturn;
> > +                    break;
> > +            }
> > +        } else {
> > +            /* We're connected now */
> > +            conn->status = CONNECTION_MADE;
> >          }
> > -    }
> > -    else
> > -    {
> > -        /* We're connected already */
> > -        conn->status = CONNECTION_MADE;
> > -    }
> > +
> > +        if(conn->status == CONNECTION_STARTED || conn->status == CONNECTION_MADE)
> > +            break;
> >
> > +    } while(1);
> > +
> >  #ifdef USE_SSL
> >      /* Attempt to negotiate SSL usage */
> >      if (conn->allow_ssl_try)
>
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/users-lounge/docs/faq.html
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> Index: src/interfaces/libpq/fe-connect.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-connect.c,v
> retrieving revision 1.182
> diff -c -r1.182 fe-connect.c
> *** src/interfaces/libpq/fe-connect.c    2 Mar 2002 00:49:22 -0000    1.182
> --- src/interfaces/libpq/fe-connect.c    14 Apr 2002 04:40:24 -0000
> ***************
> *** 913,920 ****
> --- 913,925 ----
>        * Thus, we have to make arrangements for all eventualities.
>        * ----------
>        */
> + retry1:
>       if (connect(conn->sock, &conn->raddr.sa, conn->raddr_len) < 0)
>       {
> +         if (SOCK_ERRNO == EINTR)
> +             /* Interrupted system call - we'll just try again */
> +             goto retry1;
> +
>           if (SOCK_ERRNO == EINPROGRESS || SOCK_ERRNO == EWOULDBLOCK || SOCK_ERRNO == 0)
>           {
>               /*
> ***************
> *** 949,957 ****
> --- 954,967 ----
>                                 SOCK_STRERROR(SOCK_ERRNO));
>               goto connect_errReturn;
>           }
> + retry2:
>           /* Now receive the postmasters response */
>           if (recv(conn->sock, &SSLok, 1, 0) != 1)
>           {
> +             if (SOCK_ERRNO == EINTR)
> +                 /* Interrupted system call - we'll just try again */
> +                 goto retry2;
> +
>               printfPQExpBuffer(&conn->errorMessage,
>                                 libpq_gettext("could not receive server response to SSL negotiation packet: %s\n"),
>                                 SOCK_STRERROR(SOCK_ERRNO));
> ***************
> *** 2132,2139 ****
> --- 2142,2153 ----
>                  "PQrequestCancel() -- socket() failed: ");
>           goto cancel_errReturn;
>       }
> + retry3:
>       if (connect(tmpsock, &conn->raddr.sa, conn->raddr_len) < 0)
>       {
> +         if (SOCK_ERRNO == EINTR)
> +             /* Interrupted system call - we'll just try again */
> +             goto retry3;
>           strcpy(conn->errorMessage.data,
>                  "PQrequestCancel() -- connect() failed: ");
>           goto cancel_errReturn;
> ***************
> *** 2150,2157 ****
> --- 2164,2175 ----
>       crp.cp.backendPID = htonl(conn->be_pid);
>       crp.cp.cancelAuthCode = htonl(conn->be_key);
>
> + retry4:
>       if (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp))
>       {
> +         if (SOCK_ERRNO == EINTR)
> +             /* Interrupted system call - we'll just try again */
> +             goto retry4;
>           strcpy(conn->errorMessage.data,
>                  "PQrequestCancel() -- send() failed: ");
>           goto cancel_errReturn;
> Index: src/interfaces/libpq/fe-misc.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/interfaces/libpq/fe-misc.c,v
> retrieving revision 1.68
> diff -c -r1.68 fe-misc.c
> *** src/interfaces/libpq/fe-misc.c    6 Mar 2002 06:10:42 -0000    1.68
> --- src/interfaces/libpq/fe-misc.c    14 Apr 2002 04:40:25 -0000
> ***************
> *** 361,367 ****
>       if (!conn || conn->sock < 0)
>           return -1;
>
> ! retry:
>       FD_ZERO(&input_mask);
>       FD_SET(conn->sock, &input_mask);
>       timeout.tv_sec = 0;
> --- 361,367 ----
>       if (!conn || conn->sock < 0)
>           return -1;
>
> ! retry1:
>       FD_ZERO(&input_mask);
>       FD_SET(conn->sock, &input_mask);
>       timeout.tv_sec = 0;
> ***************
> *** 371,377 ****
>       {
>           if (SOCK_ERRNO == EINTR)
>               /* Interrupted system call - we'll just try again */
> !             goto retry;
>
>           printfPQExpBuffer(&conn->errorMessage,
>                             libpq_gettext("select() failed: %s\n"),
> --- 371,377 ----
>       {
>           if (SOCK_ERRNO == EINTR)
>               /* Interrupted system call - we'll just try again */
> !             goto retry1;
>
>           printfPQExpBuffer(&conn->errorMessage,
>                             libpq_gettext("select() failed: %s\n"),
> ***************
> *** 395,401 ****
>       if (!conn || conn->sock < 0)
>           return -1;
>
> ! retry:
>       FD_ZERO(&input_mask);
>       FD_SET(conn->sock, &input_mask);
>       timeout.tv_sec = 0;
> --- 395,401 ----
>       if (!conn || conn->sock < 0)
>           return -1;
>
> ! retry2:
>       FD_ZERO(&input_mask);
>       FD_SET(conn->sock, &input_mask);
>       timeout.tv_sec = 0;
> ***************
> *** 405,411 ****
>       {
>           if (SOCK_ERRNO == EINTR)
>               /* Interrupted system call - we'll just try again */
> !             goto retry;
>
>           printfPQExpBuffer(&conn->errorMessage,
>                             libpq_gettext("select() failed: %s\n"),
> --- 405,411 ----
>       {
>           if (SOCK_ERRNO == EINTR)
>               /* Interrupted system call - we'll just try again */
> !             goto retry2;
>
>           printfPQExpBuffer(&conn->errorMessage,
>                             libpq_gettext("select() failed: %s\n"),
> ***************
> *** 478,484 ****
>       }
>
>       /* OK, try to read some data */
> ! tryAgain:
>   #ifdef USE_SSL
>       if (conn->ssl)
>           nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
> --- 478,484 ----
>       }
>
>       /* OK, try to read some data */
> ! retry3:
>   #ifdef USE_SSL
>       if (conn->ssl)
>           nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
> ***************
> *** 490,496 ****
>       if (nread < 0)
>       {
>           if (SOCK_ERRNO == EINTR)
> !             goto tryAgain;
>           /* Some systems return EAGAIN/EWOULDBLOCK for no data */
>   #ifdef EAGAIN
>           if (SOCK_ERRNO == EAGAIN)
> --- 490,496 ----
>       if (nread < 0)
>       {
>           if (SOCK_ERRNO == EINTR)
> !             goto retry3;
>           /* Some systems return EAGAIN/EWOULDBLOCK for no data */
>   #ifdef EAGAIN
>           if (SOCK_ERRNO == EAGAIN)
> ***************
> *** 531,537 ****
>               (conn->inBufSize - conn->inEnd) >= 8192)
>           {
>               someread = 1;
> !             goto tryAgain;
>           }
>           return 1;
>       }
> --- 531,537 ----
>               (conn->inBufSize - conn->inEnd) >= 8192)
>           {
>               someread = 1;
> !             goto retry3;
>           }
>           return 1;
>       }
> ***************
> *** 564,570 ****
>        * Still not sure that it's EOF, because some data could have just
>        * arrived.
>        */
> ! tryAgain2:
>   #ifdef USE_SSL
>       if (conn->ssl)
>           nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
> --- 564,570 ----
>        * Still not sure that it's EOF, because some data could have just
>        * arrived.
>        */
> ! retry4:
>   #ifdef USE_SSL
>       if (conn->ssl)
>           nread = SSL_read(conn->ssl, conn->inBuffer + conn->inEnd,
> ***************
> *** 576,582 ****
>       if (nread < 0)
>       {
>           if (SOCK_ERRNO == EINTR)
> !             goto tryAgain2;
>           /* Some systems return EAGAIN/EWOULDBLOCK for no data */
>   #ifdef EAGAIN
>           if (SOCK_ERRNO == EAGAIN)
> --- 576,582 ----
>       if (nread < 0)
>       {
>           if (SOCK_ERRNO == EINTR)
> !             goto retry4;
>           /* Some systems return EAGAIN/EWOULDBLOCK for no data */
>   #ifdef EAGAIN
>           if (SOCK_ERRNO == EAGAIN)
> ***************
> *** 804,810 ****
>
>       if (forRead || forWrite)
>       {
> ! retry:
>           FD_ZERO(&input_mask);
>           FD_ZERO(&output_mask);
>           FD_ZERO(&except_mask);
> --- 804,810 ----
>
>       if (forRead || forWrite)
>       {
> ! retry5:
>           FD_ZERO(&input_mask);
>           FD_ZERO(&output_mask);
>           FD_ZERO(&except_mask);
> ***************
> *** 817,823 ****
>                      (struct timeval *) NULL) < 0)
>           {
>               if (SOCK_ERRNO == EINTR)
> !                 goto retry;
>               printfPQExpBuffer(&conn->errorMessage,
>                                 libpq_gettext("select() failed: %s\n"),
>                                 SOCK_STRERROR(SOCK_ERRNO));
> --- 817,823 ----
>                      (struct timeval *) NULL) < 0)
>           {
>               if (SOCK_ERRNO == EINTR)
> !                 goto retry5;
>               printfPQExpBuffer(&conn->errorMessage,
>                                 libpq_gettext("select() failed: %s\n"),
>                                 SOCK_STRERROR(SOCK_ERRNO));

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: regression in CVS HEAD
Next
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] Notify argument?