Thread: Re: [HACKERS] listening addresses

Re: [HACKERS] listening addresses

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>
>>A small problem with it was reported to me a couple of days ago - user
>>had firewalled off all IP6 traffic. The stats collector happily bound
>>and connected to the socket, but all the packets fell in the bit bucket.
>>They found it quite hard to diagnose the problem.
>>
>>
>
>
>
>>Possible solutions that occurred to me:
>>. an initial "hello"-"yes i'm here" exchange to validate the address
>>
>>
>
>That one seems reasonable to me.  Seems like it would take just a few
>more lines of code in the loop that tries to find a working socket to
>check that we can send a byte and receive it.  You'd have to be careful
>not to block if the socket is indeed not working ... also, is it safe to
>assume that a byte sent with send() is *immediately* ready to recv()?
>
>
>

This patch attempts to implement the idea, with safety in case the
packet is not immediately available.

comments welcome

cheers

andrew
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/pgstat.c,v
retrieving revision 1.61
diff -c -r1.61 pgstat.c
*** src/backend/postmaster/pgstat.c    15 Mar 2004 16:21:37 -0000    1.61
--- src/backend/postmaster/pgstat.c    21 Mar 2004 16:41:23 -0000
***************
*** 191,196 ****
--- 191,199 ----
                 *addr,
                  hints;
      int            ret;
+     fd_set      rset;
+     struct timeval tv;
+     char        test_byte;

      /*
       * Force start of collector daemon if something to collect
***************
*** 307,312 ****
--- 310,351 ----
              pgStatSock = -1;
              continue;
          }
+
+ #define TESTBYTEVAL 99
+
+         /* try to send and receive a test byte on the socket */
+
+         FD_ZERO(&rset);
+         FD_SET(pgStatSock, &rset);
+         tv.tv_sec = 0;
+         tv.tv_usec = 500000;
+         test_byte = TESTBYTEVAL;
+         send(pgStatSock,&test_byte,1,0);
+         test_byte++;
+         if(select(pgStatSock+1,&rset,NULL,NULL,&tv) > 0 &&
+            FD_ISSET(pgStatSock,&rset))
+         {
+             recv(pgStatSock,&test_byte,1,0);
+             if(test_byte != TESTBYTEVAL)
+             {
+                 ereport(LOG,
+                         (errcode_for_socket_access(),
+                          errmsg("incorrect test value on send/receive on socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+             }
+         }
+         else
+         {
+             ereport(LOG,
+                     (errcode_for_socket_access(),
+                      errmsg("could send/receive on socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+

          /* If we get here, we have a working socket */
          break;

Re: [HACKERS] listening addresses

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> This patch attempts to implement the idea, with safety in case the
> packet is not immediately available.

Seems like you ought to be testing for failure returns from send() and
recv().  Also, what of EINTR from select()?

            regards, tom lane

Re: [HACKERS] listening addresses

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>
>>This patch attempts to implement the idea, with safety in case the
>>packet is not immediately available.
>>
>>
>
>Seems like you ought to be testing for failure returns from send() and
>recv().
>

Good point. will do.

>Also, what of EINTR from select()?
>
>
>

It will fail. Not sure what else it should do - I'm open to suggestions.
Retry?

cheers

andrew


Re: [HACKERS] listening addresses

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Also, what of EINTR from select()?

> It will fail. Not sure what else it should do - I'm open to suggestions.
> Retry?

Yes.  AFAIR, all our other select calls will just loop indefinitely on
EINTR.

            regards, tom lane

Re: [HACKERS] listening addresses

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:

>
>
> Tom Lane wrote:
>
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>
>>
>>> A small problem with it was reported to me a couple of days ago -
>>> user had firewalled off all IP6 traffic. The stats collector happily
>>> bound and connected to the socket, but all the packets fell in the
>>> bit bucket. They found it quite hard to diagnose the problem.
>>>
>>
>>
>>
>>
>>> Possible solutions that occurred to me:
>>> . an initial "hello"-"yes i'm here" exchange to validate the address
>>>
>>
>>
>> That one seems reasonable to me.  Seems like it would take just a few
>> more lines of code in the loop that tries to find a working socket to
>> check that we can send a byte and receive it.  You'd have to be careful
>> not to block if the socket is indeed not working ... also, is it safe to
>> assume that a byte sent with send() is *immediately* ready to recv()?
>>
>>
>>
>

Revised patch attached. I think this is about as much trouble as this
problem is worth ;-)

cheers

andrew
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/pgstat.c,v
retrieving revision 1.61
diff -c -r1.61 pgstat.c
*** src/backend/postmaster/pgstat.c    15 Mar 2004 16:21:37 -0000    1.61
--- src/backend/postmaster/pgstat.c    22 Mar 2004 12:47:54 -0000
***************
*** 191,196 ****
--- 191,200 ----
                 *addr,
                  hints;
      int            ret;
+     fd_set      rset;
+     struct timeval tv;
+     char        test_byte;
+     int         sel_res;

      /*
       * Force start of collector daemon if something to collect
***************
*** 307,312 ****
--- 311,365 ----
              pgStatSock = -1;
              continue;
          }
+
+ #define TESTBYTEVAL 99
+
+         /* try to send and receive a test byte on the socket */
+
+         test_byte = TESTBYTEVAL;
+         if (send(pgStatSock,&test_byte,1,0) != 1)
+         {
+             ereport(LOG,
+                     (errcode_for_socket_access(),
+                      errmsg("test byte send failure socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+         test_byte++;
+         for(;;)
+         {
+             FD_ZERO(&rset);
+             FD_SET(pgStatSock, &rset);
+             tv.tv_sec = 0;
+             tv.tv_usec = 500000;
+             sel_res = select(pgStatSock+1,&rset,NULL,NULL,&tv);
+             if (sel_res != -1 || errno != EINTR)
+                 break;
+         }
+         if( sel_res > 0 && FD_ISSET(pgStatSock,&rset)
+             && recv(pgStatSock,&test_byte,1,0) == 1)
+         {
+             if(test_byte != TESTBYTEVAL)
+             {
+                 ereport(LOG,
+                         (errcode_for_socket_access(),
+                          errmsg("incorrect test value on send/receive on socket for statistics collector")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+             }
+         }
+         else
+         {
+             ereport(LOG,
+                     (errcode_for_socket_access(),
+                      errmsg("could receive test byte on socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+

          /* If we get here, we have a working socket */
          break;

Re: [HACKERS] listening addresses

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
>> A small problem with it was reported to me a couple of days ago -
>> user had firewalled off all IP6 traffic. The stats collector happily
>> bound and connected to the socket, but all the packets fell in the
>> bit bucket. They found it quite hard to diagnose the problem.

> Revised patch attached. I think this is about as much trouble as this
> problem is worth ;-)

I thought the messages were a bit sloppy, which made the patch much less
useful than it should be: we are testing for a very specific failure
mode and we can give a very specific message.  Patch as-applied is
attached.

I don't have any real convenient way to set up a situation where this
failure can actually occur.  Anyone want to verify that the patch
acts as intended?

            regards, tom lane

*** src/backend/postmaster/pgstat.c.orig    Mon Mar 15 15:01:57 2004
--- src/backend/postmaster/pgstat.c    Mon Mar 22 18:55:29 2004
***************
*** 191,196 ****
--- 191,202 ----
                 *addr,
                  hints;
      int            ret;
+     fd_set      rset;
+     struct timeval tv;
+     char        test_byte;
+     int         sel_res;
+
+ #define TESTBYTEVAL ((char) 199)

      /*
       * Force start of collector daemon if something to collect
***************
*** 303,308 ****
--- 309,393 ----
              ereport(LOG,
                      (errcode_for_socket_access(),
                       errmsg("could not connect socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+
+         /*
+          * Try to send and receive a one-byte test message on the socket.
+          * This is to catch situations where the socket can be created but
+          * will not actually pass data (for instance, because kernel packet
+          * filtering rules prevent it).
+          */
+         test_byte = TESTBYTEVAL;
+         if (send(pgStatSock, &test_byte, 1, 0) != 1)
+         {
+             ereport(LOG,
+                     (errcode_for_socket_access(),
+                      errmsg("could not send test message on socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+
+         /*
+          * There could possibly be a little delay before the message can be
+          * received.  We arbitrarily allow up to half a second before deciding
+          * it's broken.
+          */
+         for (;;)                /* need a loop to handle EINTR */
+         {
+             FD_ZERO(&rset);
+             FD_SET(pgStatSock, &rset);
+             tv.tv_sec = 0;
+             tv.tv_usec = 500000;
+             sel_res = select(pgStatSock+1, &rset, NULL, NULL, &tv);
+             if (sel_res >= 0 || errno != EINTR)
+                 break;
+         }
+         if (sel_res < 0)
+         {
+             ereport(LOG,
+                     (errcode_for_socket_access(),
+                      errmsg("select() failed in statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+         if (sel_res == 0 || !FD_ISSET(pgStatSock, &rset))
+         {
+             /*
+              * This is the case we actually think is likely, so take pains to
+              * give a specific message for it.
+              *
+              * errno will not be set meaningfully here, so don't use it.
+              */
+             ereport(LOG,
+                     (ERRCODE_CONNECTION_FAILURE,
+                      errmsg("test message did not get through on socket for statistics collector")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+
+         test_byte++;            /* just make sure variable is changed */
+
+         if (recv(pgStatSock, &test_byte, 1, 0) != 1)
+         {
+             ereport(LOG,
+                     (errcode_for_socket_access(),
+                      errmsg("could not receive test message on socket for statistics collector: %m")));
+             closesocket(pgStatSock);
+             pgStatSock = -1;
+             continue;
+         }
+
+         if (test_byte != TESTBYTEVAL) /* strictly paranoia ... */
+         {
+             ereport(LOG,
+                     (ERRCODE_INTERNAL_ERROR,
+                      errmsg("incorrect test message transmission on socket for statistics collector")));
              closesocket(pgStatSock);
              pgStatSock = -1;
              continue;

Re: [HACKERS] listening addresses

From
Bruce Momjian
Date:
Patch applied by Tom.  Thanks.

---------------------------------------------------------------------------

Andrew Dunstan wrote:
>
>
> Andrew Dunstan wrote:
>
> >
> >
> > Tom Lane wrote:
> >
> >> Andrew Dunstan <andrew@dunslane.net> writes:
> >>
> >>
> >>> A small problem with it was reported to me a couple of days ago -
> >>> user had firewalled off all IP6 traffic. The stats collector happily
> >>> bound and connected to the socket, but all the packets fell in the
> >>> bit bucket. They found it quite hard to diagnose the problem.
> >>>
> >>
> >>
> >>
> >>
> >>> Possible solutions that occurred to me:
> >>> . an initial "hello"-"yes i'm here" exchange to validate the address
> >>>
> >>
> >>
> >> That one seems reasonable to me.  Seems like it would take just a few
> >> more lines of code in the loop that tries to find a working socket to
> >> check that we can send a byte and receive it.  You'd have to be careful
> >> not to block if the socket is indeed not working ... also, is it safe to
> >> assume that a byte sent with send() is *immediately* ready to recv()?
> >>
> >>
> >>
> >
>
> Revised patch attached. I think this is about as much trouble as this
> problem is worth ;-)
>
> cheers
>
> andrew

> Index: src/backend/postmaster/pgstat.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/pgstat.c,v
> retrieving revision 1.61
> diff -c -r1.61 pgstat.c
> *** src/backend/postmaster/pgstat.c    15 Mar 2004 16:21:37 -0000    1.61
> --- src/backend/postmaster/pgstat.c    22 Mar 2004 12:47:54 -0000
> ***************
> *** 191,196 ****
> --- 191,200 ----
>                  *addr,
>                   hints;
>       int            ret;
> +     fd_set      rset;
> +     struct timeval tv;
> +     char        test_byte;
> +     int         sel_res;
>
>       /*
>        * Force start of collector daemon if something to collect
> ***************
> *** 307,312 ****
> --- 311,365 ----
>               pgStatSock = -1;
>               continue;
>           }
> +
> + #define TESTBYTEVAL 99
> +
> +         /* try to send and receive a test byte on the socket */
> +
> +         test_byte = TESTBYTEVAL;
> +         if (send(pgStatSock,&test_byte,1,0) != 1)
> +         {
> +             ereport(LOG,
> +                     (errcode_for_socket_access(),
> +                      errmsg("test byte send failure socket for statistics collector: %m")));
> +             closesocket(pgStatSock);
> +             pgStatSock = -1;
> +             continue;
> +         }
> +         test_byte++;
> +         for(;;)
> +         {
> +             FD_ZERO(&rset);
> +             FD_SET(pgStatSock, &rset);
> +             tv.tv_sec = 0;
> +             tv.tv_usec = 500000;
> +             sel_res = select(pgStatSock+1,&rset,NULL,NULL,&tv);
> +             if (sel_res != -1 || errno != EINTR)
> +                 break;
> +         }
> +         if( sel_res > 0 && FD_ISSET(pgStatSock,&rset)
> +             && recv(pgStatSock,&test_byte,1,0) == 1)
> +         {
> +             if(test_byte != TESTBYTEVAL)
> +             {
> +                 ereport(LOG,
> +                         (errcode_for_socket_access(),
> +                          errmsg("incorrect test value on send/receive on socket for statistics collector")));
> +             closesocket(pgStatSock);
> +             pgStatSock = -1;
> +             continue;
> +             }
> +         }
> +         else
> +         {
> +             ereport(LOG,
> +                     (errcode_for_socket_access(),
> +                      errmsg("could receive test byte on socket for statistics collector: %m")));
> +             closesocket(pgStatSock);
> +             pgStatSock = -1;
> +             continue;
> +         }
> +
>
>           /* If we get here, we have a working socket */
>           break;

>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073