Thread: Re: [HACKERS] listening addresses
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>A small problem with it was reported to me a couple of days ago - user >>had firewalled off all IP6 traffic. The stats collector happily bound >>and connected to the socket, but all the packets fell in the bit bucket. >>They found it quite hard to diagnose the problem. >> >> > > > >>Possible solutions that occurred to me: >>. an initial "hello"-"yes i'm here" exchange to validate the address >> >> > >That one seems reasonable to me. Seems like it would take just a few >more lines of code in the loop that tries to find a working socket to >check that we can send a byte and receive it. You'd have to be careful >not to block if the socket is indeed not working ... also, is it safe to >assume that a byte sent with send() is *immediately* ready to recv()? > > > This patch attempts to implement the idea, with safety in case the packet is not immediately available. comments welcome cheers andrew Index: src/backend/postmaster/pgstat.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/pgstat.c,v retrieving revision 1.61 diff -c -r1.61 pgstat.c *** src/backend/postmaster/pgstat.c 15 Mar 2004 16:21:37 -0000 1.61 --- src/backend/postmaster/pgstat.c 21 Mar 2004 16:41:23 -0000 *************** *** 191,196 **** --- 191,199 ---- *addr, hints; int ret; + fd_set rset; + struct timeval tv; + char test_byte; /* * Force start of collector daemon if something to collect *************** *** 307,312 **** --- 310,351 ---- pgStatSock = -1; continue; } + + #define TESTBYTEVAL 99 + + /* try to send and receive a test byte on the socket */ + + FD_ZERO(&rset); + FD_SET(pgStatSock, &rset); + tv.tv_sec = 0; + tv.tv_usec = 500000; + test_byte = TESTBYTEVAL; + send(pgStatSock,&test_byte,1,0); + test_byte++; + if(select(pgStatSock+1,&rset,NULL,NULL,&tv) > 0 && + FD_ISSET(pgStatSock,&rset)) + { + recv(pgStatSock,&test_byte,1,0); + if(test_byte != TESTBYTEVAL) + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("incorrect test value on send/receive on socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + } + else + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("could send/receive on socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + /* If we get here, we have a working socket */ break;
Andrew Dunstan <andrew@dunslane.net> writes: > This patch attempts to implement the idea, with safety in case the > packet is not immediately available. Seems like you ought to be testing for failure returns from send() and recv(). Also, what of EINTR from select()? regards, tom lane
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>This patch attempts to implement the idea, with safety in case the >>packet is not immediately available. >> >> > >Seems like you ought to be testing for failure returns from send() and >recv(). > Good point. will do. >Also, what of EINTR from select()? > > > It will fail. Not sure what else it should do - I'm open to suggestions. Retry? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> Also, what of EINTR from select()? > It will fail. Not sure what else it should do - I'm open to suggestions. > Retry? Yes. AFAIR, all our other select calls will just loop indefinitely on EINTR. regards, tom lane
Andrew Dunstan wrote: > > > Tom Lane wrote: > >> Andrew Dunstan <andrew@dunslane.net> writes: >> >> >>> A small problem with it was reported to me a couple of days ago - >>> user had firewalled off all IP6 traffic. The stats collector happily >>> bound and connected to the socket, but all the packets fell in the >>> bit bucket. They found it quite hard to diagnose the problem. >>> >> >> >> >> >>> Possible solutions that occurred to me: >>> . an initial "hello"-"yes i'm here" exchange to validate the address >>> >> >> >> That one seems reasonable to me. Seems like it would take just a few >> more lines of code in the loop that tries to find a working socket to >> check that we can send a byte and receive it. You'd have to be careful >> not to block if the socket is indeed not working ... also, is it safe to >> assume that a byte sent with send() is *immediately* ready to recv()? >> >> >> > Revised patch attached. I think this is about as much trouble as this problem is worth ;-) cheers andrew Index: src/backend/postmaster/pgstat.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/pgstat.c,v retrieving revision 1.61 diff -c -r1.61 pgstat.c *** src/backend/postmaster/pgstat.c 15 Mar 2004 16:21:37 -0000 1.61 --- src/backend/postmaster/pgstat.c 22 Mar 2004 12:47:54 -0000 *************** *** 191,196 **** --- 191,200 ---- *addr, hints; int ret; + fd_set rset; + struct timeval tv; + char test_byte; + int sel_res; /* * Force start of collector daemon if something to collect *************** *** 307,312 **** --- 311,365 ---- pgStatSock = -1; continue; } + + #define TESTBYTEVAL 99 + + /* try to send and receive a test byte on the socket */ + + test_byte = TESTBYTEVAL; + if (send(pgStatSock,&test_byte,1,0) != 1) + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("test byte send failure socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + test_byte++; + for(;;) + { + FD_ZERO(&rset); + FD_SET(pgStatSock, &rset); + tv.tv_sec = 0; + tv.tv_usec = 500000; + sel_res = select(pgStatSock+1,&rset,NULL,NULL,&tv); + if (sel_res != -1 || errno != EINTR) + break; + } + if( sel_res > 0 && FD_ISSET(pgStatSock,&rset) + && recv(pgStatSock,&test_byte,1,0) == 1) + { + if(test_byte != TESTBYTEVAL) + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("incorrect test value on send/receive on socket for statistics collector"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + } + else + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("could receive test byte on socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + /* If we get here, we have a working socket */ break;
Andrew Dunstan <andrew@dunslane.net> writes: >> A small problem with it was reported to me a couple of days ago - >> user had firewalled off all IP6 traffic. The stats collector happily >> bound and connected to the socket, but all the packets fell in the >> bit bucket. They found it quite hard to diagnose the problem. > Revised patch attached. I think this is about as much trouble as this > problem is worth ;-) I thought the messages were a bit sloppy, which made the patch much less useful than it should be: we are testing for a very specific failure mode and we can give a very specific message. Patch as-applied is attached. I don't have any real convenient way to set up a situation where this failure can actually occur. Anyone want to verify that the patch acts as intended? regards, tom lane *** src/backend/postmaster/pgstat.c.orig Mon Mar 15 15:01:57 2004 --- src/backend/postmaster/pgstat.c Mon Mar 22 18:55:29 2004 *************** *** 191,196 **** --- 191,202 ---- *addr, hints; int ret; + fd_set rset; + struct timeval tv; + char test_byte; + int sel_res; + + #define TESTBYTEVAL ((char) 199) /* * Force start of collector daemon if something to collect *************** *** 303,308 **** --- 309,393 ---- ereport(LOG, (errcode_for_socket_access(), errmsg("could not connect socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + + /* + * Try to send and receive a one-byte test message on the socket. + * This is to catch situations where the socket can be created but + * will not actually pass data (for instance, because kernel packet + * filtering rules prevent it). + */ + test_byte = TESTBYTEVAL; + if (send(pgStatSock, &test_byte, 1, 0) != 1) + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("could not send test message on socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + + /* + * There could possibly be a little delay before the message can be + * received. We arbitrarily allow up to half a second before deciding + * it's broken. + */ + for (;;) /* need a loop to handle EINTR */ + { + FD_ZERO(&rset); + FD_SET(pgStatSock, &rset); + tv.tv_sec = 0; + tv.tv_usec = 500000; + sel_res = select(pgStatSock+1, &rset, NULL, NULL, &tv); + if (sel_res >= 0 || errno != EINTR) + break; + } + if (sel_res < 0) + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("select() failed in statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + if (sel_res == 0 || !FD_ISSET(pgStatSock, &rset)) + { + /* + * This is the case we actually think is likely, so take pains to + * give a specific message for it. + * + * errno will not be set meaningfully here, so don't use it. + */ + ereport(LOG, + (ERRCODE_CONNECTION_FAILURE, + errmsg("test message did not get through on socket for statistics collector"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + + test_byte++; /* just make sure variable is changed */ + + if (recv(pgStatSock, &test_byte, 1, 0) != 1) + { + ereport(LOG, + (errcode_for_socket_access(), + errmsg("could not receive test message on socket for statistics collector: %m"))); + closesocket(pgStatSock); + pgStatSock = -1; + continue; + } + + if (test_byte != TESTBYTEVAL) /* strictly paranoia ... */ + { + ereport(LOG, + (ERRCODE_INTERNAL_ERROR, + errmsg("incorrect test message transmission on socket for statistics collector"))); closesocket(pgStatSock); pgStatSock = -1; continue;
Patch applied by Tom. Thanks. --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > Andrew Dunstan wrote: > > > > > > > Tom Lane wrote: > > > >> Andrew Dunstan <andrew@dunslane.net> writes: > >> > >> > >>> A small problem with it was reported to me a couple of days ago - > >>> user had firewalled off all IP6 traffic. The stats collector happily > >>> bound and connected to the socket, but all the packets fell in the > >>> bit bucket. They found it quite hard to diagnose the problem. > >>> > >> > >> > >> > >> > >>> Possible solutions that occurred to me: > >>> . an initial "hello"-"yes i'm here" exchange to validate the address > >>> > >> > >> > >> That one seems reasonable to me. Seems like it would take just a few > >> more lines of code in the loop that tries to find a working socket to > >> check that we can send a byte and receive it. You'd have to be careful > >> not to block if the socket is indeed not working ... also, is it safe to > >> assume that a byte sent with send() is *immediately* ready to recv()? > >> > >> > >> > > > > Revised patch attached. I think this is about as much trouble as this > problem is worth ;-) > > cheers > > andrew > Index: src/backend/postmaster/pgstat.c > =================================================================== > RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/pgstat.c,v > retrieving revision 1.61 > diff -c -r1.61 pgstat.c > *** src/backend/postmaster/pgstat.c 15 Mar 2004 16:21:37 -0000 1.61 > --- src/backend/postmaster/pgstat.c 22 Mar 2004 12:47:54 -0000 > *************** > *** 191,196 **** > --- 191,200 ---- > *addr, > hints; > int ret; > + fd_set rset; > + struct timeval tv; > + char test_byte; > + int sel_res; > > /* > * Force start of collector daemon if something to collect > *************** > *** 307,312 **** > --- 311,365 ---- > pgStatSock = -1; > continue; > } > + > + #define TESTBYTEVAL 99 > + > + /* try to send and receive a test byte on the socket */ > + > + test_byte = TESTBYTEVAL; > + if (send(pgStatSock,&test_byte,1,0) != 1) > + { > + ereport(LOG, > + (errcode_for_socket_access(), > + errmsg("test byte send failure socket for statistics collector: %m"))); > + closesocket(pgStatSock); > + pgStatSock = -1; > + continue; > + } > + test_byte++; > + for(;;) > + { > + FD_ZERO(&rset); > + FD_SET(pgStatSock, &rset); > + tv.tv_sec = 0; > + tv.tv_usec = 500000; > + sel_res = select(pgStatSock+1,&rset,NULL,NULL,&tv); > + if (sel_res != -1 || errno != EINTR) > + break; > + } > + if( sel_res > 0 && FD_ISSET(pgStatSock,&rset) > + && recv(pgStatSock,&test_byte,1,0) == 1) > + { > + if(test_byte != TESTBYTEVAL) > + { > + ereport(LOG, > + (errcode_for_socket_access(), > + errmsg("incorrect test value on send/receive on socket for statistics collector"))); > + closesocket(pgStatSock); > + pgStatSock = -1; > + continue; > + } > + } > + else > + { > + ereport(LOG, > + (errcode_for_socket_access(), > + errmsg("could receive test byte on socket for statistics collector: %m"))); > + closesocket(pgStatSock); > + pgStatSock = -1; > + continue; > + } > + > > /* If we get here, we have a working socket */ > break; > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073