Thread: Re: 7.4RC2 regression failur and not running stats collector process
I think I have some more information on the statistics collector startup problem on Solaris. I inserted the following into pgstat.c:

    if (bind(pgStatSock, addr->ai_addr, addr->ai_addrlen) < 0)
    {
        /* what type of socket are we trying to bind? */
        fprintf(stderr, "Address family is %d\n", addr->ai_addr->sa_family);
        ...
    }

This returns a value of 26, which on Solaris is AF_INET6. But the machine I'm using (a V880 running 2.8) has no IPv6 address on any of its interfaces. And addr->ai_addr->sa_data is empty, so it's no surprise why bind() is failing. I'm not sure why Solaris is giving getaddrinfo_all an IPv6 address, though.

-derek
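A raw number like 26 is hard to read because address-family constants differ between platforms. As a rough sketch only, a debugging helper along these lines could print the symbolic name next to the number. The name describe_family is hypothetical and is not part of pgstat.c; the sketch assumes only the AF_* constants from the standard <sys/socket.h> header.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical diagnostic helper (not from PostgreSQL): format an address
 * family as "NAME (number)" into buf and return buf.  Only the families
 * discussed in this thread are named; anything else is "unknown".
 */
const char *
describe_family(int family, char *buf, size_t buflen)
{
    const char *name;

    assert(buf != NULL && buflen > 0);

    switch (family)
    {
        case AF_INET:
            name = "AF_INET";
            break;
        case AF_INET6:
            name = "AF_INET6";
            break;
        case AF_UNIX:
            name = "AF_UNIX";
            break;
        default:
            name = "unknown";
            break;
    }

    /* snprintf always NUL-terminates when buflen > 0 */
    snprintf(buf, buflen, "%s (%d)", name, family);
    return buf;
}
```

In the debugging snippet above, the fprintf() call could then pass describe_family(addr->ai_addr->sa_family, buf, sizeof(buf)) with a small local char buffer and a %s format, so the log shows the family name as well as the platform-specific number.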
On Thu, Nov 13, 2003 at 04:04:23PM -0500, Derek Morr wrote:
>
> the machine I'm using (a V880 running 2.8) has no IPv6 address on any of its
> interfaces.

So the for loop over the addresses that are returned should go over both socket() and bind() instead of only socket(). And probably connect() too.

The code now assumes if you create a socket of a certain type you can actually use it.

Kurt
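The loop structure described here can be sketched outside PostgreSQL roughly as follows. This is an illustration under stated assumptions, not the pgstat.c code: it uses the standard POSIX getaddrinfo() rather than PostgreSQL's getaddrinfo_all() wrapper, the function name open_loopback_udp_socket is made up, and it asks for loopback addresses by passing a NULL host with port "0" instead of looking up "localhost".

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: return a UDP socket that is bound to a loopback
 * address and connected to itself, or -1 if no returned address works.
 * Every candidate address must survive socket(), bind(), getsockname()
 * and connect(); a failure at any stage moves on to the next candidate.
 */
int
open_loopback_udp_socket(void)
{
    struct addrinfo hints;
    struct addrinfo *addrs = NULL;
    struct addrinfo *addr;
    struct sockaddr_storage self;
    socklen_t   alen;
    int         sock = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;    /* may yield both IPv6 and IPv4 */
    hints.ai_socktype = SOCK_DGRAM;

    /* NULL host without AI_PASSIVE means loopback; "0" = kernel picks port */
    if (getaddrinfo(NULL, "0", &hints, &addrs) != 0)
        return -1;

    for (addr = addrs; addr != NULL; addr = addr->ai_next)
    {
        sock = socket(addr->ai_family, addr->ai_socktype, addr->ai_protocol);
        if (sock < 0)
            continue;           /* family not supported at all */

        alen = sizeof(self);
        if (bind(sock, addr->ai_addr, addr->ai_addrlen) == 0 &&
            getsockname(sock, (struct sockaddr *) &self, &alen) == 0 &&
            connect(sock, (struct sockaddr *) &self, alen) == 0)
            break;              /* this address works end to end */

        /* socket() succeeded but a later stage failed: try the next one */
        close(sock);
        sock = -1;
    }

    /* if the list was exhausted, no descriptor may be left open */
    assert(addr != NULL || sock == -1);

    freeaddrinfo(addrs);
    return sock;
}
```

On a machine like the one reported above, where an IPv6 socket can be created but not bound, a loop of this shape would be expected to skip the IPv6 entry and settle on the IPv4 one instead of giving up.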
Kurt Roeckx <Q@ping.be> writes:
> So the for loop over the addresses that are returned should go
> over both socket() and bind() instead of only socket(). And
> probably connect() too.

> The code now assumes if you create a socket of a certain type you
> can actually use it.

Ah, light dawns... the postmaster socket code does this correctly, but pgstat.c doesn't.

Too bad we didn't figure this out yesterday. We are now in code freeze for 7.4 release, and I'm hesitant to apply a fix for what is arguably a broken platform. Core guys, time for a vote ... do we fix, or hold this for 7.4.1?

regards, tom lane
Tom,

> Too bad we didn't figure this out yesterday. We are now in code freeze
> for 7.4 release, and I'm hesitant to apply a fix for what is arguably a
> broken platform. Core guys, time for a vote ... do we fix, or hold this
> for 7.4.1?

One thing I've not seen an answer to: does Postgres run acceptably on other people's Solaris boxes? If this bug is preventing running on Solaris at all, I'd say fix it ... Solaris is a major platform. If it only affects users of one particular Solaris patch version, then we do a big warning and save it for 7.4.1.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Tom Lane wrote:
> Kurt Roeckx <Q@ping.be> writes:
> > So the for loop over the addresses that are returned should go
> > over both socket() and bind() instead of only socket(). And
> > probably connect() too.
>
> > The code now assumes if you create a socket of a certain type you
> > can actually use it.
>
> Ah, light dawns... the postmaster socket code does this correctly,
> but pgstat.c doesn't.
>
> Too bad we didn't figure this out yesterday. We are now in code freeze
> for 7.4 release, and I'm hesitant to apply a fix for what is arguably a
> broken platform. Core guys, time for a vote ... do we fix, or hold this
> for 7.4.1?

Must fix, I believe, especially if it is the same function call sequence used by the postmaster so we have a high probability it will work on all platforms.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Josh Berkus <josh@agliodbs.com> writes:
> One thing I've not seen an answer to: does Postgres run acceptably on other
> people's Solaris boxes? If this bug is preventing running on Solaris at
> all, I'd say fix it ... Solaris is a major platform. If it only affects
> users of one particular Solaris patch version, then we do a big warning and
> save it for 7.4.1.

I'm sure it depends on which Solaris version you're running, and possibly on local configuration issues as well.

We should not however assume that the problem occurs *only* on Solaris. My take on a lot of the IPv6 funnies we've turned up is that they are kernel/userland compatibility issues (v6-ready libc on non-v6-ready kernel or vice versa), and that's surely at least as likely on Linux as Solaris.

The regression test that detects the problem was only put in as of 7.4beta3. I'm not sure how many of our existing port reports were based on prior betas ...

regards, tom lane
Solaris (5.7, 5.8, 5.9) on many different workstation/server types is very important to us... I agree with Bruce....

Bruce Momjian wrote:
>
> Must fix, I believe, especially if it is the same function call sequence
> used by the postmaster so we have a high probability it will work on all
> platforms.
>

--
P. J. "Josh" Rovero                 Sonalysts, Inc.
Email: rovero@sonalysts.com         www.sonalysts.com    215 Parkway North
Work: (860)326-3671 or 442-4355     Waterford CT 06385
Re: [CORE] 7.4RC2 regression failur and not running stats collector process
From: Christopher Browne
josh@agliodbs.com (Josh Berkus) writes:
>> Too bad we didn't figure this out yesterday. We are now in code freeze
>> for 7.4 release, and I'm hesitant to apply a fix for what is arguably a
>> broken platform. Core guys, time for a vote ... do we fix, or hold this
>> for 7.4.1?
>
> One thing I've not seen an answer to: does Postgres run acceptably on other
> people's Solaris boxes? If this bug is preventing running on Solaris at
> all, I'd say fix it ... Solaris is a major platform. If it only affects
> users of one particular Solaris patch version, then we do a big warning and
> save it for 7.4.1.

For what it's worth, I have been running regression on Solaris with numerous of the betas, and RC1 and [just now] RC2, with NO problems.

If the patch is deemed vital for others, it's possible that all I'm reporting is one of the statistics that will be outnumbered by others. (And in that case, I would be quick to test the patch to ensure it causes no adverse side-effects.)

But it's not apparent that it is _vital_ here right now.

--
let name="cbbrowne" and tld="libertyrms.info" in name ^ "@" ^ tld;;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
Christopher Browne <cbbrowne@libertyrms.info> writes:
> For what it's worth, I have been running regression on Solaris with
> numerous of the betas, and RC1 and [just now] RC2, with NO problems.

It seems clear that some Solaris installations are affected and some are not. Presumably there is some version difference or some local configuration difference ... but since we don't know what the critical factor is, we have no basis for guessing what fraction of Solaris installations will see the problem.

> (And in that case, I would be quick to test the patch to ensure it
> causes no adverse side-effects.)

Here is the proposed patch --- please test it ASAP if you can. This is against RC2.

regards, tom lane

*** src/backend/postmaster/pgstat.c.orig Fri Nov 7 16:55:50 2003
--- src/backend/postmaster/pgstat.c Fri Nov 14 15:02:14 2003
***************
*** 203,208 ****
--- 203,216 ----
          goto startup_failed;
      }
  
+     /*
+      * On some platforms, getaddrinfo_all() may return multiple addresses
+      * only one of which will actually work (eg, both IPv6 and IPv4 addresses
+      * when kernel will reject IPv6). Worse, the failure may occur at the
+      * bind() or perhaps even connect() stage. So we must loop through the
+      * results till we find a working combination. We will generate LOG
+      * messages, but no error, for bogus combinations.
+      */
      for (addr = addrs; addr; addr = addr->ai_next)
      {
  #ifdef HAVE_UNIX_SOCKETS
***************
*** 210,262 ****
          if (addr->ai_family == AF_UNIX)
              continue;
  #endif
!         if ((pgStatSock = socket(addr->ai_family, SOCK_DGRAM, 0)) >= 0)
!             break;
!     }
  
!     if (!addr || pgStatSock < 0)
!     {
!         ereport(LOG,
!                 (errcode_for_socket_access(),
!                  errmsg("could not create socket for statistics collector: %m")));
!         goto startup_failed;
!     }
  
!     /*
!      * Bind it to a kernel assigned port on localhost and get the assigned
!      * port via getsockname().
!      */
!     if (bind(pgStatSock, addr->ai_addr, addr->ai_addrlen) < 0)
!     {
!         ereport(LOG,
!                 (errcode_for_socket_access(),
!                  errmsg("could not bind socket for statistics collector: %m")));
!         goto startup_failed;
!     }
  
!     freeaddrinfo_all(hints.ai_family, addrs);
!     addrs = NULL;
  
!     alen = sizeof(pgStatAddr);
!     if (getsockname(pgStatSock, (struct sockaddr *) & pgStatAddr, &alen) < 0)
!     {
!         ereport(LOG,
!                 (errcode_for_socket_access(),
!                  errmsg("could not get address of socket for statistics collector: %m")));
!         goto startup_failed;
      }
  
!     /*
!      * Connect the socket to its own address. This saves a few cycles by
!      * not having to respecify the target address on every send. This also
!      * provides a kernel-level check that only packets from this same
!      * address will be received.
!      */
!     if (connect(pgStatSock, (struct sockaddr *) & pgStatAddr, alen) < 0)
      {
          ereport(LOG,
                  (errcode_for_socket_access(),
!                  errmsg("could not connect socket for statistics collector: %m")));
          goto startup_failed;
      }
  
--- 218,285 ----
          if (addr->ai_family == AF_UNIX)
              continue;
  #endif
!         /*
!          * Create the socket.
!          */
!         if ((pgStatSock = socket(addr->ai_family, SOCK_DGRAM, 0)) < 0)
!         {
!             ereport(LOG,
!                     (errcode_for_socket_access(),
!                      errmsg("could not create socket for statistics collector: %m")));
!             continue;
!         }
  
!         /*
!          * Bind it to a kernel assigned port on localhost and get the assigned
!          * port via getsockname().
!          */
!         if (bind(pgStatSock, addr->ai_addr, addr->ai_addrlen) < 0)
!         {
!             ereport(LOG,
!                     (errcode_for_socket_access(),
!                      errmsg("could not bind socket for statistics collector: %m")));
!             closesocket(pgStatSock);
!             pgStatSock = -1;
!             continue;
!         }
  
!         alen = sizeof(pgStatAddr);
!         if (getsockname(pgStatSock, (struct sockaddr *) &pgStatAddr, &alen) < 0)
!         {
!             ereport(LOG,
!                     (errcode_for_socket_access(),
!                      errmsg("could not get address of socket for statistics collector: %m")));
!             closesocket(pgStatSock);
!             pgStatSock = -1;
!             continue;
!         }
  
!         /*
!          * Connect the socket to its own address. This saves a few cycles by
!          * not having to respecify the target address on every send. This also
!          * provides a kernel-level check that only packets from this same
!          * address will be received.
!          */
!         if (connect(pgStatSock, (struct sockaddr *) &pgStatAddr, alen) < 0)
!         {
!             ereport(LOG,
!                     (errcode_for_socket_access(),
!                      errmsg("could not connect socket for statistics collector: %m")));
!             closesocket(pgStatSock);
!             pgStatSock = -1;
!             continue;
!         }
  
!         /* If we get here, we have a working socket */
!         break;
      }
  
!     /* Did we find a working address? */
!     if (!addr || pgStatSock < 0)
      {
          ereport(LOG,
                  (errcode_for_socket_access(),
!                  errmsg("disabling statistics collector for lack of working socket")));
          goto startup_failed;
      }
  
***************
*** 284,289 ****
--- 307,314 ----
                   errmsg("could not create pipe for statistics collector: %m")));
          goto startup_failed;
      }
+ 
+     freeaddrinfo_all(hints.ai_family, addrs);
  
      return;
  
I can fire up our solaris machine and let you have access to it if you want to do some destructive testing.

Tom Lane wrote:
> Christopher Browne <cbbrowne@libertyrms.info> writes:
>> For what it's worth, I have been running regression on Solaris with
>> numerous of the betas, and RC1 and [just now] RC2, with NO problems.
>
> It seems clear that some Solaris installations are affected and some
> are not. Presumably there is some version difference or some local
> configuration difference ... but since we don't know what the critical
> factor is, we have no basis for guessing what fraction of Solaris
> installations will see the problem.
>
>> (And in that case, I would be quick to test the patch to ensure it
>> causes no adverse side-effects.)
>
> Here is the proposed patch --- please test it ASAP if you can.
> This is against RC2.
>
> regards, tom lane

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org
Re: [CORE] 7.4RC2 regression failur and not running stats collector process
From: "Glenn Wiorek"
Hmm I know it's been a while since I used patch but I seem to be having problems applying it. Perhaps my patch is outdated??

patch -b pgstat.c < patchfile
Looks like a new-style context diff.
Hunk #2 failed at line 203.
Hunk #2 failed at line 210.
Hunk #3 failed at line 284.
3 out of 3 hunks failed: saving reject to pgstat.c.rej

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Christopher Browne" <cbbrowne@libertyrms.info>
Cc: <pgsql-hackers@postgresql.org>
Sent: Friday, November 14, 2003 2:42 PM
Subject: Re: [HACKERS] [CORE] 7.4RC2 regression failur and not running stats collector process

> Christopher Browne <cbbrowne@libertyrms.info> writes:
> > For what it's worth, I have been running regression on Solaris with
> > numerous of the betas, and RC1 and [just now] RC2, with NO problems.
>
> It seems clear that some Solaris installations are affected and some
> are not. Presumably there is some version difference or some local
> configuration difference ... but since we don't know what the critical
> factor is, we have no basis for guessing what fraction of Solaris
> installations will see the problem.
>
> > (And in that case, I would be quick to test the patch to ensure it
> > causes no adverse side-effects.)
>
> Here is the proposed patch --- please test it ASAP if you can.
> This is against RC2.
>
> regards, tom lane
On Fri, 14 Nov 2003, Josh Berkus wrote:

> Tom,
>
> > Too bad we didn't figure this out yesterday. We are now in code freeze
> > for 7.4 release, and I'm hesitant to apply a fix for what is arguably a
> > broken platform. Core guys, time for a vote ... do we fix, or hold this
> > for 7.4.1?
>
> One thing I've not seen an answer to: does Postgres run acceptably on other
> people's Solaris boxes? If this bug is preventing running on Solaris at
> all, I'd say fix it ... Solaris is a major platform. If it only affects
> users of one particular Solaris patch version, then we do a big warning and
> save it for 7.4.1.

I agree with Josh on this ...
Check that you don't need to use the -p option at all. Also, make sure you remove any ^M (DOS CR) characters from the line endings. That always happens to me if I receive the email on a Windows machine and save the attachment; Windows sometimes likes to rewrite all the line endings, causing the problem below.

Chris

Glenn Wiorek wrote:
> Hmm I know it's been a while since I used patch but I seem to be having
> problems applying it. Perhaps my patch is outdated??
>
> patch -b pgstat.c < patchfile
> Looks like a new-style context diff.
> Hunk #2 failed at line 203.
> Hunk #2 failed at line 210.
> Hunk #3 failed at line 284.
> 3 out of 3 hunks failed: saving reject to pgstat.c.rej
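For the ^M problem, a minimal sketch of the clean-up step in C might look like the following. It assumes the saved patch has already been read into a memory buffer; the function name strip_dos_line_endings is made up for illustration and is not part of patch or PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrite buf in place so that every CR LF pair
 * becomes a single LF.  A CR that is not followed by LF is left alone.
 * Returns the new length of the data in buf.
 */
size_t
strip_dos_line_endings(char *buf, size_t len)
{
    size_t      in;
    size_t      out = 0;

    assert(buf != NULL || len == 0);

    for (in = 0; in < len; in++)
    {
        /* skip the CR of a CR LF pair; the LF is copied on the next pass */
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

Writing the first strip_dos_line_endings(buf, len) bytes back to a new file should give patch a diff whose context lines no longer end in a stray carriage return, which is one plausible cause of every hunk being rejected.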