Thread: 7.4RC2 regression failur and not running stats collector process on Solaris
7.4RC2 regression failur and not running stats collector process on Solaris
From
Kiyoshi Sawada
Date:
Failed to build on Solaris.

Summary
1. Checking for pstat... no
2. Regression Failure stats ..... FAILED.
3. Not running stats buffer process and stats collector process.

------------------------------------------------
Environments
SunOS 5.8 Generic_108528-15 sun4m sparc
SunOS 5.8 Generic_108529-23 i86pc i386 i86pc
Both sparc and i386
PostgreSQL 7.4 RC2
gcc (GCC) 3.3.2
autoconf (GNU Autoconf) 2.57
bison (GNU Bison) 1.875
GNU Make 3.80
------------------------------------------------

(1) checking for pstat... no

$ ./configure --enable-integer-datetimes \
    --without-readline --with-openssl
-----------------------
 : : : :
checking sys/pstat.h usability... no
checking sys/pstat.h presence... no
 : : : :
checking for pstat... no
 : : : :
-----------------------

(2) Regression Failure stats ..... FAILED

$ make check
 : : : :
sequence ... ok
polymorphism ... ok
stats ... FAILED
============== shutting down postmaster ==============

=======================
 1 of 93 tests failed.
=======================

(3) Not running stats buffer process and stats collector process.

$ pg_ctl start -D /usr/local/pgsql/data
$ ps -ef | grep postmaster
postgres 15912 15899 0 11:32:59 pts/2 0:00 grep postmaster
postgres 15864 1 0 11:17:03 pts/1 0:00 /usr/local/pgsql/bin/postmaster
$

--
Kiyoshi Sawada
Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> 2. Regression Failure stats ..... FAILED.
> 3. Not running stats buffer process and stats collector process.

So why not? Try looking in the postmaster log for errors related to
stats collector startup. (pstat is irrelevant, btw.)

regards, tom lane
Re: 7.4RC2 regression failur and not running stats collector process on Solaris
From
Kiyoshi Sawada
Date:
Dear Tom Lane.

On Tue, 11 Nov 2003 09:18:48 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> > 2. Regression Failure stats ..... FAILED.
> > 3. Not running stats buffer process and stats collector process.
>
> So why not? Try looking in the postmaster log for errors related to
> stats collector startup. (pstat is irrelevant, btw.)
>

The postmaster log contains 'could not bind socket for statistics collector'.

postmaster log
----------------------------
LOG: could not bind socket for statistics collector: Cannot assign requested address
LOG: database system was shut down at 2003-11-12 08:56:59 JST
LOG: checkpoint record is at 0/D743BCC
LOG: redo record is at 0/D743BCC; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 25593; next OID: 684071
LOG: database system is ready
----------------------------

The network settings are shown below.
----------------------------
# netstat -a | grep 5432
*.5432 *.* 0 0 65536 0 LISTEN
*.5432 *.* 0 0 65536 0 LISTEN
*.5432 *.* 0 0 65536 0 LISTEN
e1135ea8 stream-ord e1853918 00000000 /tmp/.s.PGSQL.5432

# cat /etc/inet/hosts
127.0.0.1 localhost mebius
127.0.0.1 mebius localhost
172.20.12.109 mebius

# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
pcni0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 172.20.12.109 netmask ffff0000 broadcast 172.20.255.255
        ether 0:90:99:51:70:53

# cat pg_hba.conf
# TYPE DATABASE USER IP-ADDRESS IP-MASK METHOD
local all all trust
# IPv4-style local connections:
host all all 127.0.0.1 255.255.255.255 trust
# IPv6-style local connections:
host all all ::1 ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff trust
----------------------------

--
Kiyoshi Sawada
Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> On Tue, 11 Nov 2003 09:18:48 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So why not? Try looking in the postmaster log for errors related to
>> stats collector startup. (pstat is irrelevant, btw.)

> LOG: could not bind socket for statistics collector: Cannot assign requested address

Hmm ... that's sure the problem, but what can we do about it? ISTM that
any non-broken system ought to be able to resolve "localhost". Actually
it's worse than that: your system resolved "localhost" and then refused
to bind to any of the IP addresses it had resolved. Look at the logic
in pgstat_init() in src/backend/postmaster/pgstat.c.

I think this suggests a misconfiguration in your system ... but if you
can suggest a more robust way of setting up that connection, I'm all
ears ...

regards, tom lane
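The behaviour described above (resolve "localhost", then try to bind a UDP socket to each address that comes back) can be reproduced outside PostgreSQL. The Python sketch below is only an illustration and is not the pgstat_init() code; the function name try_bind_all and its entries parameter are hypothetical helpers invented here. It reports, per resolved address, whether bind() succeeded or which error it raised.

```python
import socket


def try_bind_all(host="localhost", entries=None):
    """Resolve `host` and attempt a UDP bind on every address returned.

    Returns a list of (family_name, address, error) tuples; error is None
    when the bind succeeded. `entries` (hypothetical convenience parameter)
    lets a caller pass pre-resolved getaddrinfo()-style tuples instead.
    """
    if entries is None:
        entries = socket.getaddrinfo(host, None, socket.AF_UNSPEC,
                                     socket.SOCK_DGRAM)
    results = []
    for family, socktype, proto, _canonname, sockaddr in entries:
        family_name = socket.AddressFamily(family).name
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            # The address family itself may be unsupported on this host.
            results.append((family_name, sockaddr[0], exc))
            continue
        try:
            # Port 0 asks the kernel for any free port; for IPv6 the
            # flowinfo and scope_id fields are carried over unchanged.
            sock.bind((sockaddr[0], 0) + tuple(sockaddr[2:]))
            results.append((family_name, sockaddr[0], None))
        except OSError as exc:
            results.append((family_name, sockaddr[0], exc))
        finally:
            sock.close()
    return results


if __name__ == "__main__":
    try:
        report = try_bind_all()
    except socket.gaierror as exc:
        report = []
        print(f"could not resolve localhost: {exc}")
    for family_name, address, error in report:
        status = "ok" if error is None else f"FAILED: {error}"
        print(f"{family_name:10} {address:20} bind {status}")
```

On a machine where the resolver returns an address that no interface carries, the corresponding line would be expected to show a "Cannot assign requested address" style error, which is the symptom in the postmaster log.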
Re: 7.4RC2 regression failur and not running stats collector process on Solaris
From
"Zeugswetter Andreas SB SD"
Date:
> > LOG: could not bind socket for statistics collector: Cannot assign requested address
>
> Hmm ... that's sure the problem, but what can we do about it? ISTM that
> any non-broken system ought to be able to resolve "localhost". Actually
> it's worse than that: your system resolved "localhost" and then refused

Are we using an API that only returns nslookup responses and not
/etc/hosts entries? At least on AIX it looks like it.

Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Are we using an api that only returns nslookup responses and not
> /etc/hosts entries ? At least on AIX it looks like it.

We use getaddrinfo(), or if that doesn't exist gethostbyname().
If there's a problem of that ilk then it's those library routines'
fault. But AFAICT Kiyoshi's problem is not that ... unless maybe
localhost is incorrectly listed as something other than 127.0.0.1
in one of those sources?

regards, tom lane
Re: 7.4RC2 regression failur and not running stats collector process on Solaris
From
Kurt Roeckx
Date:
On Wed, Nov 12, 2003 at 10:32:38AM -0500, Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > Are we using an api that only returns nslookup responses and not
> > /etc/hosts entries ? At least on AIX it looks like it.
>
> We use getaddrinfo(), or if that doesn't exist gethostbyname().
> If there's a problem of that ilk then it's those library routines'
> fault. But AFAICT Kiyoshi's problem is not that ... unless maybe
> localhost is incorrectly listed as something other than 127.0.0.1
> in one of those sources?

It might depend on settings in /etc/host.conf or
/etc/nsswitch.conf or something too? You can usually tell the
lib to use the files or not.

It's always a good idea to put localhost into dns too.

Kurt
Kurt Roeckx <Q@ping.be> writes:
> It's always a good idea to put localhost into dns too.

Yeah, but "localhost" *is* resolving as something on Kiyoshi's
machine, else a different error message would have appeared.

I'm wondering just what it resolved to though --- maybe we should
have made the error messages more verbose, or added a debug-level
message to show what addresses are being tried.

regards, tom lane
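To see what "localhost" actually resolves to through each of the two library routines mentioned in this thread, a small Python sketch like the one below may help. It uses Python's socket wrappers around gethostbyname() and getaddrinfo(), not PostgreSQL's own code, and the function name compare_lookups is a hypothetical helper. Note that gethostbyname() reports a single IPv4 address, while getaddrinfo() can also return IPv6 entries.

```python
import socket


def compare_lookups(name="localhost"):
    """Return what the legacy and the newer resolver call report for `name`."""
    result = {"gethostbyname": None, "getaddrinfo": []}
    try:
        # Legacy call: yields one IPv4 address as a string.
        result["gethostbyname"] = socket.gethostbyname(name)
    except OSError as exc:
        result["gethostbyname"] = f"error: {exc}"
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_UNSPEC,
                                   socket.SOCK_DGRAM)
        # Keep only the address family name and the address string.
        result["getaddrinfo"] = [
            (socket.AddressFamily(family).name, sockaddr[0])
            for family, _type, _proto, _canon, sockaddr in infos
        ]
    except OSError as exc:
        result["getaddrinfo"] = [("error", str(exc))]
    return result


if __name__ == "__main__":
    lookups = compare_lookups()
    print("gethostbyname:", lookups["gethostbyname"])
    for family_name, address in lookups["getaddrinfo"]:
        print("getaddrinfo:  ", family_name, address)
```

If the getaddrinfo() list contains an AF_INET6 entry that the loopback interface does not carry, that would be a plausible candidate for the address whose bind() fails.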
Re: 7.4RC2 regression failur and not running stats collector process on Solaris
From
Kiyoshi Sawada
Date:
On Wed, 12 Nov 2003 13:46:52 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Kurt Roeckx <Q@ping.be> writes:
> > It's always a good idea to put localhost into dns too.
>
> Yeah, but "localhost" *is* resolving as something on Kiyoshi's
> machine, else a different error message would have appeared.
>
> I'm wondering just what it resolved to though --- maybe we should
> have made the error messages more verbose, or added a debug-level
> message to show what addresses are being tried.
>

I tried nslookup on Kiyoshi's machine.
--------------------------------
$ nslookup localhost
Server: name.server.mydomain
Address: xxx.xx.xx.xxx

 : : :
(failed test)
^C
$ nslookup 127.0.0.1
Server: mail.nagoya2.jrc.or.jp
Address: 172.20.12.11

Name: localhost
Address: 127.0.0.1

(successful test)
$
--------------------------------
/etc/resolv.conf
domin mydomain
nameserver xxx.xx.xx.xxx

/etc/nsswitch.conf
hosts: files dns
ipnodes: files dns
--------------------------------

Is it necessary to start a DNS server to bind 'localhost' on Kiyoshi's machine?

Reference URL
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsunone/3877

--
Kiyoshi Sawada
Re: 7.4RC2 regression failur and not running stats collector process on Solaris
From
Kiyoshi Sawada
Date:
On Thu, 13 Nov 2003 11:39:49 +0900
Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> wrote:

> $ nslookup localhost
> Server: name.server.mydomain
> Address: xxx.xx.xx.xxx
> : : :
> (failed test)

> Is it necessary to start a DNS server to bind 'localhost' in Kiyoshi's machine?
>

I got the bind-9.2.2-sol8-intel-local package from Sun Freeware and installed it to /usr/local.
I tried /usr/local/bin/nslookup (ISC nslookup) while /usr/local/bin/bind (ISC BIND) was not yet started.

$ /usr/local/bin/nslookup localhost
Note: nslookup is deprecated and may be removed from future releases.
Consider using the `dig' or `host' programs instead. Run nslookup with
the `-sil[ent]' option to prevent this message from appearing.
Server: xxx.xx.xx.xxx
Address: xxx.xx.xx.xxx#53

Name: localhost
Address: 127.0.0.1

(successful test)

$ /usr/local/bin/nslookup 127.0.0.1
Note: nslookup is deprecated and may be removed from future releases.
Consider using the `dig' or `host' programs instead. Run nslookup with
the `-sil[ent]' option to prevent this message from appearing.
Server: xxx.xx.xx.xxx
Address: xxx.xx.xx.xxx#53

1.0.0.127.in-addr.arpa name = localhost.

(successful test)

--
Kiyoshi Sawada
Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> $ /usr/local/bin/nslookup localhost
> Note: nslookup is deprecated and may be removed from future releases.
> Consider using the `dig' or `host' programs instead. Run nslookup with
> the `-sil[ent]' option to prevent this message from appearing.
> Server: xxx.xx.xx.xxx
> Address: xxx.xx.xx.xxx#53

> Name: localhost
> Address: 127.0.0.1

Hmm ... that's certainly evidence that "localhost" will resolve
correctly on your machine, but then why is the bind() failing?

If you have strace or ktrace or some other tool for watching the
kernel calls issued by a particular process, please try tracing
postmaster startup and look to see exactly what arguments are being
passed to bind(). (Note: IIRC we first bind the postmaster listen
socket and only later try to create the UDP socket for statistics,
so this won't be the very first bind() in the trace.)

regards, tom lane
Re: 7.4RC2 regression failur and not running stats collector process on Solaris
From
Kiyoshi Sawada
Date:
Thanks to Tom Lane, Kurt Roeckx, Zeugswetter Andreas and Shigehiro.
The problem is solved. Here is my report.

On Thu, 13 Nov 2003 09:50:59 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> Hmm ... that's certainly evidence that "localhost" will resolve
> correctly on your machine, but then why is the bind() failing?
>
> If you have strace or ktrace or some other tool for watching the
> kernel calls issued by a particular process, please try tracing
> postmaster startup and look to see exactly what arguments are being
> passed to bind().
>

I got a suggestion from Shigehiro.

On Fri, 14 Nov 2003 02:46:05 +0900 (JST)
Shigehiro Honda wrote:
>
> When truss is applied to the postmaster on both x86 and sparc,
> it tries to bind a UDP socket over IPv6.
> The bind succeeded on sparc:
>   so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", 1) = 5
>   bind(5, 0x003B6A90, 32, 3) = 0
> The bind failed on x86:
>   so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", 1) = 4
>   bind(4, 0x083301B8, 32, 3) Err#126 EADDRNOTAVAIL

He also wrote:
It seems that the postmaster looks up the addresses given for localhost
and binds to them. Checking lo0 shows that on sparc both IPv4 and IPv6
addresses are assigned, while on x86 only an IPv4 address is assigned.

Regarding the function called from src/backend/postmaster/pgstat.c:
I suspect a fault in the library function getaddrinfo() on x86 Solaris,
which is called from getaddrinfo_all().

On Solaris, IPv4 and IPv6 addresses are stored in the /etc/inet/ipnodes file.
So I removed the IPv6 localhost address '::1' from /etc/inet/ipnodes.

----------------------------------------------
$ make check
======================
 All 93 tests passed.
======================

$ pg_ctl start ; ps -ef | grep postmaster
postgres 20937 1 1 12:10:40 pts/4 0:00 /usr/local/pgsql/bin/postmaster
postgres 20939 20937 0 12:10:41 pts/4 0:00 /usr/local/pgsql/bin/postmaster
postgres 20940 20939 0 12:10:41 pts/4 0:00 /usr/local/pgsql/bin/postmaster

To show the PIDs and current queries of all backends:

regression=# SELECT pg_stat_get_backend_pid(S.backendid) AS procpid,
       pg_stat_get_backend_activity(S.backendid) AS current_query
  FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS S;
 procpid | current_query
---------+---------------
    5482 |
(1 row)
----------------------------------------------

This workaround may be effective in IPv4-only environments; the fault
seems to lie in the library function getaddrinfo() on Solaris.

Thank you.

--
Kiyoshi Sawada
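The check that led to the workaround, finding out which addresses a hosts-format file such as /etc/inet/ipnodes assigns to "localhost", can be sketched in Python as follows. This is an illustrative helper only: the function name addresses_for is hypothetical, and SAMPLE_IPNODES is assumed content, since the real ipnodes file was not posted in this thread (its IPv4 lines are adapted from the /etc/inet/hosts listing shown earlier, and the '::1' line stands for the entry that was removed).

```python
def addresses_for(name, hosts_text):
    """Return the addresses that a hosts-format file maps to `name`."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if name in names and address not in found:
            found.append(address)
    return found


# Assumed sample content; NOT the actual file from the reporter's machine.
SAMPLE_IPNODES = """\
# hypothetical /etc/inet/ipnodes
::1            localhost
127.0.0.1      localhost mebius
172.20.12.109  mebius
"""

if __name__ == "__main__":
    for address in addresses_for("localhost", SAMPLE_IPNODES):
        kind = "IPv6" if ":" in address else "IPv4"
        print(kind, address)
```

An IPv6 entry for localhost in such a file, combined with a loopback interface that has only an IPv4 address (as in the ifconfig output above), would match the EADDRNOTAVAIL seen in the truss trace.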