Thread: 7.4RC2 regression failur and not running stats collector process on Solaris

7.4RC2 regression failur and not running stats collector process on Solaris

From
Kiyoshi Sawada
Date:
Failed to build on Solaris.

Summary
1. Checking for pstat... no
2. Regression failure: stats ..... FAILED.
3. Not running stats buffer process and stats collector process.

------------------------------------------------
Environments
SunOS 5.8 Generic_108528-15 sun4m sparc
SunOS 5.8 Generic_108529-23 i86pc i386 i86pc
Both sparc and i386
PostgreSQL 7.4 RC2
gcc (GCC) 3.3.2
autoconf (GNU Autoconf) 2.57
bison (GNU Bison) 1.875
GNU Make 3.80
------------------------------------------------

(1) checking for pstat... no
$ ./configure --enable-integer-datetimes \
             --without-readline --with-openssl
-----------------------
 : : : :
checking sys/pstat.h usability... no
checking sys/pstat.h presence... no
 : : : :
checking for pstat... no
 : : : :
-----------------------

(2) Regression failure: stats ..... FAILED
$ make check
    : : : :
    sequence             ... ok
    polymorphism         ... ok
    stats                ... FAILED
============== shutting down postmaster  ==============

=======================
 1 of 93 tests failed.
=======================

(3) Not running stats buffer process and stats collector process.
$ pg_ctl start -D /usr/local/pgsql/data
$ ps -ef | grep postmaster
postgres 15912 15899  0 11:32:59 pts/2    0:00 grep postmaster
postgres 15864     1  0 11:17:03 pts/1    0:00 /usr/local/pgsql/bin/postmaster
$

--
Kiyoshi Sawada



Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> 2. Regression failure: stats ..... FAILED.
> 3. Not running stats buffer process and stats collector process.

So why not?  Try looking in the postmaster log for errors related to
stats collector startup.  (pstat is irrelevant, btw.)
        regards, tom lane


Re: 7.4RC2 regression failur and not running stats collector process on Solaris

From
Kiyoshi Sawada
Date:
Dear Tom Lane.

On Tue, 11 Nov 2003 09:18:48 -0500  Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> > 2. Regression failure: stats ..... FAILED.
> > 3. Not running stats buffer process and stats collector process.
> 
> So why not?  Try looking in the postmaster log for errors related to
> stats collector startup.  (pstat is irrelevant, btw.)
> 

There is a 'could not bind socket for statistics collector' message in the postmaster log.
postmaster log
----------------------------
LOG:  could not bind socket for statistics collector: Cannot assign requested address
LOG:  database system was shut down at 2003-11-12 08:56:59 JST
LOG:  checkpoint record is at 0/D743BCC
LOG:  redo record is at 0/D743BCC; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 25593; next OID: 684071
LOG:  database system is ready
----------------------------

The network state and configuration files are shown below.
----------------------------
# netstat -a | grep 5432
     *.5432               *.*                0      0 65536      0 LISTEN
     *.5432               *.*                0      0 65536      0 LISTEN
     *.5432               *.*                0      0 65536      0 LISTEN
e1135ea8 stream-ord e1853918 00000000    /tmp/.s.PGSQL.5432

# cat /etc/inet/hosts
127.0.0.1  localhost mebius
127.0.0.1  mebius localhost
172.20.12.109   mebius

# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
pcni0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 172.20.12.109 netmask ffff0000 broadcast 172.20.255.255
        ether 0:90:99:51:70:53
 

# cat pg_hba.conf
# TYPE  DATABASE    USER        IP-ADDRESS        IP-MASK           METHOD
local   all         all                                             trust
# IPv4-style local connections:
host    all         all         127.0.0.1         255.255.255.255   trust
# IPv6-style local connections:
host    all         all         ::1               ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff        trust
----------------------------

--
Kiyoshi Sawada





Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> On Tue, 11 Nov 2003 09:18:48 -0500  Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So why not?  Try looking in the postmaster log for errors related to
>> stats collector startup.  (pstat is irrelevant, btw.)

> LOG:  could not bind socket for statistics collector: Cannot assign requested address

Hmm ... that's sure the problem, but what can we do about it?  ISTM that
any non-broken system ought to be able to resolve "localhost".  Actually
it's worse than that: your system resolved "localhost" and then refused
to bind to any of the IP addresses it had resolved.  Look at the logic
in pgstat_init() in src/backend/postmaster/pgstat.c.  I think this
suggests a misconfiguration in your system ... but if you can suggest a
more robust way of setting up that connection, I'm all ears ...
        regards, tom lane
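
The approach described above (resolve "localhost", then try to bind a UDP socket to each address returned) can be reproduced outside the postmaster. The following is a minimal sketch in Python, not a transcription of pgstat_init(); the function name try_bind_udp is hypothetical, and port 0 is used here so that the operating system picks any free port.

```python
import socket


def try_bind_udp(host="localhost"):
    """Resolve host and try to bind a UDP socket to each address in turn.

    Returns a list of (family, address, error) tuples; error is None when
    bind() succeeded and the caught OSError otherwise.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, 0, type=socket.SOCK_DGRAM
    ):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:
            # The address family itself may be unsupported on this system.
            results.append((family, sockaddr[0], exc))
            continue
        try:
            sock.bind(sockaddr)
            results.append((family, sockaddr[0], None))
        except OSError as exc:
            # For example "Cannot assign requested address" (EADDRNOTAVAIL).
            results.append((family, sockaddr[0], exc))
        finally:
            sock.close()
    return results


if __name__ == "__main__":
    for family, address, error in try_bind_udp():
        status = "ok" if error is None else f"failed: {error}"
        print(f"{family} {address}: {status}")
```

If every entry reports a failure, the resolver is returning addresses that are not configured on any local interface, which would match the log message quoted above.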


Re: 7.4RC2 regression failur and not running stats collector process on Solaris

From
"Zeugswetter Andreas SB SD"
Date:
> > LOG:  could not bind socket for statistics collector: Cannot assign requested address
>
> Hmm ... that's sure the problem, but what can we do about it? ISTM that
> any non-broken system ought to be able to resolve "localhost".  Actually
> it's worse than that: your system resolved "localhost" and then refused

Are we using an api that only returns nslookup responses and not
/etc/hosts entries ? At least on AIX it looks like it.

Andreas


"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Are we using an api that only returns nslookup responses and not
> /etc/hosts entries ? At least on AIX it looks like it.

We use getaddrinfo(), or if that doesn't exist gethostbyname().
If there's a problem of that ilk then it's those library routines'
fault.  But AFAICT Kiyoshi's problem is not that ... unless maybe
localhost is incorrectly listed as something other than 127.0.0.1
in one of those sources?
        regards, tom lane


Re: 7.4RC2 regression failur and not running stats collector process on Solaris

From
Kurt Roeckx
Date:
On Wed, Nov 12, 2003 at 10:32:38AM -0500, Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > Are we using an api that only returns nslookup responses and not
> > /etc/hosts entries ? At least on AIX it looks like it.
> 
> We use getaddrinfo(), or if that doesn't exist gethostbyname().
> If there's a problem of that ilk then it's those library routines'
> fault.  But AFAICT Kiyoshi's problem is not that ... unless maybe
> localhost is incorrectly listed as something other than 127.0.0.1
> in one of those sources?

It might depend on settings in /etc/host.conf or
/etc/nsswitch.conf or something too?

You can usually tell the lib to use the files or not.

It's always a good idea to put localhost into dns too.


Kurt



Kurt Roeckx <Q@ping.be> writes:
> It's always a good idea to put localhost into dns too.

Yeah, but "localhost" *is* resolving as something on Kiyoshi's
machine, else a different error message would have appeared.

I'm wondering just what it resolved to though --- maybe we should
have made the error messages more verbose, or added a debug-level
message to show what addresses are being tried.
        regards, tom lane
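
To see what "localhost" resolves to without changing the server, the resolver can be queried directly. This is a minimal sketch in Python under the assumption that the system resolver is reached through getaddrinfo(), as in the backend; the function name resolved_addresses is hypothetical.

```python
import socket


def resolved_addresses(host="localhost"):
    """Return the distinct (family, address) pairs getaddrinfo() gives for host."""
    seen = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, None, type=socket.SOCK_DGRAM
    ):
        entry = (family, sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


if __name__ == "__main__":
    for family, address in resolved_addresses():
        print(f"{family}: {address}")
```

The order of the list is the order in which a caller looping over the results would try the addresses, so an unusable address at the head of the list is tried first.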


Re: 7.4RC2 regression failur and not running stats collector process on Solaris

From
Kiyoshi Sawada
Date:
On Wed, 12 Nov 2003 13:46:52 -0500  Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Kurt Roeckx <Q@ping.be> writes:
> > It's always a good idea to put localhost into dns too.
> 
> Yeah, but "localhost" *is* resolving as something on Kiyoshi's
> machine, else a different error message would have appeared.
> 
> I'm wondering just what it resolved to though --- maybe we should
> have made the error messages more verbose, or added a debug-level
> message to show what addresses are being tried.
> 

I tried nslookup on Kiyoshi's machine.
--------------------------------
$ nslookup localhost
Server:  name.server.mydomain
Address:  xxx.xx.xx.xxx
  : : :
(failed test)
^C

$ nslookup 127.0.0.1
Server:  mail.nagoya2.jrc.or.jp
Address:  172.20.12.11

Name:    localhost
Address:  127.0.0.1

(successful test)

$
--------------------------------
/etc/resolv.conf
domin mydomain
nameserver xxx.xx.xx.xxx

/etc/nsswitch.conf
hosts:      files dns
ipnodes:    files dns
--------------------------------
Is it necessary to start a DNS server in order to bind 'localhost' on Kiyoshi's machine?

Reference URL
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsunone/3877

--
Kiyoshi Sawada



Re: 7.4RC2 regression failur and not running stats collector process on Solaris

From
Kiyoshi Sawada
Date:
On Thu, 13 Nov 2003 11:39:49 +0900  Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> wrote:
> $ nslookup localhost
> Server:  name.server.mydomain
> Address:  xxx.xx.xx.xxx
>   : : :
> (failed test)
> Is it necessary to start a DNS server to bind 'localhost' in Kiyoshi's machine?
> 

I got the bind-9.2.2-sol8-intel-local package from Sunfreeware and installed it under /usr/local.
I then tried /usr/local/bin/nslookup (ISC nslookup) while /usr/local/bin/bind (ISC BIND) had not yet been started.

$ /usr/local/bin/nslookup localhost
Note:  nslookup is deprecated and may be removed from future releases.
Consider using the `dig' or `host' programs instead.  Run nslookup with
the `-sil[ent]' option to prevent this message from appearing.
Server:         xxx.xx.xx.xxx
Address:        xxx.xx.xx.xxx#53

Name:   localhost
Address: 127.0.0.1

(successful test)


$ /usr/local/bin/nslookup 127.0.0.1
Note:  nslookup is deprecated and may be removed from future releases.
Consider using the `dig' or `host' programs instead.  Run nslookup with
the `-sil[ent]' option to prevent this message from appearing.
Server:         xxx.xx.xx.xxx
Address:        xxx.xx.xx.xxx#53

1.0.0.127.in-addr.arpa  name = localhost.

(successful test)

--
Kiyoshi Sawada



Kiyoshi Sawada <sawa@nagoya2.jrc.or.jp> writes:
> $ /usr/local/bin/nslookup localhost
> Note:  nslookup is deprecated and may be removed from future releases.
> Consider using the `dig' or `host' programs instead.  Run nslookup with
> the `-sil[ent]' option to prevent this message from appearing.
> Server:         xxx.xx.xx.xxx
> Address:        xxx.xx.xx.xxx#53

> Name:   localhost
> Address: 127.0.0.1

Hmm ... that's certainly evidence that "localhost" will resolve
correctly on your machine, but then why is the bind() failing?

If you have strace or ktrace or some other tool for watching the
kernel calls issued by a particular process, please try tracing
postmaster startup and look to see exactly what arguments are being
passed to bind().

(Note: IIRC we first bind the postmaster listen socket and only later
try to create the UDP socket for statistics, so this won't be the
very first bind() in the trace.)
        regards, tom lane


Re: 7.4RC2 regression failur and not running stats collector process on Solaris

From
Kiyoshi Sawada
Date:
Thanks to Tom Lane, Kurt Roeckx, Zeugswetter Andreas and Shigehiro.

The problem is solved. Here is my report.

On Thu, 13 Nov 2003 09:50:59 -0500  Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> Hmm ... that's certainly evidence that "localhost" will resolve
> correctly on your machine, but then why is the bind() failing?
> 
> If you have strace or ktrace or some other tool for watching the
> kernel calls issued by a particular process, please try tracing
> postmaster startup and look to see exactly what arguments are being
> passed to bind().
> 

I got a suggestion from Shigehiro.

On Fri, 14 Nov 2003 02:46:05 +0900 (JST) Shigehiro Honda wrote:
> 
> I applied truss to the postmaster on both x86 and sparc.
> It tries to bind a UDP socket over IPv6.
> The bind succeeded on sparc:
>   so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", 1) = 5
>   bind(5, 0x003B6A90, 32, 3)                      = 0
> The bind failed on x86:
>   so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", 1) = 4
>   bind(4, 0x083301B8, 32, 3)                      Err#126 EADDRNOTAVAIL

He also wrote:
It seems that the postmaster looks up the addresses assigned to localhost and binds to them.
Checking lo0 shows that the sparc machine has both an IPv4 and an IPv6 address on it, while the x86 machine has only an IPv4 address.

I suspect a fault in the library function getaddrinfo() on x86 Solaris, which src/backend/postmaster/pgstat.c calls through getaddrinfo_all().

On Solaris, the IPv4 and IPv6 addresses are stored in the /etc/inet/ipnodes file.
So I tried removing the IPv6 localhost address '::1' from /etc/inet/ipnodes.
----------------------------------------------
$ make check
======================
All 93 tests passed.
======================

$ pg_ctl start ; ps -ef | grep postmaster
postgres 20937     1  1 12:10:40 pts/4    0:00 /usr/local/pgsql/bin/postmaster
postgres 20939 20937  0 12:10:41 pts/4    0:00 /usr/local/pgsql/bin/postmaster
postgres 20940 20939  0 12:10:41 pts/4    0:00 /usr/local/pgsql/bin/postmaster

To show the PIDs and current queries of all backends:
regression=# SELECT pg_stat_get_backend_pid(S.backendid) AS procpid,
pg_stat_get_backend_activity(S.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS S;
 procpid | current_query
---------+---------------
    5482 |
(1 row)
----------------------------------------------

This workaround may be effective in IPv4-only environments. The fault seems to be in the library function getaddrinfo() on Solaris.
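
Before editing /etc/inet/ipnodes, it may help to check whether the IPv6 loopback address can be bound at all. The following is a minimal sketch in Python, assuming nothing about the PostgreSQL source; the function name bind_error_name is hypothetical. It reports errors by symbolic name because the numeric value of EADDRNOTAVAIL differs between platforms (the truss output above shows 126).

```python
import errno
import socket


def bind_error_name(address, family):
    """Return None if a UDP socket can bind to address, else the errno name."""
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError as exc:
        # The address family is not supported by this system at all.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        sock.bind((address, 0))  # port 0: let the OS pick a free port
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print("127.0.0.1:", bind_error_name("127.0.0.1", socket.AF_INET))
    if socket.has_ipv6:
        print("::1:", bind_error_name("::1", socket.AF_INET6))
```

A result of EADDRNOTAVAIL for ::1 together with None for 127.0.0.1 would correspond to the x86 case reported above, where lo0 carries only an IPv4 address.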

Thank you.

--
Kiyoshi Sawada