Thread: Tests randomly failed
Hi all.

The first time I executed `make check', 10 tests failed:

float8 ... FAILED
test numerology ... FAILED
point ... FAILED
lseg ... FAILED
interval ... FAILED
test geometry ... FAILED
test horology ... FAILED
subselect ... FAILED
union ... FAILED
test misc ... FAILED

The second time it was only 5:

abstime ... FAILED
test horology ... FAILED
subselect ... FAILED
union ... FAILED
test misc ... FAILED

The third time it was 10 again:

abstime ... FAILED
tinterval ... FAILED
inet ... FAILED
comments ... FAILED
oidjoins ... FAILED
test horology ... FAILED
case ... FAILED
join ... FAILED
portals ... FAILED
test misc ... FAILED

Results of the second and third passes are in the attachment.
It looks like the failed tests are due to

! psql: connectDBStart() -- connect() failed: Connection refused
!   Is the postmaster running locally
!   and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?

My guess is that this could be due to high load on my box, but w said

11:29am up 24 day(s), 18:30, 2 users, load average: 0.00, 0.18, 0.29

and I shut down my production postmaster before the tests, and I have
256MB of RAM.

SunOS iridium 5.6 Generic_105181-20 sun4u sparc SUNW,Ultra-5_10
gcc version 2.95.2 19991024 (release)
psql (PostgreSQL) 7.1RC1 (actually from CVS)

So, the question is: what is the reason for such behaviour, and how can
I fight it?

Regards,
ASK

Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> It looks like the failed tests are due to
> ! psql: connectDBStart() -- connect() failed: Connection refused
> !   Is the postmaster running locally
> !   and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?

What I see is a lot of

! psql: Backend startup failed

which suggests a fork() failure. Look in the postmaster logfile to see
the exact kernel error code --- but probably you are out of swap space
or up against the kernel's limit on number of processes for one userid.

regards, tom lane

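A minimal sketch of the log check suggested above. The helper name count_fork_errors and the log path in the usage comment are hypothetical, and the search term is an assumption: the sketch only assumes that the relevant log lines mention "fork", not any particular message wording.

```shell
#!/bin/sh
# Hypothetical helper: count postmaster log lines that mention fork.
# ASSUMPTION: a failed backend start is logged with the word "fork"
# somewhere on the line; adjust the pattern to what your log contains.
count_fork_errors() {
    logfile=$1
    # grep -c prints the number of matching lines. It exits non-zero
    # when there are no matches, so mask that status with "|| true".
    grep -ci 'fork' "$logfile" || true
}

# Usage (the log file name is an assumption):
#   count_fork_errors postmaster.log
```

A non-zero count is only a hint to read those lines; the kernel error code on each line is what tells swap exhaustion apart from a per-user process limit.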
Alexander Klimov writes:

> Results of the second and third passes are in the attachment.
> It looks like the failed tests are due to
> ! psql: connectDBStart() -- connect() failed: Connection refused
> !   Is the postmaster running locally
> !   and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?
>
> My guess is that this could be due to high load on my box, but
> w said
> 11:29am up 24 day(s), 18:30, 2 users, load average: 0.00, 0.18, 0.29
> and I shut down my production postmaster before the tests, and I have
> 256MB of RAM.
> SunOS iridium 5.6 Generic_105181-20 sun4u sparc SUNW,Ultra-5_10
> gcc version 2.95.2 19991024 (release)
> psql (PostgreSQL) 7.1RC1 (actually from CVS)

In src/test/regress/pg_regress[.sh], line 163, change

    *-*-qnx* | *beos*)

to

    *-*-qnx* | *beos* | *solaris*)

and rerun the tests. This will avoid using Unix domain sockets, which are
broken on Solaris.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

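A minimal sketch of how a case pattern like the one quoted above selects the connection method. Only the pattern line comes from the message; the function name unix_sockets_for and the host strings are made up for illustration, and this is not the actual pg_regress code.

```shell
#!/bin/sh
# Hypothetical helper: print "no" for platforms where the regression
# driver should fall back to TCP/IP, and "yes" everywhere else.
unix_sockets_for() {
    case $1 in
        # Pattern with the proposed *solaris* alternative added.
        *-*-qnx* | *beos* | *solaris*)
            echo no ;;
        *)
            echo yes ;;
    esac
}

unix_sockets_for sparc-sun-solaris2.6   # prints "no"
unix_sockets_for i686-pc-linux-gnu      # prints "yes"
```

Because the alternatives are glob patterns matched against the whole host string, adding `*solaris*` catches any platform name that contains "solaris", whatever the CPU or vendor part is.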
Peter Eisentraut <peter_e@gmx.net> writes:
> In src/test/regress/pg_regress[.sh], line 163, change
>     *-*-qnx* | *beos*)
> to
>     *-*-qnx* | *beos* | *solaris*)
> and rerun the tests. This will avoid using Unix domain sockets, which are
> broken on Solaris.

I was just thinking that maybe pg_regress should have a command line
option to set unix_sockets=no, so that both connection options could
be exercised when there's doubt.

regards, tom lane

Hey guys,

I don't understand what you mean by "This will avoid using Unix domain
sockets, which are broken on Solaris.".

If this were the case, then the errors which are described would happen
on ALL Solaris platforms, wouldn't they? And other packages using Unix
domain sockets would have problems too, wouldn't they?

If it's of any help, I get the same types of regression testing failures
on Solaris, with the same "is the backend running?" type error
messages, when the installation of Solaris HAS NOT had its /etc/system
file altered to change the amount of shared memory segments and
semaphores.

Whenever I have those problems, I insert the updated (higher) values for
shared memory and semaphores, reboot the system, then the tests pass as
the backend is able to start fine.

Hope this is helpful.

Regards and best wishes,

Justin Clift

Tom Lane wrote:
>
> Peter Eisentraut <peter_e@gmx.net> writes:
> > In src/test/regress/pg_regress[.sh], line 163, change
> >     *-*-qnx* | *beos*)
> > to
> >     *-*-qnx* | *beos* | *solaris*)
> > and rerun the tests. This will avoid using Unix domain sockets, which are
> > broken on Solaris.
>
> I was just thinking that maybe pg_regress should have a command line
> option to set unix_sockets=no, so that both connection options could
> be exercised when there's doubt.
>
> regards, tom lane

Justin Clift <jclift@iprimus.com.au> writes:
> If it's of any help, I get the same types of regression testing failures
> on Solaris, with the same "is the backend running?" type error
> messages, when the installation of Solaris HAS NOT had its /etc/system
> file altered to change the amount of shared memory segments and
> semaphores.

> Whenever I have those problems, I insert the updated (higher) values for
> shared memory and semaphores, reboot the system, then the tests pass as
> the backend is able to start fine.

Hm. That's interesting, but it's fairly hard to believe. For at least
a couple of releases past, Postgres has grabbed all the shared memory and
semaphores that it wants at postmaster start. Insufficient shmem/sema
resources should result in postmaster abort, not in occasional failures
to start backends.

regards, tom lane

Hi Tom,

I know what you're saying, but I've come across it multiple times.

The process for building a Solaris server for PostgreSQL is (from
memory):

A) Install the OS
B) Install the latest Maintenance Update
C) Install the latest recommended patches
D) Adjust system values for semaphores and shared memory
E) Do an initial lockdown for system security
F) Reboot for the new settings to take effect
G) Create postgres group and postgres user
H) Compile postgres
I) Run the regression tests
J) Lock down the system again
K) Reboot, test startup scripts, etc
<etc>

If I'm working very late and can't find the semaphore settings, then
sometimes I'll do them out of order. A number of times I've totally
forgotten to change things until PostgreSQL complains, either in the
regression tests (as described in this thread) or during normal startup.

We're talking a few times anyway, probably about.... um... 15 - 20 times
or so that I've forgotten.

Regards and best wishes,

Justin Clift

Tom Lane wrote:
>
> Justin Clift <jclift@iprimus.com.au> writes:
> > If it's of any help, I get the same types of regression testing failures
> > on Solaris, with the same "is the backend running?" type error
> > messages, when the installation of Solaris HAS NOT had its /etc/system
> > file altered to change the amount of shared memory segments and
> > semaphores.
>
> > Whenever I have those problems, I insert the updated (higher) values for
> > shared memory and semaphores, reboot the system, then the tests pass as
> > the backend is able to start fine.
>
> Hm. That's interesting, but it's fairly hard to believe. For at least
> a couple of releases past, Postgres has grabbed all the shared memory and
> semaphores that it wants at postmaster start. Insufficient shmem/sema
> resources should result in postmaster abort, not in occasional failures
> to start backends.
>
> regards, tom lane

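Step D in the list above is the one that gets forgotten, so it can help to check it before running the tests. The following is a minimal sketch, assuming the shared memory and semaphore values are set in /etc/system with lines of the form `set shmsys:...=value` and `set semsys:...=value`. The helper name list_ipc_settings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the shared memory and semaphore tunables
# that are set in a Solaris-style system file (default: /etc/system).
list_ipc_settings() {
    file=${1:-/etc/system}
    # Comment lines in /etc/system start with "*", so anchoring the
    # match on "set" at the start of the line skips them.
    # "|| true" masks grep's non-zero exit status when nothing matches.
    grep -E '^[[:space:]]*set[[:space:]]+(shmsys|semsys):' "$file" || true
}

# Usage:
#   list_ipc_settings            # inspect /etc/system
#   list_ipc_settings ./system   # inspect a copy of the file
```

Empty output means no such values have been set in that file, which matches the situation described above where the regression tests fail on a fresh install.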
Justin Clift writes:

> I don't understand what you mean by "This will avoid using Unix domain
> sockets, which are broken on Solaris.".
>
> If this were the case, then the errors which are described would happen
> on ALL Solaris platforms, wouldn't they?

I suppose things are a bit more complicated than that. We once had a
brief suspicion that it could be related to Sun's tmpfs file system that
/tmp often resides on, but I don't think this turned out to be the case.

> And other packages using Unix domain sockets would have problems too,
> wouldn't they?

Indeed. A while ago I looked around and found at least two packages (INN
and Postfix) that had similar-sounding problems. In fact, one of the two
ended up disabling it with the words "more trouble than it's worth".

You could argue that X, KDE, and whatever else should be broken as well.
This is a good question. It could perhaps be related to a buffer problem,
under the assumption that X usually passes small amounts of data through
the pipe, whereas PostgreSQL can pass megabytes in a very short time.
(Don't know what INN and Postfix would want to do with a local socket.)

The bottom line here is that the switch from local sockets to TCP/IP
invariably fixes the identical failure pattern. Make of that what you
will.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

On Thu, 22 Mar 2001, Peter Eisentraut wrote:

> In src/test/regress/pg_regress[.sh], line 163, change
>
>     *-*-qnx* | *beos*)
>
> to
>
>     *-*-qnx* | *beos* | *solaris*)
>
> and rerun the tests. This will avoid using Unix domain sockets, which are
> broken on Solaris.

Yes, it works now:

======================
 All 76 tests passed.
======================

On the other hand, my production version uses Unix domain sockets
without problems.

Regards,
ASK

On Thu, 22 Mar 2001, Tom Lane wrote:

> What I see is a lot of
>
> ! psql: Backend startup failed
>
> which suggests a fork() failure. Look in the postmaster logfile to see
> the exact kernel error code --- but probably you are out of swap space
> or up against the kernel's limit on number of processes for one userid.

Strange, but this solution *also* works: I raised maxuprc in /etc/system
from 64 to

set maxuprc=256

reverted pg_regress.sh to its original state (with Unix sockets for
Solaris), and now all tests pass.

Regards,
ASK

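A minimal sketch for checking the setting changed above, assuming it is written in /etc/system as a `set maxuprc=N` line like the one shown. The helper name maxuprc_setting is hypothetical, and the sketch only reads the file; it does not report the limit the running kernel actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print the maxuprc value set in a Solaris-style
# system file (default: /etc/system), or nothing if it is not set.
maxuprc_setting() {
    file=${1:-/etc/system}
    # Keep only the number from lines of the form "set maxuprc=N".
    # If the setting appears more than once, report the last one.
    sed -n 's/^[[:space:]]*set[[:space:]]\{1,\}maxuprc[[:space:]]*=[[:space:]]*\([0-9]\{1,\}\).*/\1/p' "$file" | tail -n 1
}

# Usage:
#   maxuprc_setting            # inspect /etc/system
#   maxuprc_setting ./system   # inspect a copy of the file
```

Empty output means the file does not set maxuprc at all.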