Thread: Re: [BUGS] Tests randomly failed
On Mon, 26 Mar 2001, Justin Clift wrote:

> Out of curiosity, how many times are you running the tests?
>
> I've been building 7.1RC1 over the weekend, and from one compiled
> version I ran the regression tests 5 times before getting things to
> pass. No changes anywhere, just re-ran the tests.
>
> So... it might be just co-incidence that the tests passed for you after
> the change you mentioned below.

> Alexander Klimov wrote:
> >
> > On Thu, 22 Mar 2001, Tom Lane wrote:
> > > What I see is a lot of
> > >
> > > ! psql: Backend startup failed
> > >
> > > which suggests a fork() failure. Look in the postmaster logfile to see
> > > the exact kernel error code --- but probably you are out of swap space
> > > or up against the kernel's limit on number of processes for one userid.
> > Strange, but this solution *also* works: I raise in /etc/system from 64 to
> > set maxuprc=256
> > revert pg_regress.sh in original state (with unix sockets for solaris),
> > and now all tests are passed.

Yes, it was really just coincidence -- I tried again, and 15 of 15 `make check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX sockets.

So, final decision is `Unix sockets are not reliable on Solaris'

Regards,
ASK
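The "15 of 15" versus "3 of 15" comparison above can be scripted instead of counted by hand. The following is only a minimal sketch: the helper name count_passes is made up for illustration, and it assumes that `make check' exits with a non-zero status when any regression test fails.

```shell
#!/bin/sh
# count_passes N CMD [ARGS...]   (hypothetical helper, not part of PostgreSQL)
# Runs CMD N times and reports how many runs exited with status 0.
count_passes() {
    runs=$1
    shift
    passed=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's own output; only the exit status matters here.
        if "$@" >/dev/null 2>&1; then
            passed=$((passed + 1))
        fi
        i=$((i + 1))
    done
    echo "$passed of $runs passed"
}

# Example, run from the top of the source tree:
#   count_passes 15 make check
```

Running it once with the script set to TCP and once with Unix sockets gives two directly comparable counts.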
Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> Yes, it was really just coincidence -- I tried again, and 15 of 15 `make
> check' passed with TCP sockets, but only 3 of 15 passed with UNIX
> sockets. So, final decision is `Unix sockets are not reliable on Solaris'

So, shall we change pg_regress.sh to not use Unix sockets on Solaris?

This would potentially cause problems for "make installcheck", if the postmaster was not started with -i. I suspect the socket problems are only seen when many clients try to connect at the same time, so the parallel regression tests are more prone to trouble than serial. Perhaps for Solaris, go to TCP only if it's parallel mode?

regards, tom lane
Tom Lane writes:

> Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > Yes, it was really just coincidence -- I tried again, and 15 of 15 `make
> > check' passed with TCP sockets, but only 3 of 15 passed with UNIX
> > sockets. So, final decision is `Unix sockets are not reliable on Solaris'

What became of 'set maxuprc=256'? I thought that made it work. Could other people try it or has it been disproven?

> So, shall we change pg_regress.sh to not use Unix sockets on Solaris?

This would hide problems during the test phase which would reappear in the production phase, no?

> Perhaps for Solaris, go to TCP only if it's parallel mode?

Unfortunately, it's not possible to detect this globally, only when you're actually parsing the schedule file and encounter a parallel group. This would mean running some tests this way and some tests another way. That might not be the worst of ideas, but it should be done on all platforms then. Additionally, I don't think it will really fix things, because some tests that failed were not in a parallel group (and I firmly recall that some of those were *not* follow-up failures). I think it is more related to a "high load" situation.

If I were a Solaris user and had a bit more insight into this problem I would probably vote for #undef HAVE_UNIX_SOCKETS. But I'm not...

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
On Tue, Mar 27, 2001 at 07:17:47PM +0200, Peter Eisentraut allegedly wrote:
> Tom Lane writes:
>
> > Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > > Yes, it was really just coincidence -- I tried again, and 15 of 15 `make
> > > check' passed with TCP sockets, but only 3 of 15 passed with UNIX
> > > sockets. So, final decision is `Unix sockets are not reliable on Solaris'
>
> What became of 'set maxuprc=256'? I thought that made it work. Could
> other people try it or has it been disproven?

I'm giving this a test now...

Cheers,

Mathijs
--
$_='while(read+STDIN,$_,2048){$a=29;$c=142;if((@a=unx"C*",$_)[20]&48){$h=5;
$_=unxb24,join"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$b=73;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=($t=255)&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*8^$q<<6))<<9
,$_=(map{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;$t
^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)[$_%8]}(16..271))
[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}print+x"C*",@a}';s/x/pack+/g;eval
On Wed, Mar 28, 2001 at 02:40:00AM +0200, Mathijs Brands allegedly wrote:
> On Tue, Mar 27, 2001 at 07:17:47PM +0200, Peter Eisentraut allegedly wrote:
> > Tom Lane writes:
> >
> > > Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > > > Yes, it was really just coincidence -- I tried again, and 15 of 15 `make
> > > > check' passed with TCP sockets, but only 3 of 15 passed with UNIX
> > > > sockets. So, final decision is `Unix sockets are not reliable on Solaris'
> >
> > What became of 'set maxuprc=256'? I thought that made it work. Could
> > other people try it or has it been disproven?
>
> I'm giving this a test now...

No luck :( Tests still randomly crash. (This is an Ultra 10 machine.)

7.1RC1 on Solaris 7 SPARC does pass the regression tests (apart from the random test, which seems to be ignored on Solaris). (This is an Ultra 420 machine.)

Cheers,

Mathijs
--
It's not that perl programmers are idiots, it's that the language
rewards idiotic behavior in a way that no other language or tool has
ever done.
        Erik Naggum
Mathijs Brands <mathijs@ilse.nl> writes:
> No luck :( Tests still randomly crash. (This is an Ultra 10 machine.)

How about if you change the pg_regress script to use TCP connections? (Look for the bit that forces unix_sockets=no for certain OSes, and add solaris)

regards, tom lane
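For readers following along, the kind of change being suggested could look roughly like the sketch below. It is not the actual pg_regress.sh text: the function name choose_sockets and the exact glob patterns are assumptions made for illustration. Only the unix_sockets=no setting, and the fact that QNX and BeOS are already handled this way (mentioned elsewhere in this thread), come from the discussion.

```shell
#!/bin/sh
# choose_sockets PLATFORM   (hypothetical helper, for illustration only)
# Prints "no" for platforms where the tests should connect over TCP,
# and "yes" where Unix domain sockets are used.
choose_sockets() {
    case $1 in
        # QNX and BeOS are the platforms named in this thread;
        # solaris is the proposed addition. Patterns are assumed.
        *qnx* | *beos* | *solaris*)
            echo no ;;
        *)
            echo yes ;;
    esac
}

# Example with an assumed platform string:
unix_sockets=$(choose_sockets "sparc-sun-solaris2.7")
echo "unix_sockets=$unix_sockets"
```

A glob match on the platform name keeps the exception list in one place, so adding or removing an operating system is a one-word change.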
On 3/27/01 8:05 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
>> Yes, it was really just coincidence -- I tried again, and 15 of 15 `make
>> check' passed with TCP sockets, but only 3 of 15 passed with UNIX
>> sockets. So, final decision is `Unix sockets are not reliable on Solaris'
>
> So, shall we change pg_regress.sh to not use Unix sockets on Solaris?

Doh! I just submitted a patch to change pg_regress.sh before reading all of today's posts. Oh well, it was small.

> This would potentially cause problems for "make installcheck", if the
> postmaster was not started with -i. I suspect the socket problems are
> only seen when many clients try to connect at the same time, so the
> parallel regression tests are more prone to trouble than serial.
> Perhaps for Solaris, go to TCP only if it's parallel mode?

So just ignore my patch, as it makes pg_regress never use Unix sockets on any Solaris in any mode.

However, if you don't want to go with Peter E's #ifdef solution there may be an easier way which is not dangerous at all. Adding a --with-sockets option to pg_regress.sh would be trivial and allow one to test either kind of socket on any platform.

Most people who just want to feel good about the build aren't going to do much about fixing the unix sockets, and are probably not using them anyway. Since a default will have to be picked, the one that shows all tests passing will save some noise. However, both kinds of socket may need to be checked during regression to detect the problems Peter mentions. Since there is also a conflict with OpenSSL and Solaris' crypt, I can see some sun folks opting for running the database locally as a hedge for trust in a single tier (DMZ) app scenario.

Anyway, after messing around with the script tonight I just wanted to chime in that pg_regress.sh could use some improvement:

- A more specific postmaster startup for a normal make check which says whether inet or unix sockets will be used. If you aren't aware of the problem on solaris and don't check netstat, the message is generic and there is a socket file in /tmp regardless of what type of socket gets used. Kind of subtle.

- Maybe consistency in the script itself. Judging by the different styles of testing and the output between the --temp-install conditions, it appears as if each section was written by two different people. It could be cleaned up pretty fast and quite safely.

Not anything that important, but in the interest of making things easier to understand these changes could be helpful (IMO). At the very least a mention of the socket thing in regress/README or the Solaris FAQ would be handy. I'd be happy to do any of the above if you think they are good ideas. If you're already on it, never mind and thank you.

Cheers,
--
Rick
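To make the --with-sockets idea concrete, here is a rough sketch of how such an option might be parsed. The option does not exist in pg_regress.sh; its spelling, the accepted values (unix, inet), the default, and the function name parse_socket_option are all assumptions for illustration.

```shell
#!/bin/sh
# parse_socket_option ARGS...   (hypothetical; --with-sockets is only a proposal)
# Prints the socket type selected on the command line, or the assumed
# default "unix" when the option is absent. Fails on an unknown value.
parse_socket_option() {
    socket_type=unix
    for arg in "$@"; do
        case $arg in
            --with-sockets=unix | --with-sockets=inet)
                # Strip the option name, keep only the value.
                socket_type=${arg#--with-sockets=} ;;
            --with-sockets=*)
                echo "unknown socket type: ${arg#--with-sockets=}" >&2
                return 1 ;;
        esac
    done
    echo "$socket_type"
}

# Example: report the choice up front, as the first bullet above asks for.
echo "running tests over $(parse_socket_option --with-sockets=inet) sockets"
```

Rejecting unknown values outright, rather than silently falling back to the default, avoids a test run that appears to exercise one socket type while actually using the other.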
Richard T. Robino writes:

> - A more specific postmaster startup for a normal make check which says
> whether inet or unix sockets will be used. If you aren't aware of the
> problem on solaris and don't check netstat, the message is generic and there
> is a socket file in /tmp regardless of what type of socket gets used. Kind
> of subtle.

Can be done. I guess the TCP vs Unix domain issue was never this important before. The difference is also that the "installcheck" mode can be used against either kind of socket using the standard --host and --port options, depending on the requirements of the running server, whereas the temp install mode handles this issue internally -- and it never used to make a difference.

> - Maybe consistency in the script itself. Judging by the different styles of
> testing and the output between the --temp-install conditions, it appears as
> if each section was written by two different people. It could be cleaned up
> pretty fast and quite safely.

Although large portions of actual code were copied over from the two separate predecessors to this script, the conventions and formatting should tend to be fairly consistent. Just taking a quick glance now, I would probably still write it this way, although some ideas for cosmetic changes, such as the one above, may arise through actual use.

> Not anything that important, but in the interest of making things easier to
> understand these changes could be helpful (IMO). At the very least a mention
> of the socket thing in regress/README or the Solaris FAQ would be handy. I'd
> be happy to do any of the above if you think they are good ideas. If you're
> already on it, nevermind and thank you.

Unless we decide on any code measures, it will end up being documented in FAQ_Solaris.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Hi Tom,

My guess is that it would be possible to insert a check to see if the installed Postmaster was started with -i, and then choose between Unix domain sockets or TCP.

BUT, whether it's worth trying to explain this in the installation document to the novice user who is setting up PostgreSQL for about the 1st, 2nd or 3rd time is something to think about... ???

Regards and best wishes,

Justin Clift

Tom Lane wrote:
>
> Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > Yes, it was really just coincidence -- I tried again, and 15 of 15 `make
> > check' passed with TCP sockets, but only 3 of 15 passed with UNIX
> > sockets. So, final decision is `Unix sockets are not reliable on Solaris'
>
> So, shall we change pg_regress.sh to not use Unix sockets on Solaris?
>
> This would potentially cause problems for "make installcheck", if the
> postmaster was not started with -i. I suspect the socket problems are
> only seen when many clients try to connect at the same time, so the
> parallel regression tests are more prone to trouble than serial.
> Perhaps for Solaris, go to TCP only if it's parallel mode?
>
> regards, tom lane

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
> On Tue, Mar 27, 2001 at 08:08:47PM -0500, Tom Lane wrote:
> Mathijs Brands <mathijs@ilse.nl> writes:
> > No luck :( Tests still randomly crash. (This is an Ultra 10 machine.)
>
> How about if you change the pg_regress script to use TCP connections?
> (Look for the bit that forces unix_sockets=no for certain OSes, and
> add solaris)
>
> regards, tom lane

Someone ran into this again yesterday with Solaris x86. The unix socket problem is probably the same for both architectures, so why not change pg_regress.sh to include *solaris* as part of the same case statement that excludes QNX and BeOS for unix sockets? It is safe to say that Solaris does have this problem.

The postmaster startup test could say something a bit more useful this way too, as a standard "make check" does not report which type of sockets are being used (but it does when --temp-install=""). Some folks may want that to be recorded in the output consistently.

A very small patch to do both of those things is attached.

Cheers,
-Rick