Thread: Re: [BUGS] Tests randomly failed

Re: [BUGS] Tests randomly failed

From: Alexander Klimov
On Mon, 26 Mar 2001, Justin Clift wrote:
> Out of curiosity, how many times are you running the tests?
> 
> I've been building 7.1RC1 over the weekend, and from one compiled
> version I ran the regression tests 5 times before getting things to
> pass.  No changes anywhere, just re-ran the tests.
> 
> So... it might be just co-incidence that the tests passed for you after
> the change you mentioned below.

> Alexander Klimov wrote:
> > 
> > On Thu, 22 Mar 2001, Tom Lane wrote:
> > > What I see is a lot of
> > >
> > > ! psql: Backend startup failed
> > >
> > > which suggests a fork() failure.  Look in the postmaster logfile to see
> > > the exact kernel error code --- but probably you are out of swap space
> > > or up against the kernel's limit on number of processes for one userid.
> > Strange, but this solution *also* works: in /etc/system I raised maxuprc
> > from 64 to
> > set maxuprc=256
> > reverted pg_regress.sh to its original state (with Unix sockets for
> > Solaris), and now all tests pass.

Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.

Regards,
ASK
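
The comparison above (15 repeated `make check' runs per socket type) can
be reproduced with a small wrapper. This is only a sketch: count_passes is
a hypothetical helper, not something shipped in the PostgreSQL tree, and
it assumes that the command under test exits non-zero when a run fails.

```shell
#!/bin/sh
# count_passes N CMD...: run CMD N times and print how many runs exited 0.
# Hypothetical helper for illustration; not part of pg_regress.sh.
count_passes() {
    runs=$1
    shift
    passed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's own output; only the exit status matters.
        if "$@" >/dev/null 2>&1; then
            passed=$((passed + 1))
        fi
        i=$((i + 1))
    done
    echo "$passed of $runs passed"
}

# Example invocation from the top of a built source tree (not run here):
#   count_passes 15 make check
```

Running it once per socket configuration gives the two pass counts to
compare.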




Re: Re: [BUGS] Tests randomly failed

From: Tom Lane
Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
> check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
> sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.

So, shall we change pg_regress.sh to not use Unix sockets on Solaris?

This would potentially cause problems for "make installcheck", if the
postmaster was not started with -i.  I suspect the socket problems are
only seen when many clients try to connect at the same time, so the
parallel regression tests are more prone to trouble than serial.
Perhaps for Solaris, go to TCP only if it's parallel mode?
        regards, tom lane


Re: Re: [BUGS] Tests randomly failed

From: Peter Eisentraut
Tom Lane writes:

> Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
> > check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
> > sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.

What became of 'set maxuprc=256'?  I thought that made it work.  Could
other people try it, or has it been disproven?

> So, shall we change pg_regress.sh to not use Unix sockets on Solaris?

This would hide problems during the test phase which would reappear in the
production phase, no?

> Perhaps for Solaris, go to TCP only if it's parallel mode?

Unfortunately, it's not possible to detect this globally, only when you're
actually parsing the schedule file and encounter a parallel group.  This
would mean running some tests this way and some tests another way.  That
might not be the worst of ideas, but it should be done on all platforms
then.  Additionally, I don't think it will really fix things, because
some tests that failed were not in a parallel group (and I firmly recall
that some of those were *not* follow-up failures).  I think it is more
related to a "high load" situation.


If I were a Solaris user and had a bit more insight into this problem I
would probably vote for #undef HAVE_UNIX_SOCKETS.  But I'm not...

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Re: [BUGS] Tests randomly failed

From: Mathijs Brands
On Tue, Mar 27, 2001 at 07:17:47PM +0200, Peter Eisentraut allegedly wrote:
> Tom Lane writes:
> 
> > Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > > Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
> > > check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
> > > sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.
> 
> What became of 'set maxuprc=256'?  I thought that made it work.  Could
> other people try it, or has it been disproven?

I'm giving this a test now...

Cheers,

Mathijs
-- 
$_='while(read+STDIN,$_,2048){$a=29;$c=142;if((@a=unx"C*",$_)[20]&48){$h=5;
$_=unxb24,join"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$b=73;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=($t=255)&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*8^$q<<6))<<9
,$_=(map{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;$t
^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)[$_%8]}(16..271))
[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}print+x"C*",@a}';s/x/pack+/g;eval 


Solaris 7 SPARC passes tests (was Re: Re: [BUGS] Tests randomly failed)

From: Mathijs Brands
On Wed, Mar 28, 2001 at 02:40:00AM +0200, Mathijs Brands allegedly wrote:
> On Tue, Mar 27, 2001 at 07:17:47PM +0200, Peter Eisentraut allegedly wrote:
> > Tom Lane writes:
> > 
> > > Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > > > Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
> > > > check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
> > > > sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.
> > 
> > What became of 'set maxuprc=256'?  I thought that made it work.  Could
> > other people try it, or has it been disproven?
> 
> I'm giving this a test now...

No luck :( Tests still randomly crash. (This is an Ultra 10 machine.)

7.1RC1 on Solaris 7 SPARC does pass the regression tests (apart from the
random test, which seems to be ignored on Solaris). (This is an Ultra
420 machine.)

Cheers,

Mathijs
-- 
It's not that perl programmers are idiots, it's that the language
rewards idiotic behavior in a way that no other language or tool has
ever done.                                                   Erik Naggum


Mathijs Brands <mathijs@ilse.nl> writes:
> No luck :( Tests still randomly crash. (This is an Ultra 10 machine.)

How about if you change the pg_regress script to use TCP connections?
(Look for the bit that forces unix_sockets=no for certain OSes, and
add solaris)
        regards, tom lane
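
A minimal sketch of the kind of platform switch described above. It assumes
the script decides per platform with a shell case statement; the function
name choose_unix_sockets and the exact patterns are illustrative
assumptions, not the actual contents of pg_regress.sh.

```shell
#!/bin/sh
# choose_unix_sockets PLATFORM: print "no" for platforms where the test
# driver should fall back to TCP, "yes" otherwise.
# Hypothetical helper; the pattern list is an assumption, with *solaris*
# being the addition suggested in this thread.
choose_unix_sockets() {
    case $1 in
        *beos* | *qnx* | *solaris*)
            echo no ;;
        *)
            echo yes ;;
    esac
}

choose_unix_sockets sparc-sun-solaris2.7   # prints "no"
choose_unix_sockets i686-pc-linux-gnu      # prints "yes"
```

With Unix sockets disabled the postmaster has to accept TCP connections,
which is why the -i requirement for "make installcheck" comes up elsewhere
in the thread.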


Re: Re: [BUGS] Tests randomly failed

From: "Richard T. Robino"
On 3/27/01 8:05 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
>> Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
>> check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
>> sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.
> 
> So, shall we change pg_regress.sh to not use Unix sockets on Solaris?

Doh! I just submitted a patch to change pg_regress.sh before reading all of
today's posts. Oh well, it was small.

> This would potentially cause problems for "make installcheck", if the
> postmaster was not started with -i.  I suspect the socket problems are
> only seen when many clients try to connect at the same time, so the
> parallel regression tests are more prone to trouble than serial.
> Perhaps for Solaris, go to TCP only if it's parallel mode?

So just ignore my patch, as it makes pg_regress never use Unix sockets on
Solaris in any mode. However, if you don't want to go with Peter E's #undef
solution there may be an easier way which is not dangerous at all. Adding a
--with-sockets option to pg_regress.sh would be trivial and allow one to
test either kind of socket on any platform.

Most people who just want to feel good about the build aren't going to do
much about fixing the unix sockets, and are probably not using them anyway.
Since a default will have to be picked, the one that shows all tests passing
will save some noise.

However, both sockets may need to be checked during regression to detect the
problems Peter mentions. Since there is also a conflict with OpenSSL and
Solaris' crypt, I can see some sun folks opting for running the database
locally as a hedge for trust in a single tier (DMZ) app scenario.

Anyway, after messing around with the script tonight I just wanted to chime
in that pg_regress.sh could use some improvement:

- A more specific postmaster startup for a normal make check which says
whether inet or unix sockets will be used. If you aren't aware of the
problem on solaris and don't check netstat, the message is generic and there
is a socket file in /tmp regardless of what type of socket gets used. Kind
of subtle.

- Maybe consistency in the script itself. Judging by the different styles of
testing and the output between the --temp-install conditions, it appears as
if the sections were written by two different people. It could be cleaned up
pretty fast and quite safely.

Not anything that important, but in the interest of making things easier to
understand these changes could be helpful (IMO). At the very least a mention
of the socket thing in regress/README or the Solaris FAQ would be handy. I'd
be happy to do any of the above if you think they are good ideas. If you're
already on it, never mind and thank you.

Cheers,

-- Rick
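
The --with-sockets idea above could look roughly like this. It is a sketch
only: the option does not exist in pg_regress.sh, and the function name
parse_socket_option, the accepted values "unix" and "inet", and the default
are all assumptions made for illustration.

```shell
#!/bin/sh
# parse_socket_option ARGS...: print the socket type selected by a
# hypothetical --with-sockets=unix|inet option, defaulting to "unix".
# Returns non-zero on an unrecognized value.
parse_socket_option() {
    socket_type=unix   # assumed default
    for arg in "$@"; do
        case $arg in
            --with-sockets=unix) socket_type=unix ;;
            --with-sockets=inet) socket_type=inet ;;
            --with-sockets=*)
                echo "invalid socket type: ${arg#--with-sockets=}" >&2
                return 1 ;;
        esac
    done
    echo "$socket_type"
}

parse_socket_option --with-sockets=inet   # prints "inet"
```

An explicit option keeps the platform default out of the decision, so
either socket type can be exercised on any platform.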





Re: Re: [BUGS] Tests randomly failed

From: Peter Eisentraut
Richard T. Robino writes:

> - A more specific postmaster startup for a normal make check which says
> whether inet or unix sockets will be used. If you aren't aware of the
> problem on solaris and don't check netstat, the message is generic and there
> is a socket file in /tmp regardless of what type of socket gets used. Kind
> of subtle.

Can be done.  I guess the TCP vs Unix domain issue was never this
important before.  The difference is also that the "installcheck" mode can
be used against either kind of socket using the standard --host and --port
options, depending on the requirements of the running server, whereas the
temp install mode handles this issue internally -- and it never used to
make a difference.

> - Maybe consistency in the script itself. Judging by the different styles of
> testing and the output between the --temp-install conditions, it appears as
> if the sections were written by two different people. It could be cleaned up
> pretty fast and quite safely.

Although large portions of actual code were copied over from the two
separate predecessors to this script, the conventions and formatting
should tend to be fairly consistent.  Just taking a quick glance now, I
would probably still write it this way, although some ideas for cosmetic
changes, such as the one above, may arise through actual use.

> Not anything that important, but in the interest of making things easier to
> understand these changes could be helpful (IMO). At the very least a mention
> of the socket thing in regress/README or the Solaris FAQ would be handy. I'd
> be happy to do any of the above if you think they are good ideas. If you're
> already on it, never mind and thank you.

Unless we decide on any code measures, it will end up being documented in
FAQ_Solaris.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Re: [BUGS] Tests randomly failed

From: Justin Clift
Hi Tom,

My guess is that it would be possible to insert a check to see if the
installed postmaster was started with -i, and then choose between Unix
domain sockets and TCP.  BUT whether it is worth trying to explain this in
the installation documentation to a novice user who is setting up
PostgreSQL for the 1st, 2nd or 3rd time is something to think about...

???

Regards and best wishes,

Justin Clift

Tom Lane wrote:
> 
> Alexander Klimov <ask@wisdom.weizmann.ac.il> writes:
> > Yes, it really was just coincidence -- I tried again, and 15 of 15 `make
> > check' runs passed with TCP sockets, but only 3 of 15 passed with UNIX
> > sockets. So the final conclusion is `Unix sockets are not reliable on Solaris'.
> 
> So, shall we change pg_regress.sh to not use Unix sockets on Solaris?
> 
> This would potentially cause problems for "make installcheck", if the
> postmaster was not started with -i.  I suspect the socket problems are
> only seen when many clients try to connect at the same time, so the
> parallel regression tests are more prone to trouble than serial.
> Perhaps for Solaris, go to TCP only if it's parallel mode?
> 
>                         regards, tom lane

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."    - Indira Gandhi


Re: Solaris 7 SPARC passes tests (was Re: Re: [BUGS] Tests randomly failed)

From: Rick Robino
> On Tue, Mar 27, 2001 at 08:08:47PM -0500, Tom Lane wrote:
> Mathijs Brands <mathijs@ilse.nl> writes:
> > No luck :( Tests still randomly crash. (This is an Ultra 10 machine.)
>
> How about if you change the pg_regress script to use TCP connections?
> (Look for the bit that forces unix_sockets=no for certain OSes, and
> add solaris)
>
>             regards, tom lane

Someone ran into this again yesterday with Solaris x86. The unix
socket problem is probably the same for both architectures, so why
not change pg_regress.sh to include *solaris* as part of the same
case statement that excludes QNX and BeOS for unix sockets? It is
safe to say that Solaris does have this problem.

The postmaster startup test could say something a bit more useful
this way too, as a standard "make check" does not report which type
of sockets are being used (but it does when --temp-install="").
Some folks may want that to be recorded in the output consistently.

A very small patch to do both of those things is attached.

Cheers,

-Rick
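
The attached patch is not reproduced in this archive. As a rough
illustration of the second point (reporting the socket type at postmaster
startup), a sketch might look like the following; report_socket_type and
the message wording are assumptions, not the text of the patch.

```shell
#!/bin/sh
# report_socket_type USE_UNIX PORT: print which kind of socket the test
# run will use. USE_UNIX is "yes" or "no".
# Hypothetical helper; the wording of the messages is an assumption.
report_socket_type() {
    if [ "$1" = yes ]; then
        echo "running on port $2 using Unix-domain sockets"
    else
        echo "running on port $2 using TCP/IP sockets"
    fi
}

unix_sockets=no
report_socket_type "$unix_sockets" 65432
```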

