Thread: minor windows & cygwin regression failures on stable branch

minor windows & cygwin regression failures on stable branch

From
Andrew Dunstan
Date:
I have seen some small regression failures on REL8_0_STABLE - I thought 
as we're coming up to a release I'd better run the stable branch through 
on my buildfarm clients.

Windows has ordering failures on the join and rules tests - Cygwin has a 
failure on the stats test. See buildfarm for details.

Does anyone know why these might have happened? Both these platforms had 
no errors on this branch 3 weeks ago.

cheers

andrew


Re: minor windows & cygwin regression failures on stable branch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Windows has ordering failures on the join and rules tests - Cygwin has a 
> failure on the stats test. See buildfarm for details.

The ordering failures seem to be because the recent planner hacking has
taken us back to preferring merge joins for these tests, and Windows'
version of qsort has bizarre behavior for equal keys.
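
To illustrate the point about equal keys: a sort that feeds a merge join
orders rows on the join key only, so rows that tie on that key come out in
whatever order the sort routine happens to leave them. The Python sketch
below is only an illustration, not PostgreSQL code; the row data and
function names are invented. Python's own sort is stable, so two different
arrival orders stand in for two platforms whose qsort treats ties
differently.

```python
# Illustrative only: rows are (join_key, payload) pairs; the data is invented.
rows_platform_a = [(1, "b"), (0, "x"), (1, "a")]
rows_platform_b = [(1, "a"), (0, "x"), (1, "b")]  # same rows, ties arrive in another order


def sort_on_key_only(rows):
    """Sort on the join key alone, as a sort feeding a merge join would."""
    return sorted(rows, key=lambda row: row[0])


def sort_with_order_by(rows):
    """Sort on every output column, which is what adding ORDER BY achieves."""
    return sorted(rows)


# Ties on key 1 keep their arrival order, so the two "platforms" disagree.
print(sort_on_key_only(rows_platform_a))
print(sort_on_key_only(rows_platform_b))

# With a total order over the whole row, the result no longer depends on ties.
print(sort_with_order_by(rows_platform_a) == sort_with_order_by(rows_platform_b))
```

The first two lists differ only in the order of the rows with key 1, which
is the kind of diff a regression test reports as an ordering failure.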

I put an ORDER BY in the rules test.  For join, I'm inclined to think
that the best bet is to resurrect the join_1.out variant comparison
file that we had a while ago.  Unfortunately, what's in the CVS archives
is out of date and can't be used directly.  Could you send me the actual
join.out you get on Windows to use for a comparison file?
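
The variant comparison file idea can be sketched roughly as follows: a test
counts as passing if its output matches the primary expected file or any
alternative kept beside it. This Python sketch is an illustration under
stated assumptions, not the regression driver itself; the function name and
the dictionary standing in for files on disk are hypothetical, and the file
contents are invented. Only the names join.out and join_1.out come from
this thread.

```python
def matches_any_expected(actual, expected_variants):
    """Return the name of the first expected file whose content equals the
    actual output, or None if no variant matches."""
    for name, expected in expected_variants.items():
        if actual == expected:
            return name
    return None


# Hypothetical file contents: the two variants differ only in row order.
expected_variants = {
    "join.out": "1 | a\n1 | b\n",
    "join_1.out": "1 | b\n1 | a\n",
}

windows_output = "1 | b\n1 | a\n"
print(matches_any_expected(windows_output, expected_variants))  # join_1.out
print(matches_any_expected("1 | c\n", expected_variants))       # None
```

The cost of this approach is that every change to the test has to be
applied to each variant file, which is presumably how the archived copy
went out of date.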

Dunno about the stats failure.  It looks like the stats collector just
isn't working on Cygwin, but AFAIR no one has touched that code lately.

Do these machines fail on HEAD too?  (There don't seem to be any active
Windows buildfarm machines for HEAD, which is surely ungood.  Won't
someone step up and put one into the regular rotation?)
        regards, tom lane


Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>Windows has ordering failures on the join and rules tests - Cygwin has a 
>>failure on the stats test. See buildfarm for details.
>>    
>>
>
>The ordering failures seem to be because the recent planner hacking has
>taken us back to preferring merge joins for these tests, and Windows'
>version of qsort has bizarre behavior for equal keys.
>
>I put an ORDER BY in the rules test.  For join, I'm inclined to think
>that the best bet is to resurrect the join_1.out variant comparison
>file that we had a while ago.  Unfortunately, what's in the CVS archives
>is out of date and can't be used directly.  Could you send me the actual
>join.out you get on Windows to use for a comparison file?
>  
>

join.out sent off list

>Dunno about the stats failure.  It looks like the stats collector just
>isn't working on Cygwin, but AFAIR no one has touched that code lately.
>  
>

It's worked before, that's the strange thing. I'll check some more.

>Do these machines fail on HEAD too?  (There don't seem to be any active
>Windows buildfarm machines for HEAD, which is surely ungood.  Won't
>someone step up and put one into the regular rotation?)
>
>
>  
>

Sufficient unto the day is the evil thereof (appropriate quotation for 
Good Friday). I will address HEAD in due course.

The buildfarm members for both of these are in reality my laptop, which 
doesn't even run Windows all the time, and has lots of other duties 
anyway. We (or rather Josh Berkus and Bruce, at my request) are looking 
for a replacement.

cheers

andrew


Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Dunno about the stats failure.  It looks like the stats collector just
>isn't working on Cygwin, but AFAIR no one has touched that code lately.
>  
>

Well, it seems at least to be running. When I fire up postmaster there 
are 4 processes running and no indication of failure that I could see on 
the log. (There is a complaint about failing to dup(0) after 3195 
successes - I assume that has nothing to do with it?)

The good news is that the regression result fixes made Windows native 
green again.

cheers

andrew




Re: minor windows & cygwin regression failures on stable

From
"Jim Buttafuoco"
Date:
Andrew,

I can set up a dedicated Windows XP system on Monday.  I also have some W2K systems that can be used.  Are there
directions anywhere?

Jim


---------- Original Message -----------
From: Andrew Dunstan <andrew@dunslane.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Sent: Fri, 25 Mar 2005 22:19:25 -0500
Subject: Re: [HACKERS] minor windows & cygwin regression failures on stable

------- End of Original Message -------



Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:
Jim,

that is just excellent! Thank you so much! I assume you mean XP-Pro - I 
gather that user permissions get in the way on XP-HE.

We can make the one machine do double duty for Windows and Cygwin.

You will need installed:
. cygwin, including whatever it takes to build cygwin postgres
. native perl (ActiveState perl works just fine)
. normal native windows postgres build environment (MSys, Mingw, MSysDTK etc).

After that we'll explore a bit what it takes to automate the buildfarm 
processes on Windows. You and I can do that offline.

Let's get this going first and then look at W2K.

cheers

andrew





Re: minor windows & cygwin regression failures on stable branch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Well, it seems at least to be running. When I fire up postmaster there 
> are 4 processes running and no indication of failure that I could see on 
> the log. (There is a complaint about failing to dup(0) after 3195 
> successes - I assume that has nothing to do with it?)

No, that's some code that's trying to measure the number of files we are
allowed to open.  It expects to fail, it just thought the particular
errno it got was odd enough to report.  Might be worth an #ifdef to tell
it that that errno is expected on Cygwin?
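
The probing technique described here can be sketched as: keep duplicating a
descriptor until dup() fails, count the successes, then decide whether the
errno is one worth reporting. The Python below is a minimal sketch, not the
backend code. The function name, the max_to_probe cap and the
expected_errnos parameter are hypothetical, and it duplicates a descriptor
opened on the null device rather than fd 0. The expected_errnos parameter
plays the role of the suggested #ifdef: a platform where another errno is
normal (a later message in this thread reports EBADF on Cygwin) could add
it to the set.

```python
import errno
import os


def probe_fd_limit(max_to_probe, expected_errnos=(errno.EMFILE, errno.ENFILE)):
    """Duplicate a descriptor until dup() fails or max_to_probe is reached.

    Returns (successful dups, errno of the failure or None,
    whether that errno is surprising enough to report).
    """
    source_fd = os.open(os.devnull, os.O_RDONLY)  # stand-in for fd 0
    duplicates = []
    failure_errno = None
    try:
        while len(duplicates) < max_to_probe:
            try:
                duplicates.append(os.dup(source_fd))
            except OSError as exc:
                failure_errno = exc.errno
                break
    finally:
        # Release everything we grabbed; the probe must not leak descriptors.
        for fd in duplicates:
            os.close(fd)
        os.close(source_fd)
    surprising = failure_errno is not None and failure_errno not in expected_errnos
    return len(duplicates), failure_errno, surprising


successes, failure_errno, surprising = probe_fd_limit(max_to_probe=32)
print(successes, failure_errno, surprising)
```

The cap keeps the probe cheap on systems with a very high descriptor limit;
when the cap is reached first, no errno is recorded at all.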

As far as the test failure, maybe we are just not allowing enough time
for the stats collector to run?  The thing sits there for 2 sec, which
theoretically is plenty, but it's a busy-wait loop and if the Cygwin
scheduler is not aggressive about taking away timeslices then maybe
the stats processes don't get to run.  Try doing the test script by
hand, with just a manual delay instead of the sleep function, and see
if it passes.
        regards, tom lane
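
The difference between a busy-wait delay and a blocking sleep can be shown
with a small measurement. This is a generic Python sketch, not the
regression test's sleep function; busy_wait and cpu_seconds_used are
hypothetical names. It compares how much CPU time the calling process
consumes during each kind of delay. A process that spins stays runnable for
the whole interval, so whether other processes get to run depends entirely
on the scheduler preempting it.

```python
import time


def busy_wait(seconds):
    """Spin until the deadline passes; the process stays runnable throughout."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def cpu_seconds_used(delay_function, seconds):
    """CPU time this process consumes while delay_function(seconds) runs."""
    start = time.process_time()
    delay_function(seconds)
    return time.process_time() - start


busy_cpu = cpu_seconds_used(busy_wait, 0.2)
sleep_cpu = cpu_seconds_used(time.sleep, 0.2)
print(f"busy-wait used {busy_cpu:.3f}s of CPU, blocking sleep used {sleep_cpu:.3f}s")
```

A blocking sleep gives up the CPU voluntarily, so it does not rely on the
scheduler taking timeslices away.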


Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>Well, it seems at least to be running. When I fire up postmaster there 
>>are 4 processes running and no indication of failure that I could see on 
>>the log. (There is a complaint about failing to dup(0) after 3195 
>>successes - I assume that has nothing to do with it?)
>>    
>>
>
>No, that's some code that's trying to measure the number of files we are
>allowed to open.  It expects to fail, it just thought the particular
>errno it got was odd enough to report.  Might be worth an #ifdef to tell
>it that that errno is expected on Cygwin?
>  
>

Yes, looks like the error is EBADF.

>As far as the test failure, maybe we are just not allowing enough time
>for the stats collector to run?  The thing sits there for 2 sec, which
>theoretically is plenty, but it's a busy-wait loop and if the Cygwin
>scheduler is not aggressive about taking away timeslices then maybe
>the stats processes don't get to run.  Try doing the test script by
>hand, with just a manual delay instead of the sleep function, and see
>if it passes.
>
>            
>  
>

Yes, when I do that it works. But even when I increase the interval to 
30 secs the regression script fails. I tried to use a sleep function 
that didn't do a busy-wait loop, but plperl seems to segfault on this 
platform :-( What fun.

What has changed in the last 3 weeks is that I refreshed my Cygwin 
installation, I think when I was wrestling with the NLS thing.  If 
nothing in postgres has changed in this area I assume that platform  
changes account for the regression.

cheers

andrew.


Re: minor windows & cygwin regression failures on stable branch

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> What has changed in the last 3 weeks is that I refreshed my Cygwin 
> installation, I think when I was wrestling with the NLS thing.  If 
> nothing in postgres has changed in this area I assume that platform  
> changes account for the regression.

Sounds that way to me too, but it's disturbing.  One would say they
broke their scheduler :-(.  Possibly you should try to stir up some
interest among the Cygwin hackers in looking into this.
        regards, tom lane


Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>What has changed in the last 3 weeks is that I refreshed my Cygwin 
>>installation, I think when I was wrestling with the NLS thing.  If 
>>nothing in postgres has changed in this area I assume that platform  
>>changes account for the regression.
>>    
>>
>
>Sounds that way to me too, but it's disturbing.  One would say they
>broke their scheduler :-(.  Possibly you should try to stir up some
>interest among the Cygwin hackers in looking into this.
>
>
>  
>

I'd like somebody else to report the same phenomenon first. Reini?

cheers

andrew


Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:

>> As far as the test failure, maybe we are just not allowing enough time
>> for the stats collector to run?  The thing sits there for 2 sec, which
>> theoretically is plenty, but it's a busy-wait loop and if the Cygwin
>> scheduler is not aggressive about taking away timeslices then maybe
>> the stats processes don't get to run.  Try doing the test script by
>> hand, with just a manual delay instead of the sleep function, and see
>> if it passes.
>>
>>            
>>  
>>
>
> Yes, when I do that it works. But even when I increase the interval to 
> 30 secs the regression script fails. I tried to use a sleep function 
> that didn't do a busy-wait loop, but plperl seems to segfault on this 
> platform :-( What fun.
>
>

Further data point - the expected result appears when I set the sleep 
interval at 1 minute, but not at 40 secs. That does indicate that the 
stats collector is actually running and doing its job (kinda).

cheers

andrew


Re: minor windows & cygwin regression failures on stable

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Further data point - the expected result appears when I set the sleep 
> interval at 1 minute, but not at 40 secs. That does indicate that the 
> stats collector is actually running and doing its job (kinda).

Hmm ... maybe the intentional sleep in the stats collector is delaying
much longer than it is supposed to?
        regards, tom lane
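
One way to check that hypothesis in isolation would be to time a sleep
against a monotonic clock and see how far it overshoots. The Python below
is a generic sketch, not the stats collector's code, and
measure_sleep_overshoot is a hypothetical name. Running it while another
process spins in a tight loop would show whether sleeps on the platform are
being stretched.

```python
import time


def measure_sleep_overshoot(requested_seconds):
    """Return how much longer a sleep took than requested, in seconds."""
    start = time.monotonic()
    time.sleep(requested_seconds)
    elapsed = time.monotonic() - start
    return elapsed - requested_seconds


overshoot = measure_sleep_overshoot(0.05)
print(f"asked for 0.050s, overslept by {overshoot:.4f}s")
```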


Re: minor windows & cygwin regression failures on stable

From
"Reini Urban"
Date:
> Tom Lane wrote:
>>Andrew Dunstan <andrew@dunslane.net> writes:
>>
>>>What has changed in the last 3 weeks is that I refreshed my Cygwin
>>> installation, I think when I was wrestling with the NLS thing.  If
>>> nothing in postgres has changed in this area I assume that platform
>>> changes account for the regression.
>>
>>Sounds that way to me too, but it's disturbing.  One would say they
>> broke their scheduler :-(.  Possibly you should try to stir up some
>> interest among the Cygwin hackers in looking into this.
>>
>
> I'd like somebody else to report the same phenomenon first. Reini?

Why plperl is broken I cannot say yet.

I still have the same general IPC permission problem since about beta3.
Only very few cygwin hackers have this also.
I only got confirmation that the problem is in postgresql, not in cygwin.
-- 
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/




Re: minor windows & cygwin regression failures on stable

From
"Andrew Dunstan"
Date:
Reini Urban said:
>> Tom Lane wrote:
>>>Andrew Dunstan <andrew@dunslane.net> writes:
>>>
>>>>What has changed in the last 3 weeks is that I refreshed my Cygwin
>>>> installation, I think when I was wrestling with the NLS thing.  If
>>>> nothing in postgres has changed in this area I assume that platform
>>>> changes account for the regression.
>>>
>>>Sounds that way to me too, but it's disturbing.  One would say they
>>> broke their scheduler :-(.  Possibly you should try to stir up some
>>> interest among the Cygwin hackers in looking into this.
>>>
>>
>> I'd like somebody else to report the same phenomenon first. Reini?
>
> Why plperl is broken I cannot say yet.
>
> I still have the same general IPC permission problem since about beta3.
> Only very few cygwin hackers have this also.
> I only got confirmation that the problem is in postgresql, not in
> cygwin.

Hmm. Well, JimB got the same result yesterday that I have been seeing (the
stats regression test failure), so I consider that sufficient confirmation.

I'm not quite sure what question I should be asking of the Cygwin people.
Tom, Can you suggest something?

I'll look at the plperl thing. That has also prompted me to add a buildfarm
feature request to test perl, python and tcl if they are configured in.

cheers

andrew




Re: minor windows & cygwin regression failures on stable

From
Tom Lane
Date:
"Andrew Dunstan" <andrew@dunslane.net> writes:
> I'm not quite sure what question I should be asking of the Cygwin people.
> Tom, Can you suggest something?

It sounds to me like the problem is that the backend executing the test
script is in a tight loop (due to the half-baked implementation of sleep())
and for some reason this prevents the stats processes from running ---
for a far longer period than it by rights ought to.  Ask about recent
changes in process scheduling policy.  (I suppose that actually it's
Windows doing the scheduling, but what we want to know about is cygwin
changes that might have affected Windows scheduling parameters.)
        regards, tom lane


Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Tom Lane wrote:

>"Andrew Dunstan" <andrew@dunslane.net> writes:
>  
>
>>I'm not quite sure what question I should be asking of the Cygwin people.
>>Tom, Can you suggest something?
>>    
>>
>
>It sounds to me like the problem is that the backend executing the test
>script is in a tight loop (due to the half-baked implementation of sleep())
>and for some reason this prevents the stats processes from running ---
>for a far longer period than it by rights ought to.  Ask about recent
>changes in process scheduling policy.  (I suppose that actually it's
>Windows doing the scheduling, but what we want to know about is cygwin
>changes that might have affected Windows scheduling parameters.)
>  
>

The only answer so far received says:

Sounds to me like yet another case of
http://cygwin.com/ml/cygwin/2005-03/msg00730.html

cheers

andrew




Re: minor windows & cygwin regression failures on stable

From
"Jim Buttafuoco"
Date:
Andrew,

I can confirm that the latest cygwin snapshot (cygwin1-20050328.dll) corrects the stats regression failure.

Jim



---------- Original Message -----------
From: Andrew Dunstan <andrew@dunslane.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: rurban@x-ray.at, pgsql-hackers@postgresql.org
Sent: Wed, 30 Mar 2005 07:21:50 -0500
Subject: Re: [HACKERS] minor windows & cygwin regression failures on stable

------- End of Original Message -------



Re: minor windows & cygwin regression failures on stable

From
Andrew Dunstan
Date:

Jim Buttafuoco wrote:

>I can confirm that the latest cygwin snapshot (cygwin1-20050328.dll) corrects the stats regression failure.
>
>  
>

Yes, it does for me too. Thanks

andrew