Thread: minor windows & cygwin regression failures on stable branch
I have seen some small regression failures on REL8_0_STABLE. As we're coming up to a release, I thought I'd better run the stable branch through my buildfarm clients. Windows has ordering failures on the join and rules tests, and Cygwin has a failure on the stats test. See buildfarm for details. Does anyone know why these might have happened? Both these platforms had no errors on this branch 3 weeks ago. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Windows has ordering failures on the join and rules tests - Cygwin has a > failures on the stats test. See buildfarm for details. The ordering failures seem to be because the recent planner hacking has taken us back to preferring merge joins for these tests, and Windows' version of qsort has bizarre behavior for equal keys. I put an ORDER BY in the rules test. For join, I'm inclined to think that the best bet is to resurrect the join_1.out variant comparison file that we had awhile ago. Unfortunately, what's in the CVS archives is out of date and can't be used directly. Could you send me the actual rules.out you get on Windows to use for a comparison file? Dunno about the stats failure. It looks like the stats collector just isn't working on Cygwin, but AFAIR no one has touched that code lately. Do these machines fail on HEAD too? (There don't seem to be any active Windows buildfarm machines for HEAD, which is surely ungood. Won't someone step up and put one into the regular rotation?) regards, tom lane
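[Editorial sketch, not part of the original message.] The ordering failures Tom describes come down to sort stability: a sort that is free to reorder rows with equal keys can produce different output on different platforms, and ordering on enough columns to break every tie removes the ambiguity. The Python below is only an illustration of that idea. `unstable_sort` is a hypothetical selection sort standing in for a platform qsort; it is not Windows' actual algorithm, and the rows are invented.

```python
def unstable_sort(rows, key):
    """Selection sort: the swaps can reorder rows whose keys compare equal."""
    out = list(rows)
    for i in range(len(out)):
        # Index of the smallest remaining key (first one wins on ties).
        smallest = min(range(i, len(out)), key=lambda j: key(out[j]))
        out[i], out[smallest] = out[smallest], out[i]
    return out


# Two rows share the join key 2, so their relative order is not determined
# by the key alone.
rows = [(2, "a"), (2, "b"), (1, "c")]


def by_join_key(row):
    return row[0]


def by_all_columns(row):
    return row


if __name__ == "__main__":
    # Sorting on the join key only: the two sorts may disagree on the ties.
    print(sorted(rows, key=by_join_key))
    print(unstable_sort(rows, key=by_join_key))
    # Sorting on every column (the ORDER BY approach): no ties remain.
    print(sorted(rows, key=by_all_columns))
    print(unstable_sort(rows, key=by_all_columns))
```

Adding an ORDER BY to the test query corresponds to the second pair of calls; keeping a variant expected-output file corresponds to accepting both results of the first pair.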
Tom Lane wrote:
>The ordering failures seem to be because the recent planner hacking has
>taken us back to preferring merge joins for these tests, and Windows'
>version of qsort has bizarre behavior for equal keys.
>
>I put an ORDER BY in the rules test. For join, I'm inclined to think
>that the best bet is to resurrect the join_1.out variant comparison
>file that we had awhile ago. Unfortunately, what's in the CVS archives
>is out of date and can't be used directly. Could you send me the actual
>rules.out you get on Windows to use for a comparison file?
join.out sent off list
>Dunno about the stats failure. It looks like the stats collector just
>isn't working on Cygwin, but AFAIR no one has touched that code lately.
It's worked before, that's the strange thing. I'll check some more.
>Do these machines fail on HEAD too? (There don't seem to be any active
>Windows buildfarm machines for HEAD, which is surely ungood. Won't
>someone step up and put one into the regular rotation?)
Sufficient unto the day is the evil thereof (appropriate quotation for Good Friday). I will address HEAD in due course. The buildfarm members for both of these are in reality my laptop, which doesn't even run Windows all the time, and has lots of other duties anyway. We (or rather Josh Berkus and Bruce, at my request) are looking for a replacement. cheers andrew
Tom Lane wrote: >Dunno about the stats failure. It looks like the stats collector just >isn't working on Cygwin, but AFAIR no one has touched that code lately. > > Well, it seems at least to be running. When I fire up postmaster there are 4 processes running and no indication of failure that I could see on the log. (There is a complaint about failing to dup(0) after 3195 successes - I assume that has nothing to do with it?) The good news is that the regression result fixes made Windows native green again. cheers andrew
Andrew, I can set up a dedicated Windows XP system on Monday. I also have some W2K systems that can be used. Are there directions anywhere? Jim
On Fri, 25 Mar 2005 22:19:25 -0500, Andrew Dunstan <andrew@dunslane.net> wrote:
> The buildfarm members for both of these are in reality my laptop, which
> doesn't even run Windows all the time, and has lots of other duties
> anyway. We (or rather Josh Berkus and Bruce, at my request) are looking
> for a replacement.
Jim, that is just excellent! Thank you so much! I assume you mean XP-Pro - I gather that user permissions get in the way on XP-HE. We can make the one machine do double duty for Windows and Cygwin. You will need installed:
. cygwin, including whatever it takes to build cygwin postgres
. native perl (ActiveState perl works just fine)
. normal native windows postgres build environment (MSys, Mingw, MSysDTK etc).
After that we'll explore a bit what it takes to automate the buildfarm processes on Windows. You and I can do that offline. Let's get this going first and then look at W2K. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Well, it seems at least to be running. When I fire up postmaster there > are 4 processes running and no indication of failure that I could see on > the log. (There is a complaint about failing to dup(0) after 3195 > successes - I assume that has nothing to do with it?) No, that's some code that's trying to measure the number of files we are allowed to open. It expects to fail, it just thought the particular errno it got was odd enough to report. Might be worth an #ifdef to tell it that that errno is expected on Cygwin? As far as the test failure, maybe we are just not allowing enough time for the stats collector to run? The thing sits there for 2 sec, which theoretically is plenty, but it's a busy-wait loop and if the Cygwin scheduler is not aggressive about taking away timeslices then maybe the stats processes don't get to run. Try doing the test script by hand, with just a manual delay instead of the sleep function, and see if it passes. regards, tom lane
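[Editorial sketch, not part of the original message.] The distinction Tom draws between a busy-wait loop and a real sleep can be seen by measuring how much CPU time each kind of wait consumes: a spinning process stays runnable for the whole interval and competes with other processes (such as the stats collector) for timeslices, while a blocking sleep gives the CPU up. The Python below is only an illustration; it is not the regression test's sleep function, and the helper names are hypothetical.

```python
import time


def busy_wait(seconds):
    """Spin until the deadline passes; the process stays runnable throughout."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def blocking_wait(seconds):
    """Ask the operating system to suspend the process for the interval."""
    time.sleep(seconds)


def cpu_time_used(wait, seconds):
    """Return the CPU time (not wall-clock time) consumed while waiting."""
    start = time.process_time()
    wait(seconds)
    return time.process_time() - start


if __name__ == "__main__":
    print("busy-wait CPU seconds:", cpu_time_used(busy_wait, 0.2))
    print("blocking  CPU seconds:", cpu_time_used(blocking_wait, 0.2))
```

Whether a spinning process actually starves its neighbours depends on the scheduler, which is the open question in this thread; the sketch only shows why the two kinds of wait are not equivalent.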
Tom Lane wrote:
>Andrew Dunstan <andrew@dunslane.net> writes:
>>Well, it seems at least to be running. When I fire up postmaster there
>>are 4 processes running and no indication of failure that I could see on
>>the log. (There is a complaint about failing to dup(0) after 3195
>>successes - I assume that has nothing to do with it?)
>
>No, that's some code that's trying to measure the number of files we are
>allowed to open. It expects to fail, it just thought the particular
>errno it got was odd enough to report. Might be worth an #ifdef to tell
>it that that errno is expected on Cygwin?
Yes, looks like the error is EBADF.
>As far as the test failure, maybe we are just not allowing enough time
>for the stats collector to run? The thing sits there for 2 sec, which
>theoretically is plenty, but it's a busy-wait loop and if the Cygwin
>scheduler is not aggressive about taking away timeslices then maybe
>the stats processes don't get to run. Try doing the test script by
>hand, with just a manual delay instead of the sleep function, and see
>if it passes.
Yes, when I do that it works. But even when I increase the interval to 30 secs the regression script fails. I tried to use a sleep function that didn't do a busy-wait loop, but plperl seems to segfault on this platform :-( What fun. What has changed in the last 3 weeks is that I refreshed my Cygwin installation, I think when I was wrestling with the NLS thing. If nothing in postgres has changed in this area I assume that platform changes account for the regression. cheers andrew.
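[Editorial sketch, not part of the original message.] The dup(0) complaint discussed above comes from a probe that duplicates a descriptor until the call fails, treats the failure as the answer, and only reports errno values it did not expect. The Python below is a rough illustration of that pattern under stated assumptions, not the PostgreSQL code: `probe_usable_fds` and its parameters are hypothetical, it duplicates a descriptor for os.devnull rather than fd 0, and it caps the probe so it never exhausts the process limit. The `extra_expected` argument stands in for the platform-specific #ifdef Tom suggests.

```python
import errno
import os

# Errno values that simply mean "no more descriptors available".
EXPECTED_ERRNOS = {errno.EMFILE, errno.ENFILE}


def probe_usable_fds(max_to_probe, extra_expected=()):
    """Duplicate a descriptor up to max_to_probe times.

    Returns (successes, unexpected_errno). unexpected_errno is None unless
    dup() failed with an errno outside the expected set.
    """
    expected = EXPECTED_ERRNOS | set(extra_expected)
    base = os.open(os.devnull, os.O_RDONLY)
    dups = []
    unexpected = None
    try:
        while len(dups) < max_to_probe:
            try:
                dups.append(os.dup(base))
            except OSError as exc:
                # Failure is the normal way for the probe to end; only an
                # errno outside the expected set is worth reporting.
                if exc.errno not in expected:
                    unexpected = exc.errno
                break
    finally:
        # Release everything the probe opened.
        for fd in dups:
            os.close(fd)
        os.close(base)
    return len(dups), unexpected


if __name__ == "__main__":
    # Treat EBADF as expected too, as one might on the platform in question.
    print(probe_usable_fds(32, extra_expected={errno.EBADF}))
```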
Andrew Dunstan <andrew@dunslane.net> writes: > What has changed in the last 3 weeks is that I refreshed my Cygwin > installation, I think when I was wrestling with the NLS thing. If > nothing in postgres has changed in this area I assume that platform > changes account for the regression. Sounds that way to me too, but it's disturbing. One would say they broke their scheduler :-(. Possibly you should try to stir up some interest among the Cygwin hackers in looking into this. regards, tom lane
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>What has changed in the last 3 weeks is that I refreshed my Cygwin >>installation, I think when I was wrestling with the NLS thing. If >>nothing in postgres has changed in this area I assume that platform >>changes account for the regression. >> >> > >Sounds that way to me too, but it's disturbing. One would say they >broke their scheduler :-(. Possibly you should try to stir up some >interest among the Cygwin hackers in looking into this. > > > > I'd like somebody else to report the same phenomenon first. Reini? cheers andrew
Andrew Dunstan wrote: >> As far as the test failure, maybe we are just not allowing enough time >> for the stats collector to run? The thing sits there for 2 sec, which >> theoretically is plenty, but it's a busy-wait loop and if the Cygwin >> scheduler is not aggressive about taking away timeslices then maybe >> the stats processes don't get to run. Try doing the test script by >> hand, with just a manual delay instead of the sleep function, and see >> if it passes. >> >> >> >> > > Yes, when I do that it works. But even when I increase the interval to > 30 secs the regression script fails. I tried to use a sleep function > that didn't do a busy-wait loop, but plperl seems to segfault on this > platform :-( What fun. > > Further data point - the expected result appears when I set the sleep interval at 1 minute, but not at 40 secs. That does indicate that the stats collector is actually running and doing its job (kinda). cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Further data point - the expected result appears when I set the sleep > interval at 1 minute, but not at 40 secs. That does indicate that the > stats collector is actually running and doing its job (kinda). Hmm ... maybe the intentional sleep in the stats collector is delaying much longer than it is supposed to? regards, tom lane
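[Editorial sketch, not part of the original message.] Andrew's data points (failure at 40 secs, success at 1 minute) show how fragile a fixed delay is when the collector's latency is unknown. One alternative, which nobody in this thread proposes and which is offered here only as an assumption-laden illustration, is to poll for the expected result with a blocking sleep between checks and give up at a deadline. `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout, interval=0.05):
    """Poll condition() until it returns True or timeout seconds elapse.

    Sleeps between checks, so other processes get a chance to run.
    Returns True if the condition was met, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for "the stats counters have been updated": becomes true
    # about 0.2 seconds after start.
    start = time.monotonic()
    print(wait_until(lambda: time.monotonic() - start > 0.2, timeout=2.0))
```

The test then finishes as soon as the result appears and fails only if it never appears within the deadline.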
> Tom Lane wrote: >>Andrew Dunstan <andrew@dunslane.net> writes: >> >>>What has changed in the last 3 weeks is that I refreshed my Cygwin >>> installation, I think when I was wrestling with the NLS thing. If >>> nothing in postgres has changed in this area I assume that platform >>> changes account for the regression. >> >>Sounds that way to me too, but it's disturbing. One would say they >> broke their scheduler :-(. Possibly you should try to stir up some >> interest among the Cygwin hackers in looking into this. >> > > I'd like somebody else to report the same phenomenon first. Reini? Why plperl is broken I cannot say yet. I still have the same general IPC permission problem since about beta3. Only very few cygwin hackers have this also. I only got confirmation that the problem is in postgresql, not in cygwin. -- Reini Urban http://xarch.tu-graz.ac.at/home/rurban/
Reini Urban said:
> Why plperl is broken I cannot say yet.
>
> I still have the same general IPC permission problem since about beta3.
> Only very few cygwin hackers have this also.
> I only got confirmation that the problem is in postgresql, not in
> cygwin.
Hmm. Well, JimB got the same result yesterday that I have been seeing (the stats regression test failure), so I consider that sufficient confirmation. I'm not quite sure what question I should be asking of the Cygwin people. Tom, can you suggest something? I'll look at the plperl thing. That has also prompted me to add a buildfarm feature request to test perl, python and tcl if they are configured in. cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes: > I'm not quite sure what question I should be asking of the Cygwin people. > Tom, Can you suggest something? It sounds to me like the problem is that the backend executing the test script is in a tight loop (due to the half-baked implementation of sleep()) and for some reason this prevents the stats processes from running --- for a far longer period than it by rights ought to. Ask about recent changes in process scheduling policy. (I suppose that actually it's Windows doing the scheduling, but what we want to know about is cygwin changes that might have affected Windows scheduling parameters.) regards, tom lane
Tom Lane wrote: >"Andrew Dunstan" <andrew@dunslane.net> writes: > > >>I'm not quite sure what question I should be asking of the Cygwin people. >>Tom, Can you suggest something? >> >> > >It sounds to me like the problem is that the backend executing the test >script is in a tight loop (due to the half-baked implementation of sleep()) >and for some reason this prevents the stats processes from running --- >for a far longer period than it by rights ought to. Ask about recent >changes in process scheduling policy. (I suppose that actually it's >Windows doing the scheduling, but what we want to know about is cygwin >changes that might have affected Windows scheduling parameters.) > > The only answer so far received says: Sounds to me like yet another case of http://cygwin.com/ml/cygwin/2005-03/msg00730.html cheers andrew
Andrew, I can confirm that the latest cygwin snapshot (cygwin1-20050328.dll) corrects the stats regression failure. Jim
On Wed, 30 Mar 2005 07:21:50 -0500, Andrew Dunstan <andrew@dunslane.net> wrote:
> The only answer so far received says:
>
> Sounds to me like yet another case of
> http://cygwin.com/ml/cygwin/2005-03/msg00730.html
Jim Buttafuoco wrote: >I can confirm that the latest cygwin snapshot (cygwin1-20050328.dll) corrects the stats regression failure. > > > Yes, it does for me too. Thanks andrew