Thread: Buildfarm issues on specific machines
I spent a little time today cleaning up easily-fixed problems that are causing buildfarm failures in various back branches. Hopefully that will result in a few more "green" entries over the new few days. While I was looking, I noticed several machines that seem to be failing because of local issues: potoroo [HEAD, 7.4]: lock file "/tmp/.s.PGSQL.65432.lock" already exists I'm not sure if this is a problem with a stale lock file left around from an old run, or if it happens because the machine is configured to try to build/test several branches in parallel. In any case, it might be worthwhile to try to hack the buildfarm script so that the Unix socket files are allocated in a per-branch scratch directory, not in /tmp. Or try to change pg_regress to use different port numbers for different branches? osprey [HEAD]: could not create shared memory segment: Cannot allocate memory DETAIL: Failed system call was shmget(key=2, size=1957888, 03600). Kernel shmem settings too small... dragonfly [HEAD]: libz link error As per earlier discussion, I maintain this is local misconfiguration. cobra [7.4, 7.3, 7.2]: --with-tcl but no Tk Possibly adding --without-tk to the configure options is the right answer. Otherwise, install Tk or remove --with-tcl. cuckoo [7.3, 7.2]: --enable-nls without OS support This looks like pilot error; but the later branches don't fail on this machine, so did we change something in this area? caribou [7.2]: no "flex" installed This looks like pilot error as well, though again I don't understand why the later branches seem to work. Are we sure the same PATH is being used for every branch here? Why doesn't the buildfarm report for 7.2 show the PATH? regards, tom lane
"Pete St. Onge" <pete@economics.utoronto.ca> writes:
> Perhaps this will help in the diagnosis of why REL7_2_STABLE fails on
> arbor (aka caribou). Please let me know if there is anything I can try
> on this side, or if there is any other info you could use.

> pete@arbor:~$ flex -V
> flex 2.5.31

Ah. PG 7.2's configure script thinks this version of flex is "too old"
to use, and so ignores it. It's actually too new --- 2.5.4 is the only
flex version safe to use with older PG releases. I recall that we have
seen some apparent bugs in 2.5.31; I don't recall at the moment whether
we've worked around all the issues as of recent releases, but we surely
had not as of 7.2.

The short answer is that you should install flex 2.5.4, or else forget
about testing the 7.2 branch. I don't think anyone will be very
interested in making 7.2 work with flex 2.5.31.

			regards, tom lane
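For what it's worth, the "thinks it's too old when it's actually too
new" symptom is a classic version-string pitfall: compared as text,
"2.5.31" sorts before "2.5.4" because "31" < "4" character by character.
The sketch below is purely illustrative — these helper functions are
hypothetical and are not PG's actual configure test — but they show the
failure mode and a component-wise numeric fix:

```shell
#!/bin/sh
# WRONG: lexicographic comparison of version strings. "2.5.31" sorts
# before "2.5.4", so 2.5.31 looks *older* than 2.5.4.
naive_is_at_least() {
    [ "$1" = "$2" ] ||
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort | head -n 1)" = "$2" ]
}

# Correct: split on dots and compare each component numerically.
numeric_is_at_least() {
    v1=$1 v2=$2
    IFS=. ; set -- $v1 ; a1=$1 a2=${2:-0} a3=${3:-0}
    set -- $v2 ; b1=$1 b2=${2:-0} b3=${3:-0}
    unset IFS
    if [ "$a1" -ne "$b1" ]; then [ "$a1" -gt "$b1" ]; return; fi
    if [ "$a2" -ne "$b2" ]; then [ "$a2" -gt "$b2" ]; return; fi
    [ "$a3" -ge "$b3" ]
}

naive_is_at_least   2.5.31 2.5.4 && echo "naive: new enough"   || echo "naive: too old"
numeric_is_at_least 2.5.31 2.5.4 && echo "numeric: new enough" || echo "numeric: too old"
# -> naive: too old
# -> numeric: new enough
```

Of course, in this case the numerically "newer" flex was the unusable
one, so even a correct comparison would only have produced a clearer
rejection message, not a working build.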
On Sun, 17 Jul 2005, Tom Lane wrote:

> The short answer is that you should install flex 2.5.4, or else forget
> about testing the 7.2 branch. I don't think anyone will be very
> interested in making 7.2 work with flex 2.5.31.

Actually there are problems in the 7.3 branch as well in the cube,
tsearch, and seg modules. Here are some patches for the 7.2 version
check and 7.2 and 7.3 tsearch code. I'll work on getting cube and seg
up to speed as well if people agree we want these fixes.

Kris Jurka
First off, thanks for looking into this, Tom, and thanks to Andrew for
all the stellar work on the buildfarm, I'm glad to be a part of it.

Perhaps this will help in the diagnosis of why REL7_2_STABLE fails on
arbor (aka caribou). Please let me know if there is anything I can try
on this side, or if there is any other info you could use.

Thanks,

Pete

arbor:~# whereis flex
flex: /usr/bin/flex /usr/share/man/man1/flex.1.gz

arbor:~# whereis gcc
gcc: /usr/bin/gcc /usr/share/man/man1/gcc.1.gz

pete@arbor:~$ flex -V
flex 2.5.31

arbor:~# crontab -l
30 2 * * * PATH=/bin:/usr/bin:/home/pete/bin:/home/pete/projects/pg_build_farm/build-farm-2.05 cd ~/projects/pg_build_farm/build-farm-2.05 && for foo in HEAD REL7_2_STABLE REL7_3_STABLE REL7_4_STABLE REL8_0_STABLE REL8_1_STABLE; do ./run_build.pl --verbose $foo; done

On Sat, Jul 16, 2005 at 11:17:29PM -0400, Tom Lane wrote:
[Clip]
> caribou [7.2]: no "flex" installed
>
> This looks like pilot error as well, though again I don't understand why the
> later branches seem to work. Are we sure the same PATH is being used for
> every branch here? Why doesn't the buildfarm report for 7.2 show the PATH?
>
> 			regards, tom lane

-- 
Peter D. St. Onge, http://pete.economics.utoronto.ca/
IT Admin, Department of Economics          pete@economics.utoronto.ca
SB01, 150 St. George, University of Toronto          416-946-3594
Kris Jurka <books@ejurka.com> writes:
> On Sun, 17 Jul 2005, Tom Lane wrote:
>> The short answer is that you should install flex 2.5.4, or else forget
>> about testing the 7.2 branch. I don't think anyone will be very
>> interested in making 7.2 work with flex 2.5.31.

> Actually there are problems in the 7.3 branch as well in the cube,
> tsearch, and seg modules. Here are some patches for the 7.2 version check
> and 7.2 and 7.3 tsearch code. I'll work on getting cube and seg up to
> speed as well if people agree we want these fixes.

The cube and seg fixes were pretty invasive, and I have a feeling that
there were other changes needed in all the .l files to play nice with
2.5.31 (though a quick troll of the CVS logs failed to show any other
patches specifically marked that way). My opinion is that we should just
say "flex 2.5.31 is unsupported before PG 7.4" --- and possibly fix the
configure tests to reject it in 7.3 too.

This is a considerably bigger issue for the buildfarm than it would be
for ordinary users of our distribution, since in the distro it's only
the contrib modules that you actually need to run through your local
flex. On a buildfarm machine you had better be running the
project-approved versions of bison and flex. At this writing, the
approved version of flex is still 2.5.4, even for CVS tip.

			regards, tom lane
Tom, thanks for this. I regularly send out private emails about what
appear to be local issues.

Tom Lane wrote:

>I spent a little time today cleaning up easily-fixed problems that are
>causing buildfarm failures in various back branches. Hopefully that
>will result in a few more "green" entries over the next few days. While
>I was looking, I noticed several machines that seem to be failing
>because of local issues:
>
>potoroo [HEAD, 7.4]: lock file "/tmp/.s.PGSQL.65432.lock" already exists
>
>I'm not sure if this is a problem with a stale lock file left around
>from an old run, or if it happens because the machine is configured to
>try to build/test several branches in parallel. In any case, it might
>be worthwhile to try to hack the buildfarm script so that the Unix
>socket files are allocated in a per-branch scratch directory, not in
>/tmp. Or try to change pg_regress to use different port numbers for
>different branches?
>

Buildfarm is set up so that each branch gets its own non-standard port.
It also prevents you (via a lock file) from running concurrently on a
branch unless you have multiple repositories. So only "make check"
should have any problems here, not any of the other tests, and only
because the port for that set of tests is hardcoded.

So I think the right (and certainly simplest) thing is to make the tmp
port relative to the default port. Something like what is below - I
chose 5 instead of 6 as the leading digit to avoid possible overflow
where the default port number is a high 4 digit number. I guess if we
were paranoid we could also check for 5 digit default port numbers.
cheers

andrew

Index: src/test/regress/GNUmakefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/GNUmakefile,v
retrieving revision 1.49
diff -c -r1.49 GNUmakefile
*** src/test/regress/GNUmakefile	11 May 2005 21:52:03 -0000	1.49
--- src/test/regress/GNUmakefile	17 Jul 2005 14:52:59 -0000
***************
*** 45,50 ****
--- 45,51 ----
  	    -e 's,@libdir@,$(libdir),g' \
  	    -e 's,@pkglibdir@,$(pkglibdir),g' \
  	    -e 's,@datadir@,$(datadir),g' \
+ 	    -e 's,@default_port@,$(default_port),g' \
  	    -e 's/@VERSION@/$(VERSION)/g' \
  	    -e 's/@host_tuple@/$(host_tuple)/g' \
  	    -e 's,@GMAKE@,$(MAKE),g' \
Index: src/test/regress/pg_regress.sh
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/pg_regress.sh,v
retrieving revision 1.58
diff -c -r1.58 pg_regress.sh
*** src/test/regress/pg_regress.sh	25 Jun 2005 23:04:06 -0000	1.58
--- src/test/regress/pg_regress.sh	17 Jul 2005 14:52:59 -0000
***************
*** 342,348 ****
      unset PGHOST
      unset PGHOSTADDR
  fi
! PGPORT=65432
  export PGPORT
  
  # Get rid of environment stuff that might cause psql to misbehave
--- 342,348 ----
      unset PGHOST
      unset PGHOSTADDR
  fi
! PGPORT=5@default_port@
  export PGPORT
  
  # Get rid of environment stuff that might cause psql to misbehave

>osprey [HEAD]: could not create shared memory segment: Cannot allocate memory
>DETAIL: Failed system call was shmget(key=2, size=1957888, 03600).
>
>Kernel shmem settings too small...
>
>dragonfly [HEAD]: libz link error
>
>As per earlier discussion, I maintain this is local misconfiguration.
>
>cobra [7.4, 7.3, 7.2]: --with-tcl but no Tk
>
>Possibly adding --without-tk to the configure options is the right answer.
>Otherwise, install Tk or remove --with-tcl.
>
>cuckoo [7.3, 7.2]: --enable-nls without OS support
>
>This looks like pilot error; but the later branches don't fail on this
>machine, so did we change something in this area?
>caribou [7.2]: no "flex" installed
>
>This looks like pilot error as well, though again I don't understand why the
>later branches seem to work. Are we sure the same PATH is being used for
>every branch here? Why doesn't the buildfarm report for 7.2 show the PATH?
>
>			regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> potoroo [HEAD, 7.4]: lock file "/tmp/.s.PGSQL.65432.lock" already exists
>>
>> I'm not sure if this is a problem with a stale lock file left around
>> from an old run, or if it happens because the machine is configured to
>> try to build/test several branches in parallel. In any case, it might
>> be worthwhile to try to hack the buildfarm script so that the Unix
>> socket files are allocated in a per-branch scratch directory, not in
>> /tmp. Or try to change pg_regress to use different port numbers for
>> different branches?

> [ snip ]

> So I think the right (and certainly simplest) thing is to make the tmp
> port relative to the default port.

I was thinking more along the lines of altering pg_regress to let the
temp port be specified as a script parameter, though certainly there is
nothing wrong with letting the Makefile supply 5xxxx as the normal
value. I believe that if we do something like

	TEMP_PORT = 5$(default_port)

	check:
		pg_regress ... --temp_port=$(TEMP_PORT)

then the port could be overridden without any source code hacks by
"gmake TEMP_PORT=nnn check".

In any case, there's still a question in my mind of exactly what is
happening on potoroo. The times of the recent reports don't look like
it's been configured to try to do multiple builds in parallel, so why is
it failing like this? Has a temporary-installation postmaster somehow
been left running somewhere? (The mere existence of the lock file would
not cause this failure --- the PID in the lock file must correspond to
an existing process matching the postmaster's UID, else it'll be deemed
a stale lock file.)

			regards, tom lane
Tom Lane wrote:

>I believe that if we do something like
>
>	TEMP_PORT = 5$(default_port)
>
>	check:
>		pg_regress ... --temp_port=$(TEMP_PORT)
>
>then the port could be overridden without any source code hacks by
>"gmake TEMP_PORT=nnn check".
>

Works for me. Let's do it. If I understand this right it would not need
any changes on the buildfarm side.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I believe that if we do something like
>>
>> TEMP_PORT = 5$(default_port)
>>
>> check:
>> 	pg_regress ... --temp_port=$(TEMP_PORT)
>>
>> then the port could be overridden without any source code hacks by
>> "gmake TEMP_PORT=nnn check".

> Works for me. Let's do it. If I understand this right it would not need
> any changes on the buildfarm side.

Done as far back as 7.4, so that this should actually work as long as
you don't try to parallel-test 7.3 and 7.2.

The buildfarm config stuff should recommend choosing 4-digit port
numbers, because the patch I put in will fall back to 65432 if the
configuration port is 5 digits.

			regards, tom lane
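The rule Tom describes can be sketched as follows — an illustration of
the scheme with a hypothetical helper name, not the literal patch.
Prefixing a 4-digit default port with "5" yields a 5-digit temp port
that stays below 65536, but prefixing a 5-digit default would overflow
the 16-bit port range, so the old hardwired 65432 is used instead:

```shell
#!/bin/sh
# Sketch of the temp-port derivation (helper name is invented for
# this example). A 4-digit default port like 5678 becomes temp port
# 55678; a 5-digit default such as 15432 would yield 515432, which
# exceeds the TCP/Unix port maximum of 65535, so fall back to the
# historical hardwired value.
derive_temp_port() {
    temp=5$1
    if [ "$temp" -gt 65535 ]; then
        temp=65432        # fall back to the old hardwired port
    fi
    echo "$temp"
}

derive_temp_port 5678     # -> 55678
derive_temp_port 15432    # -> 65432 (515432 would overflow)
```

This is why the buildfarm sample config's 4-digit branch ports (5678
through 5682) work cleanly with the patch, while a 5-digit configured
port silently loses its per-branch uniqueness.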
Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>>Tom Lane wrote:
>>>I believe that if we do something like
>>>
>>>TEMP_PORT = 5$(default_port)
>>>
>>>check:
>>>	pg_regress ... --temp_port=$(TEMP_PORT)
>>>
>>>then the port could be overridden without any source code hacks by
>>>"gmake TEMP_PORT=nnn check".

>>Works for me. Let's do it. If I understand this right it would not need
>>any changes on the buildfarm side.

>Done as far back as 7.4, so that this should actually work as long as
>you don't try to parallel-test 7.3 and 7.2.
>

Excellent. Thank you.

>The buildfarm config stuff should recommend choosing 4-digit port
>numbers, because the patch I put in will fall back to 65432 if the
>configuration port is 5 digits.
>

We already had this in the distributed sample config:

	branch_ports => {
		REL8_0_STABLE => 5682,
		REL7_4_STABLE => 5681,
		REL7_3_STABLE => 5680,
		REL7_2_STABLE => 5679,
		HEAD => 5678,
	}

I have added your recommendation to the comments.

cheers

andrew
On Sat, Jul 16, 2005 at 11:17:29PM -0400, Tom Lane wrote:
> cuckoo [7.3, 7.2]: --enable-nls without OS support
>
> This looks like pilot error; but the later branches don't fail on this
> machine, so did we change something in this area?

Should I just stop using nls on 7.2 and 7.3 or does someone want to
poke around with this further?

Also, is -hackers the best place to ask about buildfarm issues, or
should I post questions elsewhere?

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Sat, Jul 16, 2005 at 11:17:29PM -0400, Tom Lane wrote:
>> cuckoo [7.3, 7.2]: --enable-nls without OS support
>>
>> This looks like pilot error; but the later branches don't fail on this
>> machine, so did we change something in this area?

> Should I just stop using nls on 7.2 and 7.3 or does someone want to poke
> around with this further?

I don't intend to look at it; but perhaps someone who cares more about
the NLS code will ...

> Also, is -hackers the best place to ask about buildfarm issues, or
> should I post questions elsewhere?

hackers is definitely the place.

			regards, tom lane
On Tue, Jul 19, 2005 at 02:29:08PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <decibel@decibel.org> writes:
> > On Sat, Jul 16, 2005 at 11:17:29PM -0400, Tom Lane wrote:
> >> cuckoo [7.3, 7.2]: --enable-nls without OS support
> >>
> >> This looks like pilot error; but the later branches don't fail on this
> >> machine, so did we change something in this area?
>
> > Should I just stop using nls on 7.2 and 7.3 or does someone want to poke
> > around with this further?
>
> I don't intend to look at it; but perhaps someone who cares more about
> the NLS code will ...

Then I guess the question is... is it more valuable to have a working
buildfarm environment for 7.2 and 7.3, or is the obnoxious failure
better to spur someone into looking at it? :) Should this maybe be made
a TODO and I'll adjust my config until someone tackles the TODO?

Also, what do people think about having the buildfarm track different
compile/build options on each environment? ISTM there's value in being
able to change-up config options to make sure that different
combinations work. My thought is having the buildfarm configured so
that it knows what options on a machine should work (based on external
dependencies) and then the script can cycle through different configs.
Of course this means the server would have to do a better job of
tracking per-config-setting info...

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Jim C. Nasby wrote:

>Then I guess the question is... is it more valuable to have a working
>buildfarm environment for 7.2 and 7.3, or is the obnoxious failure
>better to spur someone into looking at it? :) Should this maybe be made
>a TODO and I'll adjust my config until someone tackles the TODO?
>

I don't think 7.2 and 7.3 deserve heroic efforts to get every possible
build in a green state. The main reason to run buildfarm at all on these
branches is to make sure that any maintenance changes don't break things.

>Also, what do people think about having the buildfarm track different
>compile/build options on each environment? ISTM there's value in being
>able to change-up config options to make sure that different
>combinations work. My thought is having the buildfarm configured so that
>it knows what options on a machine should work (based on external
>dependancies) and then the script can cycle through different configs.
>Of course this means the server would have to do a better job of
>tracking per-config-setting info...
>

I actually abandoned an earlier attempt to create a buildfarm because I
tried to cater for every possible combination. I do not want to get into
that again. Buildfarm is not going to find every problem, no matter how
hard we try. So I want to follow the KISS principle, if for no other
reason than that I would far rather be working on cool postgres features
than on the buildfarm :-) There is already a good list of features to be
worked on.

cheers

andrew
On Tue, Jul 19, 2005 at 03:17:49PM -0400, Andrew Dunstan wrote:
> Jim C. Nasby wrote:
>
> >Then I guess the question is... is it more valuable to have a working
> >buildfarm environment for 7.2 and 7.3, or is the obnoxious failure
> >better to spur someone into looking at it? :) Should this maybe be made
> >a TODO and I'll adjust my config until someone tackles the TODO?
>
> I don't think 7.2 and 7.3 deserve heroic efforts to get every possible
> build in a green state. The main reason to run buildfarm at all on these
> branches is to make sure that any maintenance changes don't break things.

OK, I'll tweak cuckoo's config accordingly then.

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
(trimming cc list...)

On Tue, Jul 19, 2005 at 02:25:38PM -0500, Jim C. Nasby wrote:
> OK, I'll tweak cuckoo's config accordingly then.

And now it's failing on make, at least for 7.2...

ccache gcc -O3 -pipe -traditional-cpp -g -O2 -g -Wall -Wmissing-prototypes -Wmissing-declarations -I. -I../../../src/include -I/opt/local/include -c -o bootparse.o bootparse.c
y.tab.c:953: warning: no previous prototype for `Int_yyparse'
bootparse.y: In function `Int_yyparse':
bootparse.y:278: error: syntax error at '##' token
bootparse.y:278: error: `T_' undeclared (first use in this function)
bootparse.y:278: error: (Each undeclared identifier is reported only once
bootparse.y:278: error: for each function it appears in.)
bootparse.y:278: error: parse error before "IndexElem"
make[3]: *** [bootparse.o] Error 1
make[2]: *** [bootstrap-recursive] Error 2
make[1]: *** [all] Error 2
make: *** [all] Error 2

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
And 7.3 is also failing, with a different error:

ccache gcc -traditional-cpp -g -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../../src/include -I/opt/local/include -c -o printtup.o printtup.c
In file included from /usr/include/machine/param.h:30,
                 from /usr/include/sys/socket.h:67,
                 from ../../../../src/include/libpq/pqcomm.h:28,
                 from ../../../../src/include/libpq/libpq-be.h:24,
                 from ../../../../src/include/libpq/libpq.h:21,
                 from printtup.c:20:
/usr/include/ppc/param.h:98: macro "btodb" requires 2 arguments, but only 1 given
/usr/include/ppc/param.h:100: macro "dbtob" requires 2 arguments, but only 1 given
make[4]: *** [printtup.o] Error 1
make[3]: *** [common-recursive] Error 2
make[2]: *** [access-recursive] Error 2
make[1]: *** [all] Error 2
make: *** [all] Error 2

See
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=cuckoo&dt=2005-07-19%2019:38:17
and
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=cuckoo&dt=2005-07-19%2019:28:30

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes:
> And 7.3 is also failing, with a different error:

> ccache gcc -traditional-cpp -g -O2 -fno-strict-aliasing -g -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../../src/include -I/opt/local/include -c -o printtup.o printtup.c
> In file included from /usr/include/machine/param.h:30,
>                  from /usr/include/sys/socket.h:67,
>                  from ../../../../src/include/libpq/pqcomm.h:28,
>                  from ../../../../src/include/libpq/libpq-be.h:24,
>                  from ../../../../src/include/libpq/libpq.h:21,
>                  from printtup.c:20:
> /usr/include/ppc/param.h:98: macro "btodb" requires 2 arguments, but only 1 given
> /usr/include/ppc/param.h:100: macro "dbtob" requires 2 arguments, but only 1 given

This has been seen before, eg
http://archives.postgresql.org/pgsql-hackers/2003-10/msg00579.php
http://archives.postgresql.org/pgsql-hackers/2003-11/msg00163.php

At the time we felt this was Apple's fault and didn't want to back-patch
any changes:
http://archives.postgresql.org/pgsql-hackers/2003-11/msg00170.php

I'm not sure I want to invest a lot of work in extracting and
back-patching the necessary changes to make pre-7.4 PG build on Apple's
recent toolchains. I think that changing -traditional-cpp to
-no-cpp-precomp in template/darwin would get you past the above-quoted
failure, but there are probably more problems waiting after that. A
quick look through the 7.4 CVS history suggests there were quite a few
fixes for Darwin, many of them done in a way with possible side-effects
on other platforms, so just back-patching would probably be out of the
question anyway.

In short, OS X > 10.2 wasn't a supported platform when 7.2/7.3 came out,
and I don't want to retroactively try to make it so.

			regards, tom lane
Tom Lane wrote:
> This is a considerably bigger issue for the buildfarm than it would
> be for ordinary users of our distribution, since in the distro it's
> only the contrib modules that you actually need to run through your
> local flex.

Couldn't we just run the distprep actions (flex, bison) through contrib
as well? That wouldn't hurt anyone, I think.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
> Couldn't we just run the distprep actions (flex, bison) through contrib
> as well? That wouldn't hurt anyone, I think.

No objection here (though of course it doesn't affect the buildfarm
issue).

			regards, tom lane
On Wed, Jul 20, 2005 at 10:32:00AM -0400, Tom Lane wrote:
> In short, OS X > 10.2 wasn't a supported platform when 7.2/7.3 came out,
> and I don't want to retroactively try to make it so.

All I needed to hear. I'll pull those from cuckoo's config.

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"