Thread: Three animals fail test-decoding-check on REL_10_STABLE
Hi,

Only gaur shows useful logs:

  SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
! ERROR: could not access file "test_decoding": No such file or directory

Does this mean it didn't build the test_decoding module?

Of the failing animals, damselfly builds with the highest frequency,
and it reports the following 4 commits between the first failure[1]
and the preceding success (and has been failing ever since):

962da60591 Tue Jan 1 01:39:34 2019 UTC Fix generation of padding message before encrypting Elgamal in pgcrypto
bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC Process EXTRA_INSTALL serially, during the first temp-install.
e7ebc8c285 Mon Dec 31 21:55:04 2018 UTC Send EXTRA_INSTALL errors to install.log, not stderr.
7c97b0f55e Mon Dec 31 21:51:18 2018 UTC pg_regress: Promptly detect failed postmaster startup.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2019-01-01%2010%3A39%3A41

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Only gaur shows useful logs:
> SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot',
> 'test_decoding');
> ! ERROR: could not access file "test_decoding": No such file or directory
> Does this mean it didn't build the test_decoding module?

I'm wondering if it built it but didn't install it, as a result of
some problem with

> bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC Process EXTRA_INSTALL
> serially, during the first temp-install.

Will take a look later, but since gaur is so slow, it may be awhile
before I have any answers.

regards, tom lane
I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Does this mean it didn't build the test_decoding module?

> I'm wondering if it built it but didn't install it, as a result of
> some problem with
>> bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC Process EXTRA_INSTALL
>> serially, during the first temp-install.

So it appears that in v10,

    ./configure ... --enable-tap-tests ...
    make
    make install
    cd contrib/test_decoding
    make check

fails due to failure to install test_decoding into the tmp_install
tree, while it works in v11. Moreover, that's not specific to
gaur: it happens on my Linux box too. I'm not very sure why only
three buildfarm animals are unhappy --- maybe in the buildfarm
context it requires a specific combination of options to show the
problem.

There's no obvious difference between bedda9fbb and 6dd690be3,
so I surmise that that patch depended somehow on some previous
work that only went into v11 not v10. Haven't found what, yet.

regards, tom lane
I wrote:
> There's no obvious difference between bedda9fbb and 6dd690be3,
> so I surmise that that patch depended somehow on some previous
> work that only went into v11 not v10. Haven't found what, yet.

Ah, looks like it was 42e61c774. I'll push a fix shortly.

regards, tom lane
I wrote:
> So it appears that in v10,
> ./configure ... --enable-tap-tests ...
> make
> make install
> cd contrib/test_decoding
> make check
> fails due to failure to install test_decoding into the tmp_install
> tree, while it works in v11. Moreover, that's not specific to
> gaur: it happens on my Linux box too. I'm not very sure why only
> three buildfarm animals are unhappy --- maybe in the buildfarm
> context it requires a specific combination of options to show the
> problem.

While I think I've fixed this bug, I'm still quite confused about why
only some buildfarm animals showed the problem. Comparing log files,
it seems that the ones that were working were relying on having
done a complete temp-install at a higher level, while the ones that
were failing were trying to make a temp install from scratch in
contrib/test_decoding and hence seeing the bug. For example,
longfin's test-decoding-check log starts out

Snapshot: 2019-01-11 21:12:17

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../backend submake-errcodes
make[3]: Nothing to be done for `submake-errcodes'.

while gaur's starts out

Snapshot: 2019-01-11 07:30:45

rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
/bin/sh ../../config/install-sh -c -d '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log
make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -j1 checkprep >>'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -C ../../src/test/regress all
make[1]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/test/regress'
make -C ../../../src/port all
make[2]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/port'
make -C ../backend submake-errcodes
make[3]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/backend'
make[3]: Nothing to be done for `submake-errcodes'.

These two animals are running the same buildfarm client version,
and I don't see any relevant difference in their configurations,
so why are they behaving differently? Andrew, any ideas?

regards, tom lane
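[Editor's sketch] The log comparison above can be approximated with a one-line heuristic: a step that builds its own temp install begins by removing the old tmp_install directory, as gaur's log does, while a step that reuses a shared temp install (longfin) has no such line. The function name did_fresh_temp_install is hypothetical, and the pattern is only assumed to fit logs shaped like the two excerpts quoted above.

```shell
# Heuristic only: succeed if the log contains an "rm -rf .../tmp_install"
# line like the one at the top of gaur's log. Other log layouts are not
# handled, and the function name is made up for this illustration.
did_fresh_temp_install() {
    grep -Eq "^rm -rf .*tmp_install[[:space:]]*$" "$1"
}
```

It could be called as `did_fresh_temp_install test-decoding-check.log && echo "built its own temp install"`, where the log file name is likewise an assumption.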
On 1/11/19 6:33 PM, Tom Lane wrote:
> I wrote:
>> So it appears that in v10,
>> ./configure ... --enable-tap-tests ...
>> make
>> make install
>> cd contrib/test_decoding
>> make check
>> fails due to failure to install test_decoding into the tmp_install
>> tree, while it works in v11. Moreover, that's not specific to
>> gaur: it happens on my Linux box too. I'm not very sure why only
>> three buildfarm animals are unhappy --- maybe in the buildfarm
>> context it requires a specific combination of options to show the
>> problem.
> While I think I've fixed this bug, I'm still quite confused about why
> only some buildfarm animals showed the problem. Comparing log files,
> it seems that the ones that were working were relying on having
> done a complete temp-install at a higher level, while the ones that
> were failing were trying to make a temp install from scratch in
> contrib/test_decoding and hence seeing the bug. For example,
> longfin's test-decoding-check log starts out
>
> Snapshot: 2019-01-11 21:12:17
>
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
> /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../backend submake-errcodes
> make[3]: Nothing to be done for `submake-errcodes'.
>
> while gaur's starts out
>
> Snapshot: 2019-01-11 07:30:45
>
> rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
> /bin/sh ../../config/install-sh -c -d '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log
> make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
> make -j1 checkprep >>'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
> make -C ../../src/test/regress all
> make[1]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/test/regress'
> make -C ../../../src/port all
> make[2]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/port'
> make -C ../backend submake-errcodes
> make[3]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/backend'
> make[3]: Nothing to be done for `submake-errcodes'.
>
> These two animals are running the same buildfarm client version,
> and I don't see any relevant difference in their configurations,
> so why are they behaving differently? Andrew, any ideas?
>
>

Possibly an error in
https://github.com/PGBuildFarm/client-code/commit/3026438dcefebcc6fe2d44eb7b60812e257a0614

It looks like longfin detects that it has all it needs to proceed, and
so calls make with "NO_TEMP_INSTALL=yes", but gaur doesn't. Not sure why
that would be - if anything I'd expect the test to fail on OSX rather
than HP-UX. Is there something weird about naming of library files on
HP-UX?

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> On 1/11/19 6:33 PM, Tom Lane wrote:
>> While I think I've fixed this bug, I'm still quite confused about why
>> only some buildfarm animals showed the problem.

> ... Is there something weird about naming of library files on HP-UX?

Doh! I looked right at this code last night, but it failed to click:

    # these files should be present if we've temp_installed everything,
    # and not if we haven't. The represent core, contrib and test_modules.
    return ( (-d $tmp_loc)
          && (-f "$bindir/postgres" || -f "$bindir/postgres.exe")
          && (-f "$libdir/hstore.so" || -f "$libdir/hstore.dll")
          && (-f "$libdir/test_parser.so" || -f "$libdir/test_parser.dll"));

On HPUX (at least the version gaur is running), the extension for
shared libraries is ".sl" not ".so".

That doesn't explain the failures on damselfly and koreaceratops,
but they're both running very old buildfarm clients, which most
likely just don't have the optimization to share a temp-install.

I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
instead of listing all the possibilities here. But I'm not sure how
you'd deal with this bit in Makefile.hpux:

    ifeq ($(host_cpu), ia64)
    DLSUFFIX = .so
    else
    DLSUFFIX = .sl
    endif

Anyway, the bigger picture here is that the shared-temp-install
optimization is masking bugs in local "make check" rules. Not
sure how much we care about that, though. Any such bug is only
of interest to developers, and it only matters if someone actually
stumbles over it.

regards, tom lane
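[Editor's sketch] To make the failure mode concrete, here is a rough shell rendering of the quoted completeness check that accepts a list of shared-library suffixes instead of hard-coding ".so" and ".dll". This is not the buildfarm client's code (which is Perl). The function name temp_install_complete and the suffix list are assumptions for illustration; ".sl" is included only because it is the HP-UX suffix named above.

```shell
# Sketch: succeed if a temp install under $1 looks complete.
# $1 = temp install root, $2 = its bin directory, $3 = its lib directory.
# The suffix list is an assumption: .so and .dll come from the quoted
# check, .sl is the HP-UX suffix mentioned in the message above.
temp_install_complete() {
    tmp_loc=$1
    bindir=$2
    libdir=$3
    [ -d "$tmp_loc" ] || return 1
    [ -f "$bindir/postgres" ] || [ -f "$bindir/postgres.exe" ] || return 1
    # hstore stands in for contrib, test_parser for test_modules.
    for module in hstore test_parser; do
        found=no
        for suffix in .so .sl .dll; do
            if [ -f "$libdir/$module$suffix" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || return 1
    done
    return 0
}
```

With only ".so" and ".dll" in the list, a complete install on a ".sl" platform would be reported as incomplete, which matches the behavior described for gaur.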
On 1/12/19 2:03 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 1/11/19 6:33 PM, Tom Lane wrote:
>>> While I think I've fixed this bug, I'm still quite confused about why
>>> only some buildfarm animals showed the problem.
>> ... Is there something weird about naming of library files on HP-UX?
> Doh! I looked right at this code last night, but it failed to click:
>
> # these files should be present if we've temp_installed everything,
> # and not if we haven't. The represent core, contrib and test_modules.
> return ( (-d $tmp_loc)
> && (-f "$bindir/postgres" || -f "$bindir/postgres.exe")
> && (-f "$libdir/hstore.so" || -f "$libdir/hstore.dll")
> && (-f "$libdir/test_parser.so" || -f "$libdir/test_parser.dll"));
>
> On HPUX (at least the version gaur is running), the extension for
> shared libraries is ".sl" not ".so".
>
> That doesn't explain the failures on damselfly and koreaceratops,
> but they're both running very old buildfarm clients, which most
> likely just don't have the optimization to share a temp-install.

Yes, they are on an older version that doesn't use the NO_TEMP_INSTALL
flag at all.

> I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
> instead of listing all the possibilities here. But I'm not sure how
> you'd deal with this bit in Makefile.hpux:
>
> ifeq ($(host_cpu), ia64)
> DLSUFFIX = .so
> else
> DLSUFFIX = .sl
> endif

I'd rather get make to tell us directly, something like:

    .PHONY: show_dl_suffix
    show_dl_suffix:
    	@echo $(DLSUFFIX)

I can arrange something like that in the buildfarm code if we think the
use case is too narrow.

> Anyway, the bigger picture here is that the shared-temp-install
> optimization is masking bugs in local "make check" rules. Not
> sure how much we care about that, though. Any such bug is only
> of interest to developers, and it only matters if someone actually
> stumbles over it.
>
>

right.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 1/12/19 2:03 PM, Tom Lane wrote:
>> I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
>> instead of listing all the possibilities here.

> I'd rather get make to tell us directly, something like:
> .PHONY: show_dl_suffix
> show_dl_suffix:
> @echo $(DLSUFFIX)

No objection here, but of course you'd have to back-patch that into
all active branches.

(The Darwin case is slightly exciting, but it looks like you'd get
the right answer as long as Makefile.shlib doesn't get involved.)

regards, tom lane
On 1/13/19 9:24 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 1/12/19 2:03 PM, Tom Lane wrote:
>>> I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
>>> instead of listing all the possibilities here.
>> I'd rather get make to tell us directly, something like:
>> .PHONY: show_dl_suffix
>> show_dl_suffix:
>> @echo $(DLSUFFIX)
> No objection here, but of course you'd have to back-patch that into
> all active branches.
>
> (The Darwin case is slightly exciting, but it looks like you'd get
> the right answer as long as Makefile.shlib doesn't get involved.)
>
>

OK, I'll make that happen.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
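[Editor's sketch] On the caller's side, asking make for the suffix might look roughly like the following. The target name show_dl_suffix is taken from the proposal in this thread; whether a given branch actually provides it, and under that exact name, is an assumption. The function names and the ".so" default are hypothetical. The fallback is there because, as noted above, the target would only exist on branches where it has been back-patched.

```shell
# Ask make for the platform's shared-library suffix.
# "show_dl_suffix" is the target name proposed in this thread; a tree
# that lacks it makes this function fail. MAKE may name an alternative
# make binary (for example gmake).
query_dlsuffix() {
    build_dir=$1
    "${MAKE:-make}" -s -C "$build_dir" show_dl_suffix 2>/dev/null
}

# Print the suffix reported by make, or a fallback guess ($2, default
# ".so") when the target is unavailable or prints nothing.
dlsuffix_or_default() {
    suffix=$(query_dlsuffix "$1") || suffix=
    if [ -n "$suffix" ]; then
        printf '%s\n' "$suffix"
    else
        printf '%s\n' "${2:-.so}"
    fi
}
```

A caller could then test for files such as "$libdir/hstore$(dlsuffix_or_default "$build_dir")" rather than enumerating every known suffix.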