Thread: Preventing abort() and exit() calls in libpq
[ starting a new thread so as not to confuse the cfbot ] I wrote: > Michael Paquier <michael@paquier.xyz> writes: >> Good point. That's worse than just pfree() which is just a plain call >> to free() in the frontend. We could have more policies here, but my >> take is that we'd better move fe_memutils.o to OBJS_FRONTEND in >> src/common/Makefile so as shared libraries don't use those routines in >> the long term. > Ugh. Not only is that bad, but your proposed fix doesn't fix it. > At least in psql, and probably in most/all of our other clients, > removing fe_memutils.o from libpq's link just causes it to start > relying on the copy in the psql executable :-(. So I agree that > some sort of mechanical enforcement would be a really good thing, > but I'm not sure what it would look like. After some thought I propose that what we really want is to prevent any calls of abort() or exit() from inside libpq. Attached is a draft patch to do that. This can't be committed as-is, because we still have some abort() calls in there in HEAD, but if we could get that cleaned up it'd work. Alternatively we could just disallow exit(), which'd be enough to catch the problematic src/common files. This relies on "nm" being able to work on shlibs, which it's not required to by POSIX. However, it seems to behave as desired even on my oldest dinosaurs. In any case, if "nm" doesn't work then we'll just not detect such problems on that platform, which should be OK as long as the test does work on common platforms. Other than that point I think it's relying only on POSIX-spec features. I'll stick this into the CF list to see if the cfbot agrees that it finds the abort() problems... regards, tom lane diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index 0c4e55b6ad..3d992fdc78 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -96,12 +96,18 @@ SHLIB_EXPORTS = exports.txt PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto -all: all-lib +all: all-lib check-libpq-refs # Shared library stuff include $(top_srcdir)/src/Makefile.shlib backend_src = $(top_srcdir)/src/backend +# Check for functions that libpq must not call. +# (If nm doesn't exist or doesn't work on shlibs, this test will silently +# do nothing, which is fine.) +.PHONY: check-libpq-refs +check-libpq-refs: $(shlib) + @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit # Make dependencies on pg_config_paths.h visible in all builds. fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h
On Sat, Jun 26, 2021 at 05:29:29PM -0400, Tom Lane wrote: > I'll stick this into the CF list to see if the cfbot agrees that > it finds the abort() problems... The CF Bot is finding those problems. > +# Check for functions that libpq must not call. > +# (If nm doesn't exist or doesn't work on shlibs, this test will silently > +# do nothing, which is fine.) > +.PHONY: check-libpq-refs > +check-libpq-refs: $(shlib) > + @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit "abort" and "exit" could be generic terms present in some other libraries. Could be be better to match with "U abort" and "U exit" instead? MinGW has a nm command, and it has a compatible option set, so I think that it should work. -- Michael
Attachment
>> +# Check for functions that libpq must not call. >> +# (If nm doesn't exist or doesn't work on shlibs, this test will silently >> +# do nothing, which is fine.) >> +.PHONY: check-libpq-refs >> +check-libpq-refs: $(shlib) >> + @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit > > "abort" and "exit" could be generic terms present in some other > libraries. Could be be better to match with "U abort" and "U exit" > instead? MinGW has a nm command, and it has a compatible option set, > so I think that it should work. A possible trick is to add ccp flags such as: -Dexit=exit_BAD -Dabort=abort_BAD. -- Fabien.
Michael Paquier <michael@paquier.xyz> writes: > On Sat, Jun 26, 2021 at 05:29:29PM -0400, Tom Lane wrote: >> I'll stick this into the CF list to see if the cfbot agrees that >> it finds the abort() problems... > The CF Bot is finding those problems. >> +# Check for functions that libpq must not call. >> +# (If nm doesn't exist or doesn't work on shlibs, this test will silently >> +# do nothing, which is fine.) >> +.PHONY: check-libpq-refs >> +check-libpq-refs: $(shlib) >> + @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit Yeah, all except on Windows. Not sure if it's worth trying to build some way to make this check on Windows. > "abort" and "exit" could be generic terms present in some other > libraries. Could be be better to match with "U abort" and "U exit" > instead? No, for a couple of reasons: * nm's output format isn't all that well standardized * on some platforms, what appears here is "_abort". I would have liked to use "-w" in the grep call, but between the "_abort" case and the "abort@@GLIBC" case we see elsewhere, we'd be assuming way too much about what grep will consider to be a word. In practice I don't think it's too much of a problem. It doesn't matter whether libc has exported names containing "exit", unless libpq or something it imports from src/common or src/port actually attempts to call those names. Which I'm not expecting. A possible counterexample is atexit(3). If libpq ever grew a reason to call that then we'd have an issue. It wouldn't be that hard to work around, by adding a grep -v filter. But in any case I'm dubious that we could ever make correct use of atexit(3) in libpq, because we'd have no way to know whether the host application has its own atexit callbacks and if so whether they'll run before or after libpq's. Something like isolationtester's atexit callback to PQclose all its connections would risk breakage if libpq tried to clean up via atexit. regards, tom lane
Fabien COELHO <coelho@cri.ensmp.fr> writes: > A possible trick is to add ccp flags such as: -Dexit=exit_BAD > -Dabort=abort_BAD. Not really going to work, at least not without a lot of fragile kluges, because the main problem here is to prevent libpq from *indirectly* calling those functions via stuff it imports from src/port or src/common. It's possible that we could make it work by generalizing the policy that "libpq may not call abort/exit" into "no PG shlib may call abort/exit", and then apply the cpp #defines while compiling the xxx_shlib.o variants of those files. This does not seem more attractive than what I proposed, though. regards, tom lane
I wrote: > This relies on "nm" being able to work on shlibs, which it's not > required to by POSIX. However, it seems to behave as desired even > on my oldest dinosaurs. In any case, if "nm" doesn't work then > we'll just not detect such problems on that platform, which should > be OK as long as the test does work on common platforms. > Other than that point I think it's relying only on POSIX-spec > features. Further dinosaur-wrangling reveals a small problem on prairiedog (ancient macOS): $ nm -A -g -u libpq.5.14.dylib | grep abort libpq.5.14.dylib:fe-connect.o: _abort libpq.5.14.dylib:_eprintf.o: _abort The fe-connect.o reference is from PGTHREAD_ERROR of course, but what's that other thing? Investigation finds this: https://opensource.apple.com/source/clang/clang-800.0.38/src/projects/compiler-rt/lib/builtins/eprintf.c.auto.html IOW it seems that this file is pulled in to implement <assert.h>, and the abort call underlies uses of Assert. So that seems fine from a coding-rule perspective: it's okay for development builds to contain core-dumping assertions. It complicates matters for the proposed patch though. As far as old macOS goes, it seems like we can work around this pretty easily, since this version of nm helpfully breaks down the references by .o file: just add a "grep -v" pass to reject "_eprintf.o:". However, if there are any other platforms that similarly convert assert() calls into some direct reference to abort(), it may be harder to work around it elsewhere. I guess the only way to know is to see what the buildfarm says. Worst case, we might only be able to enforce the prohibition against exit(). That'd be annoying but it's still much better than nothing. regards, tom lane
I wrote: >> This relies on "nm" being able to work on shlibs, which it's not >> required to by POSIX. However, it seems to behave as desired even >> on my oldest dinosaurs. In any case, if "nm" doesn't work then >> we'll just not detect such problems on that platform, which should >> be OK as long as the test does work on common platforms. So I pushed that, and not very surprisingly, it's run into some portability problems. gombessa (recent OpenBSD) reports ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit libpq.so.5.15:__cxa_atexit So far as I can find, __cxa_atexit is a C++ support routine, so I wondered what the heck libpq.so is doing calling it. I managed to reproduce the failure here using an OpenBSD installation I had at hand, and confirmed that __cxa_atexit is *not* referenced by any of the .o files in src/port, src/common, or src/interfaces/libpq. So apparently it's being injected at some fairly low level of the shlib support on that platform. Probably the thing to do is adjust the grep filter to exclude __cxa_atexit, but I want to wait awhile and see whether any other buildfarm animals report this. morepork at least will be pretty interesting, since it's a slightly older OpenBSD vintage. regards, tom lane
I wrote: > So I pushed that, and not very surprisingly, it's run into some > portability problems. gombessa (recent OpenBSD) reports > ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit > libpq.so.5.15:__cxa_atexit After a few more hours, all of our OpenBSD animals have reported that, on several different OpenBSD releases and with both gcc and clang compilers. So at least it's a longstanding platform behavior. More troublingly, fossa reports this: ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit libpq.so.5.15: U abort@@GLIBC_2.2.5 Where is that coming from? hippopotamus and jay, which seem to be different compilers on the same physical machine, aren't showing it. That'd lead to the conclusion that icc is injecting abort() calls of its own accord, which seems quite nasty. Lacking an icc license, I can't poke into that more directly here. Perhaps we could wrap the test for abort() in something like 'if "$CC" != icc then ...', but ugh. regards, tom lane
On 2021-Jun-29, Tom Lane wrote: > More troublingly, fossa reports this: > > ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit > libpq.so.5.15: U abort@@GLIBC_2.2.5 > > Where is that coming from? hippopotamus and jay, which seem to > be different compilers on the same physical machine, aren't showing > it. That'd lead to the conclusion that icc is injecting abort() > calls of its own accord, which seems quite nasty. Lacking an icc > license, I can't poke into that more directly here. I noticed that the coverage report is not updating, and lo and behold it's failing this bit. I can inspect the built files ... what exactly are you looking for? -- Álvaro Herrera Valdivia, Chile <Schwern> It does it in a really, really complicated way <crab> why does it need to be complicated? <Schwern> Because it's MakeMaker.
Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort. But I did get one in libpgport_shlib.a: path_shlib.o: U abort 0000000000000320 T canonicalize_path 0000000000000197 T cleanup_path 00000000000009e3 t dir_strcmp ... -- Álvaro Herrera Valdivia, Chile "People get annoyed when you try to debug them." (Larry Wall)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort. > But I did get one in libpgport_shlib.a: > path_shlib.o: > U abort Yeah, there is one in get_progname(). But path.o shouldn't be getting pulled into libpq ... else why aren't all the animals failing? What platform does the coverage report run on exactly? regards, tom lane
On 2021-Jun-29, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort. > > But I did get one in libpgport_shlib.a: > > > path_shlib.o: > > U abort > > Yeah, there is one in get_progname(). But path.o shouldn't be getting > pulled into libpq ... else why aren't all the animals failing? Maybe there's something about the linker flags being used. ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way. > What platform does the coverage report run on exactly? It's Debian Buster. libpq.so is linked as gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard-Wno-format-truncation -Wno-stringop-truncation -fprofile-arcs -ftest-coverage -O0 -pthread -D_REENTRANT-D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS -fPIC -shared -Wl,-soname,libpq.so.5 -Wl,--version-script=exports.list-o libpq.so.5.15 fe-auth-scram.o fe-connect.o fe-exec.o fe-lobj.o fe-misc.o fe-print.ofe-protocol3.o fe-secure.o fe-trace.o legacy-pqsignal.o libpq-events.o pqexpbuffer.o fe-auth.o fe-secure-common.ofe-secure-openssl.o -L../../../src/port -L../../../src/common -lpgcommon_shlib -lpgport_shlib -L/usr/lib/llvm-6.0/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -lssl -lcrypto -lm -lldap_r and libpgport was just ar crs libpgport_shlib.a fls_shlib.o getpeereid_shlib.o strlcat_shlib.o strlcpy_shlib.o pg_crc32c_sse42_shlib.o pg_crc32c_sb8_shlib.opg_crc32c_sse42_choose_shlib.o bsearch_arg_shlib.o chklocale_shlib.o erand48_shlib.o inet_net_ntop_shlib.onoblock_shlib.o path_shlib.o pg_bitutils_shlib.o pg_strong_random_shlib.o pgcheckdir_shlib.o pgmkdirp_shlib.opgsleep_shlib.o pgstrcasecmp_shlib.o pgstrsignal_shlib.o pqsignal_shlib.o qsort_shlib.o qsort_arg_shlib.oquotes_shlib.o snprintf_shlib.o strerror_shlib.o tar_shlib.o thread_shlib.o -- Álvaro Herrera Valdivia, Chile https://www.EnterpriseDB.com/
On 2021-Jun-30, Alvaro Herrera wrote: > On 2021-Jun-29, Tom Lane wrote: > > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > > Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort. > > > But I did get one in libpgport_shlib.a: > > > > > path_shlib.o: > > > U abort > > > > Yeah, there is one in get_progname(). But path.o shouldn't be getting > > pulled into libpq ... else why aren't all the animals failing? > > Maybe there's something about the linker flags being used. > > ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way. If I remove -fprofile-arcs from CFLAGS, then abort is no longer present, but we still get a fail because of __gcov_exit. I suppose if you'd add an exception for __cxa_atexit, the same place could use one for __gcov_exit. I'm not sure what to make of the -fprofile-arcs stuff though. -- Álvaro Herrera Valdivia, Chile https://www.EnterpriseDB.com/ "Java is clearly an example of money oriented programming" (A. Stepanov)
On 2021-Jun-30, Alvaro Herrera wrote: > If I remove -fprofile-arcs from CFLAGS, then abort is no longer present, > but we still get a fail because of __gcov_exit. I suppose if you'd add > an exception for __cxa_atexit, the same place could use one for > __gcov_exit. I tried the attached patch, and while libpq.so now builds successfully, it causes anything that tries to link to libpq fail like gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard-Wno-format-truncation -Wno-stringop-truncation -g -fprofile-arcs -ftest-coverage findtimezone.oinitdb.o localtime.o -L../../../src/port -L../../../src/common -L../../../src/fe_utils -lpgfeutils -L../../../src/common-lpgcommon -L../../../src/port -lpgport -L../../../src/interfaces/libpq -lpq -Wl,--as-needed -Wl,-rpath,'/pgsql/install/master-coverage/lib',--enable-new-dtags -lpgcommon -lpgport -lpthread -lxml2 -lssl -lcrypto -lz-lreadline -lpthread -lrt -ldl -lm -o initdb /usr/bin/ld: initdb: hidden symbol `__gcov_merge_add' in /usr/lib/gcc/x86_64-linux-gnu/8/libgcov.a(_gcov_merge_add.o) isreferenced by DSO /usr/bin/ld: final link failed: bad value collect2: error: ld returned 1 exit status make[3]: *** [Makefile:43: initdb] Error 1 so this doesn't look too promising. -- Álvaro Herrera Valdivia, Chile https://www.EnterpriseDB.com/
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Maybe there's something about the linker flags being used. > ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way. Ah-hah, yeah, I see it too if I enable profiling. I can confirm that it's not from the abort() call in path.c, because it's still there if I remove that. So this is another case where build infrastructure is injecting abort() calls we didn't ask for. Between this and the icc case, I'm now inclined to give up on trying to forbid abort() calls in libpq. I think the value-add for that is a lot lower than it is for exit() anyway. abort() is something one doesn't toss around lightly. You mentioned __gcov_exit, but I'm not sure if we need an exception for that. I see it referenced by the individual .o files, but the completed .so has no such reference, so at least on RHEL8 it's apparently satisfied during .so linkage. Do you see something different? regards, tom lane
On 2021-Jun-30, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Maybe there's something about the linker flags being used. > > ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way. > > Ah-hah, yeah, I see it too if I enable profiling. I can confirm > that it's not from the abort() call in path.c, because it's still > there if I remove that. So this is another case where build > infrastructure is injecting abort() calls we didn't ask for. Hah, I didn't think to try that. > Between this and the icc case, I'm now inclined to give up on > trying to forbid abort() calls in libpq. I think the value-add > for that is a lot lower than it is for exit() anyway. abort() > is something one doesn't toss around lightly. No objections to that. > You mentioned __gcov_exit, but I'm not sure if we need an > exception for that. I see it referenced by the individual .o > files, but the completed .so has no such reference, so at least > on RHEL8 it's apparently satisfied during .so linkage. Do you > see something different? Well, not really. I saw it but only after I removed -fprofile-arcs from Makefile.shlib's link line; but per my other email, that doesn't really work. Everything seems to work well for me after removing abort from that grep. -- Álvaro Herrera Valdivia, Chile https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Jun-30, Tom Lane wrote: >> You mentioned __gcov_exit, but I'm not sure if we need an >> exception for that. I see it referenced by the individual .o >> files, but the completed .so has no such reference, so at least >> on RHEL8 it's apparently satisfied during .so linkage. Do you >> see something different? > Well, not really. I saw it but only after I removed -fprofile-arcs from > Makefile.shlib's link line; but per my other email, that doesn't really > work. > Everything seems to work well for me after removing abort from that grep. OK, thanks, will push a fix momentarily. regards, tom lane
On 2021-Jun-30, Tom Lane wrote: > OK, thanks, will push a fix momentarily. (BTW since the _eprintf.o stuff comes from _abort, I suppose you're going to remove that grep -v too?) -- Álvaro Herrera 39°49'30"S 73°17'W https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > (BTW since the _eprintf.o stuff comes from _abort, I suppose you're > going to remove that grep -v too?) Right, I did that. regards, tom lane
I wrote: > OK, thanks, will push a fix momentarily. Did so, and look what popped up on wrasse [1]: ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit libpq.so.5.15: [765] | 232544| 248|FUNC |GLOB |3 |14 |PQexitPipelineMode This makes no sense, because (a) wrasse was happy with the previous version, and (b) surely the "-u" switch should prevent nm from printing PQexitPipelineMode. Noah, did you change anything about wrasse's configuration today? regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-06-30%2014%3A58%3A15
On 26.06.21 23:29, Tom Lane wrote: > After some thought I propose that what we really want is to prevent > any calls of abort() or exit() from inside libpq. Attached is a > draft patch to do that. Could we set this rule up a little bit differently so that it is only run when the library is built. Right now, make world on a built tree makes 17 calls to this "nm" line, and make check-world calls it 81 times. I think once would be enough. ;-)
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > Could we set this rule up a little bit differently so that it is only > run when the library is built. > Right now, make world on a built tree makes 17 calls to this "nm" line, > and make check-world calls it 81 times. I think once would be enough. ;-) Hmm, didn't realize that would happen. Will see what can be done. regards, tom lane
I wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> Could we set this rule up a little bit differently so that it is only >> run when the library is built. >> Right now, make world on a built tree makes 17 calls to this "nm" line, >> and make check-world calls it 81 times. I think once would be enough. ;-) > Hmm, didn't realize that would happen. Will see what can be done. Looks like we'd have to make use of a dummy stamp-file, more or less as attached. Any objections? regards, tom lane diff --git a/src/interfaces/libpq/.gitignore b/src/interfaces/libpq/.gitignore index a4afe7c1c6..7478dc344a 100644 --- a/src/interfaces/libpq/.gitignore +++ b/src/interfaces/libpq/.gitignore @@ -1 +1,2 @@ /exports.list +/libpq-refs-stamp diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index c2a35a488a..bb6ceec2c9 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -96,7 +96,7 @@ SHLIB_EXPORTS = exports.txt PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto -all: all-lib check-libpq-refs +all: all-lib libpq-refs-stamp # Shared library stuff include $(top_srcdir)/src/Makefile.shlib @@ -108,9 +108,9 @@ backend_src = $(top_srcdir)/src/backend # If nm doesn't exist or doesn't work on shlibs, this test will do nothing, # which is fine. The exclusion of __cxa_atexit is necessary on OpenBSD, # which seems to insert references to that even in pure C code. -.PHONY: check-libpq-refs -check-libpq-refs: $(shlib) +libpq-refs-stamp: $(shlib) ! nm -A -g -u $< 2>/dev/null | grep -v __cxa_atexit | grep exit + touch $@ # Make dependencies on pg_config_paths.h visible in all builds. fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h @@ -141,7 +141,7 @@ uninstall: uninstall-lib clean distclean: clean-lib $(MAKE) -C test $@ - rm -f $(OBJS) pthread.h + rm -f $(OBJS) pthread.h libpq-refs-stamp # Might be left over from a Win32 client-only build rm -f pg_config_paths.h
On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote: > I wrote: > > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > > > Could we set this rule up a little bit differently so that it is only > > > run when the library is built. > > > Right now, make world on a built tree makes 17 calls to this "nm" line, > > > and make check-world calls it 81 times. I think once would be enough. ;-) > > Hmm, didn't realize that would happen. Will see what can be done. > > Looks like we'd have to make use of a dummy stamp-file, more or less > as attached. Any objections? Spitballing -- if you don't like the stamp file, you could add the check to the end of the $(shlib) rule, surrounded by an ifeq check. Then .DELETE_ON_ERROR should take care of the rest, I think. --Jacob
Jacob Champion <pchampion@vmware.com> writes: > On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote: >> Looks like we'd have to make use of a dummy stamp-file, more or less >> as attached. Any objections? > Spitballing -- if you don't like the stamp file, you could add the > check to the end of the $(shlib) rule, surrounded by an ifeq check. > Then .DELETE_ON_ERROR should take care of the rest, I think. Hmm ... I'd been thinking we don't use .DELETE_ON_ERROR, but on second look we do, so that could be a plausible approach. On balance though, the separate rule seems better, because .DELETE_ON_ERROR would destroy the evidence about why "nm" failed, which could be annoying when investigating problems. regards, tom lane
On Wed, 2021-06-30 at 18:56 -0400, Tom Lane wrote: > Jacob Champion <pchampion@vmware.com> writes: > > On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote: > > > Looks like we'd have to make use of a dummy stamp-file, more or less > > > as attached. Any objections? > > Spitballing -- if you don't like the stamp file, you could add the > > check to the end of the $(shlib) rule, surrounded by an ifeq check. > > Then .DELETE_ON_ERROR should take care of the rest, I think. > > Hmm ... I'd been thinking we don't use .DELETE_ON_ERROR, but on > second look we do, so that could be a plausible approach. > > On balance though, the separate rule seems better, because > .DELETE_ON_ERROR would destroy the evidence about why "nm" > failed, which could be annoying when investigating problems. Good point. +1 to the stamp approach, then. --Jacob
On Wed, Jun 30, 2021 at 12:06:47PM -0400, Tom Lane wrote: > I wrote: > > OK, thanks, will push a fix momentarily. > > Did so, and look what popped up on wrasse [1]: > > ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit > libpq.so.5.15: [765] | 232544| 248|FUNC |GLOB |3 |14 |PQexitPipelineMode > > This makes no sense, because (a) wrasse was happy with the previous > version, and (b) surely the "-u" switch should prevent nm from > printing PQexitPipelineMode. Noah, did you change anything about > wrasse's configuration today? No, and wrasse still succeeds at "git checkout e45b0df^". Solaris /usr/bin/grep doesn't support "-e": [nm@gcc-solaris11 5:0 2021-06-30T22:23:29 postgresql 0]$ echo exit | grep -e exit grep: illegal option -- e Usage: grep [-c|-l|-q] -bhinsvw pattern file . . . [nm@gcc-solaris11 5:0 2021-06-30T22:23:43 postgresql 2]$ echo exit | grep exit exit [nm@gcc-solaris11 5:0 2021-06-30T22:24:16 postgresql 0]$ echo exit | /usr/xpg4/bin/grep -e exit exit That concealed things in the previous version. You can see those "illegal option" messages in the last passing run: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=wrasse&dt=2021-06-30%2008%3A57%3A27&stg=make
On 2021-Jun-30, Noah Misch wrote: > No, and wrasse still succeeds at "git checkout e45b0df^". Solaris > /usr/bin/grep doesn't support "-e": I think this means the rule should use $(GREP), which is /usr/bin/ggrep in wrasse, checking for grep that handles long lines and -e... /usr/bin/ggrep -- Álvaro Herrera 39°49'30"S 73°17'W https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Jun-30, Noah Misch wrote: >> No, and wrasse still succeeds at "git checkout e45b0df^". Solaris >> /usr/bin/grep doesn't support "-e": > I think this means the rule should use $(GREP), which is /usr/bin/ggrep > in wrasse, Ah, my mistake. Although we're still left with the question of why Solaris' "nm" doesn't support the POSIX-required options. regards, tom lane
On Wed, Jun 30, 2021 at 11:45:10PM -0400, Tom Lane wrote: > we're still left with the question of why > Solaris' "nm" doesn't support the POSIX-required options. In POSIX, -g and -u are mutually exclusive. Solaris ignores all but the first of these in a command: [nm@gcc-solaris11 5:0 2021-07-01T06:48:54 postgresql 1]$ /usr/bin/nm -u -g src/interfaces/libpq/libpq.so|grep exec nm: -u or -e set, -g ignored [nm@gcc-solaris11 5:0 2021-07-01T06:49:41 postgresql 1]$ /usr/bin/nm -g -u src/interfaces/libpq/libpq.so|grep exec nm: -e or -g set, -u ignored [405] | 208320| 84|FUNC |GLOB |3 |14 |PQexec [818] | 208416| 128|FUNC |GLOB |3 |14 |PQexecParams [729] | 208672| 112|FUNC |GLOB |3 |14 |PQexecPrepared [nm@gcc-solaris11 5:0 2021-07-01T06:49:45 postgresql 0]$ /usr/bin/nm -u src/interfaces/libpq/libpq.so|grep exec [nm@gcc-solaris11 5:0 2021-07-01T06:49:48 postgresql 1]$
Noah Misch <noah@leadboat.com> writes: > On Wed, Jun 30, 2021 at 11:45:10PM -0400, Tom Lane wrote: >> we're still left with the question of why >> Solaris' "nm" doesn't support the POSIX-required options. > In POSIX, -g and -u are mutually exclusive. Solaris ignores all but the first > of these in a command: I've just re-read the POSIX spec for "nm", and I do not see anything there that would support that interpretation. Still, we can try it without -g and see what else breaks. regards, tom lane
On Thu, Jul 01, 2021 at 01:20:48AM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Wed, Jun 30, 2021 at 11:45:10PM -0400, Tom Lane wrote: > >> we're still left with the question of why > >> Solaris' "nm" doesn't support the POSIX-required options. > > > In POSIX, -g and -u are mutually exclusive. Solaris ignores all but the first > > of these in a command: > > I've just re-read the POSIX spec for "nm", and I do not see anything there > that would support that interpretation. Still, we can try it without -g > and see what else breaks. https://pubs.opengroup.org/onlinepubs/9699919799/utilities/nm.html says: nm [-APv] [-g|-u] [-t format] file... If the options weren't mutually-exclusive, it would say: nm [-APvgu] [-t format] file...
Noah Misch <noah@leadboat.com> writes: > On Thu, Jul 01, 2021 at 01:20:48AM -0400, Tom Lane wrote: >> I've just re-read the POSIX spec for "nm", and I do not see anything there >> that would support that interpretation. > https://pubs.opengroup.org/onlinepubs/9699919799/utilities/nm.html says: > nm [-APv] [-g|-u] [-t format] file... Oh, right, I failed to look carefully at the syntax diagram. Local testing also supports the conclusion that -g isn't needed here, so pushed that way. Thanks for investigating that! regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I think this means the rule should use $(GREP), which is /usr/bin/ggrep > in wrasse, I didn't install this change, because it isn't actually needed at the moment, and we aren't using $(GREP) anywhere else. Might be a bridge to cross in future. regards, tom lane
On 01.07.21 00:41, Jacob Champion wrote: > On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote: >> I wrote: >>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >>>> Could we set this rule up a little bit differently so that it is only >>>> run when the library is built. >>>> Right now, make world on a built tree makes 17 calls to this "nm" line, >>>> and make check-world calls it 81 times. I think once would be enough. ;-) >>> Hmm, didn't realize that would happen. Will see what can be done. >> >> Looks like we'd have to make use of a dummy stamp-file, more or less >> as attached. Any objections? > > Spitballing -- if you don't like the stamp file, you could add the > check to the end of the $(shlib) rule, surrounded by an ifeq check. > Then .DELETE_ON_ERROR should take care of the rest, I think. Somewhere in the $(shlib) rule would seem most appropriate. But I don't understand the rest: What ifeq, and why .DELETE_ON_ERROR?
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 01.07.21 00:41, Jacob Champion wrote: >> Spitballing -- if you don't like the stamp file, you could add the >> check to the end of the $(shlib) rule, surrounded by an ifeq check. >> Then .DELETE_ON_ERROR should take care of the rest, I think. > Somewhere in the $(shlib) rule would seem most appropriate. But I don't > understand the rest: What ifeq, and why .DELETE_ON_ERROR? The variant of this I'd been thinking of was $(shlib): $(OBJS) | $(SHLIB_PREREQS) $(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) +ifneq (,$(SHLIB_EXTRA_ACTION)) + $(SHLIB_EXTRA_ACTION) +endif (and similarly in several other places); then libpq's Makefile could set SHLIB_EXTRA_ACTION to the desired thing. The problem then is, what happens when the extra action fails? Without .DELETE_ON_ERROR, the shlib is still there and the next make run will think everything's good. regards, tom lane
On Thu, 2021-07-01 at 14:14 -0400, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > > Somewhere in the $(shlib) rule would seem most appropriate. But I don't > > understand the rest: What ifeq, and why .DELETE_ON_ERROR? > > The variant of this I'd been thinking of was > > $(shlib): $(OBJS) | $(SHLIB_PREREQS) > $(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) > +ifneq (,$(SHLIB_EXTRA_ACTION)) > + $(SHLIB_EXTRA_ACTION) > +endif > > (and similarly in several other places); then libpq's Makefile > could set SHLIB_EXTRA_ACTION to the desired thing. > > The problem then is, what happens when the extra action fails? > Without .DELETE_ON_ERROR, the shlib is still there and the next > make run will think everything's good. Yep, that was pretty much what was in my head. ifeq (or ifneq in your example) to gate the extra nm check, and .DELETE_ON_ERROR to make the failure stick for future make invocations. --Jacob
On 01.07.21 20:14, Tom Lane wrote: > The variant of this I'd been thinking of was > > $(shlib): $(OBJS) | $(SHLIB_PREREQS) > $(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) > +ifneq (,$(SHLIB_EXTRA_ACTION)) > + $(SHLIB_EXTRA_ACTION) > +endif > > (and similarly in several other places); then libpq's Makefile > could set SHLIB_EXTRA_ACTION to the desired thing. Right, that looks sensible. (Maybe the ifneq isn't actually necessary, since if the variable is not set, nothing would happen.) > The problem then is, what happens when the extra action fails? > Without .DELETE_ON_ERROR, the shlib is still there and the next > make run will think everything's good. Right. .DELETE_ON_ERROR is already set in Makefile.global, so it's not necessary to set it again.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 01.07.21 20:14, Tom Lane wrote: >> The problem then is, what happens when the extra action fails? >> Without .DELETE_ON_ERROR, the shlib is still there and the next >> make run will think everything's good. > Right. .DELETE_ON_ERROR is already set in Makefile.global, so it's not > necessary to set it again. Right. Since we use that, we don't actually have that problem. What we'd have instead is that debugging an unexpected failure of the "extra action" would be painful, because there would be no way short of modifying the Makefiles to create its input data. So I think the other solution with a separate rule is better. regards, tom lane
Now it's hoverfly: ! nm -A -u libpq.so.5 2>/dev/null | grep -v __cxa_atexit | grep exit libpq.so.5: atexit U - libpq.so.5: pthread_exit U - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-07-02%2010%3A10%3A29 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Now it's hoverfly: > ! nm -A -u libpq.so.5 2>/dev/null | grep -v __cxa_atexit | grep exit > libpq.so.5: atexit U - > libpq.so.5: pthread_exit U - Ugh. What in the world is producing those references? (As I mentioned upthread, I'm quite suspicious of libpq trying to perform any actions in an atexit callback, because of the uncertainty about whether some later atexit callback could try to use libpq functions. So this seems like it might be an actual bug.) regards, tom lane
On Wed, 2021-06-30 at 10:42 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > On 2021-Jun-30, Tom Lane wrote: > > > You mentioned __gcov_exit, but I'm not sure if we need an > > > exception for that. I see it referenced by the individual .o > > > files, but the completed .so has no such reference, so at least > > > on RHEL8 it's apparently satisfied during .so linkage. Do you > > > see something different? > > Well, not really. I saw it but only after I removed -fprofile-arcs from > > Makefile.shlib's link line; but per my other email, that doesn't really > > work. > > Everything seems to work well for me after removing abort from that grep. > > OK, thanks, will push a fix momentarily. With latest HEAD, building with --enable-coverage still fails on my Ubuntu 20.04: ! nm -A -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit libpq.so.5.15: U exit@@GLIBC_2.2.5 I don't see any exit references in the libpq objects or in libpgport_shlib, so it seems like libpgcommon_shlib is the culprit... I assume turning off optimizations leads to less dead code elimination? --Jacob
Jacob Champion <pchampion@vmware.com> writes: > With latest HEAD, building with --enable-coverage still fails on my > Ubuntu 20.04: > ! nm -A -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit > libpq.so.5.15: U exit@@GLIBC_2.2.5 Hm, weird. I don't see that here on RHEL8, and https://coverage.postgresql.org seems to be working so it doesn't fail on Alvaro's Debian setup either. What configure options are you using? Does "nm -u" report "exit" being referenced from any *.o in libpq, or from any *_shlib.o in src/port/ or src/common/ ? regards, tom lane
On Fri, 2021-07-02 at 18:20 -0400, Tom Lane wrote: > What configure options are you using? Just `./configure --enable-coverage`, nothing else. I distclean'd right before for good measure. > Does "nm -u" report "exit" being referenced from any *.o in libpq, > or from any *_shlib.o in src/port/ or src/common/ ? Only src/common: controldata_utils_shlib.o: U close U __errno_location U exit ... fe_memutils_shlib.o: U exit ... file_utils_shlib.o: U close U closedir U __errno_location U exit ... hex_shlib.o: U exit ... psprintf_shlib.o: U __errno_location U exit ... stringinfo_shlib.o: U __errno_location U exit ... username_shlib.o: U __errno_location U exit ... --Jacob
Jacob Champion <pchampion@vmware.com> writes: > On Fri, 2021-07-02 at 18:20 -0400, Tom Lane wrote: >> What configure options are you using? > Just `./configure --enable-coverage`, nothing else. I distclean'd right > before for good measure. Hmph. There's *something* different about your setup from what either Alvaro or I tried. What's the compiler (and version)? What's the platform exactly? regards, tom lane
On Fri, 2021-07-02 at 18:45 -0400, Tom Lane wrote: > Jacob Champion <pchampion@vmware.com> writes: > > On Fri, 2021-07-02 at 18:20 -0400, Tom Lane wrote: > > > What configure options are you using? > > Just `./configure --enable-coverage`, nothing else. I distclean'd right > > before for good measure. > > Hmph. There's *something* different about your setup from what > either Alvaro or I tried. What's the compiler (and version)? > What's the platform exactly? $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 Copyright (C) 2019 Free Software Foundation, Inc. ... $ cat /etc/os-release NAME="Ubuntu" VERSION="20.04.2 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.2 LTS" VERSION_ID="20.04" ... $ uname -a Linux HOSTNAME 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux --Jacob
On 2021-Jul-02, Jacob Champion wrote: > Only src/common: > > controldata_utils_shlib.o: > U close > U __errno_location > U exit Actually, I do see these in the .o files as well, but they don't make it to the .a file. gcc here is 8.3.0. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org> writes: > gcc here is 8.3.0. Hmmm ... mine is 8.4.1. I'm about to go out to dinner, but will check into this with some newer gcc versions later. regards, tom lane
On Fri, Jul 02, 2021 at 11:20:17AM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Now it's hoverfly: > > ! nm -A -u libpq.so.5 2>/dev/null | grep -v __cxa_atexit | grep exit > > libpq.so.5: atexit U - > > libpq.so.5: pthread_exit U - > > Ugh. What in the world is producing those references? Those come from a statically-linked libldap_r: $ nm -A -u /home/nm/sw/nopath/openldap-64/lib/libldap_r.a|grep exit /home/nm/sw/nopath/openldap-64/lib/libldap_r.a[tpool.o]: .ldap_pvt_thread_exit U - /home/nm/sw/nopath/openldap-64/lib/libldap_r.a[thr_posix.o]: .pthread_exit U - /home/nm/sw/nopath/openldap-64/lib/libldap_r.a[init.o]: .atexit U -
Noah Misch <noah@leadboat.com> writes: > On Fri, Jul 02, 2021 at 11:20:17AM -0400, Tom Lane wrote: >> Ugh. What in the world is producing those references? > Those come from a statically-linked libldap_r: Blech! I wonder if there is some way to avoid counting that. It's not really hard to imagine that such a library might contain an exit() call, for example, thus negating our test altogether. I'm now wondering about applying the test to *.o in libpq, as well as libpgport_shlib.a and libpgcommon_shlib.a. The latter would require some code changes, and it would make the prohibition extend further than libpq alone. On the bright side, we could reinstate the check for abort(). regards, tom lane
I wrote: > I'm now wondering about applying the test to *.o in libpq, > as well as libpgport_shlib.a and libpgcommon_shlib.a. > The latter would require some code changes, and it would make > the prohibition extend further than libpq alone. On the bright > side, we could reinstate the check for abort(). After consuming a bit more caffeine, I'm afraid that won't work. I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a. But if someone mistakenly introduced a psprintf call into libpq, it'd still compile just fine; the symbol would be resolved against psprintf in the calling application's code. We'd only detect a failure when trying to use libpq with an app that didn't contain that function, which feels like something that our own testing could miss. What I'm now thinking about is restricting the test to only be run on platforms where use of foo.a libraries is deprecated, so that we can be pretty sure that we won't hit this situation. Even if we only run the test on Linux, that'd be plenty to catch any mistakes. regards, tom lane
I wrote: > Hmmm ... mine is 8.4.1. > I'm about to go out to dinner, but will check into this with some > newer gcc versions later. Tried --enable-coverage on Fedora 34 (with gcc 11.1.1) and sure enough there's an exit() call being inserted. I've pushed a fix to just disable the check altogether in --enable-coverage builds. regards, tom lane
On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote: > I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a. > But if someone mistakenly introduced a psprintf call into libpq, > it'd still compile just fine; the symbol would be resolved against > psprintf in the calling application's code. I think that would fail to compile on Windows, where such references need exported symbols. We don't make an exports file for applications other than postgres.exe. So the strategy that inspired this may work. > What I'm now thinking about is restricting the test to only be run on > platforms where use of foo.a libraries is deprecated, so that we can > be pretty sure that we won't hit this situation. Even if we only > run the test on Linux, that'd be plenty to catch any mistakes. Hmm. Static libraries are the rarer case on both AIX and Linux, but I'm not aware of a relevant deprecation on either platform. If it comes this to, I'd be more inclined to control the Makefile rule with an environment variable (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.
Noah Misch <noah@leadboat.com> writes: > On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote: >> What I'm now thinking about is restricting the test to only be run on >> platforms where use of foo.a libraries is deprecated, so that we can >> be pretty sure that we won't hit this situation. Even if we only >> run the test on Linux, that'd be plenty to catch any mistakes. > Hmm. Static libraries are the rarer case on both AIX and Linux, but I'm not > aware of a relevant deprecation on either platform. If it comes this to, I'd > be more inclined to control the Makefile rule with an environment variable > (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform. That'd require buildfarm owner intervention, as well as intervention by users. Which seems like exporting our problems onto them. I'd really rather not go that way if we can avoid it. regards, tom lane
On Sat, Jul 03, 2021 at 06:44:20PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote: > >> What I'm now thinking about is restricting the test to only be run on > >> platforms where use of foo.a libraries is deprecated, so that we can > >> be pretty sure that we won't hit this situation. Even if we only > >> run the test on Linux, that'd be plenty to catch any mistakes. > > > Hmm. Static libraries are the rarer case on both AIX and Linux, but I'm not > > aware of a relevant deprecation on either platform. If it comes this to, I'd > > be more inclined to control the Makefile rule with an environment variable > > (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform. > > That'd require buildfarm owner intervention, as well as intervention > by users. Which seems like exporting our problems onto them. I'd > really rather not go that way if we can avoid it. I like that goal, though we'll have to see how difficult it proves. As of today, a GNU/Linux user building against static OpenLDAP will get a failure, right? That would export work onto that user, spuriously. Since the non-AIX user count dwarfs the AIX user count, expect a user complaint from non-AIX first. We'd get something like 95% of the value by running the test on one Windows buildfarm member and one non-Windows buildfarm member. If you did gate the check on an environment variable, there would be no need to angle for broad adoption. Still, I agree avoiding that configuration step is nice, all else being equal. A strategy not having either of those drawbacks would be to skip the test if libpq.so contains a definition of libpq_unbind(). If any other dependency contains exit calls, we'd likewise probe for one symbol of that library and skip the test if presence of that symbol reveals static linking. (That's maintenance-prone in its own way, but a maintenance-free strategy has not appeared.)
Noah Misch <noah@leadboat.com> writes: > On Sat, Jul 03, 2021 at 06:44:20PM -0400, Tom Lane wrote: >> That'd require buildfarm owner intervention, as well as intervention >> by users. Which seems like exporting our problems onto them. I'd >> really rather not go that way if we can avoid it. > I like that goal, though we'll have to see how difficult it proves. As of > today, a GNU/Linux user building against static OpenLDAP will get a failure, > right? That would export work onto that user, spuriously. As a former packager for Red Hat, my response would be "you're doing it wrong". Nobody on any Linux distro should *ever* statically link code from one package into code from another, because they are going to create untold pain for themselves when (not if) the first package is updated. So I flat out reject that as a valid use-case. It may be that that ethos is not so strongly baked-in on other platforms. But I'm content to wait and see if there are complaints before rescinding the automatic test; and if there are, I'd prefer to deal with it by just backing off to running the test on Linux only. > We'd get something like 95% of the value by running the test on one Windows > buildfarm member and one non-Windows buildfarm member. True. But that just brings up the point that we aren't running the test at all on MSVC builds right now. I have no idea how to do that, do you? > ... A strategy not having either of those drawbacks would be to skip > the test if libpq.so contains a definition of libpq_unbind(). I assume you meant some OpenLDAP symbol? > If any other > dependency contains exit calls, we'd likewise probe for one symbol of that > library and skip the test if presence of that symbol reveals static linking. > (That's maintenance-prone in its own way, but a maintenance-free strategy has > not appeared.) I'm more worried about the risk of failing to detect problems at all, in case somebody fat-fingers things in a way that causes the test to be skipped everywhere. I'll keep that way in mind if we conclude that the existing way is unworkable, but so far I don't think it is. regards, tom lane
On Fri, Jul 09, 2021 at 10:06:18AM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Sat, Jul 03, 2021 at 06:44:20PM -0400, Tom Lane wrote: > >> That'd require buildfarm owner intervention, as well as intervention > >> by users. Which seems like exporting our problems onto them. I'd > >> really rather not go that way if we can avoid it. > > > I like that goal, though we'll have to see how difficult it proves. As of > > today, a GNU/Linux user building against static OpenLDAP will get a failure, > > right? That would export work onto that user, spuriously. > > As a former packager for Red Hat, my response would be "you're doing it > wrong". Nobody on any Linux distro should *ever* statically link code > from one package into code from another, because they are going to create > untold pain for themselves when (not if) the first package is updated. > So I flat out reject that as a valid use-case. > > It may be that that ethos is not so strongly baked-in on other platforms. Packagers do face more rules than users generally. > But I'm content to wait and see if there are complaints before rescinding > the automatic test; and if there are, I'd prefer to deal with it by just > backing off to running the test on Linux only. Okay. > > We'd get something like 95% of the value by running the test on one Windows > > buildfarm member and one non-Windows buildfarm member. > > True. But that just brings up the point that we aren't running the test > at all on MSVC builds right now. I have no idea how to do that, do you? I don't. But coverage via non-MSVC Windows is good enough. > > ... A strategy not having either of those drawbacks would be to skip > > the test if libpq.so contains a definition of libpq_unbind(). > > I assume you meant some OpenLDAP symbol? Yeah, that was supposed to say ldap_unbind().