Thread: Preventing abort() and exit() calls in libpq

Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
[ starting a new thread so as not to confuse the cfbot ]

I wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> Good point.  That's worse than just pfree() which is just a plain call
>> to free() in the frontend.  We could have more policies here, but my
>> take is that we'd better move fe_memutils.o to OBJS_FRONTEND in
>> src/common/Makefile so as shared libraries don't use those routines in
>> the long term.

> Ugh.  Not only is that bad, but your proposed fix doesn't fix it.
> At least in psql, and probably in most/all of our other clients,
> removing fe_memutils.o from libpq's link just causes it to start
> relying on the copy in the psql executable :-(.  So I agree that
> some sort of mechanical enforcement would be a really good thing,
> but I'm not sure what it would look like.

After some thought I propose that what we really want is to prevent
any calls of abort() or exit() from inside libpq.  Attached is a
draft patch to do that.  This can't be committed as-is, because
we still have some abort() calls in there in HEAD, but if we could
get that cleaned up it'd work.  Alternatively we could just disallow
exit(), which'd be enough to catch the problematic src/common files.

This relies on "nm" being able to work on shlibs, which it's not
required to by POSIX.  However, it seems to behave as desired even
on my oldest dinosaurs.  In any case, if "nm" doesn't work then
we'll just not detect such problems on that platform, which should
be OK as long as the test does work on common platforms.
Other than that point I think it's relying only on POSIX-spec
features.

I'll stick this into the CF list to see if the cfbot agrees that
it finds the abort() problems...

            regards, tom lane

diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index 0c4e55b6ad..3d992fdc78 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -96,12 +96,18 @@ SHLIB_EXPORTS = exports.txt
 
 PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto
 
-all: all-lib
+all: all-lib check-libpq-refs
 
 # Shared library stuff
 include $(top_srcdir)/src/Makefile.shlib
 backend_src = $(top_srcdir)/src/backend
 
+# Check for functions that libpq must not call.
+# (If nm doesn't exist or doesn't work on shlibs, this test will silently
+# do nothing, which is fine.)
+.PHONY: check-libpq-refs
+check-libpq-refs: $(shlib)
+    @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit
 
 # Make dependencies on pg_config_paths.h visible in all builds.
 fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h

Re: Preventing abort() and exit() calls in libpq

From
Michael Paquier
Date:
On Sat, Jun 26, 2021 at 05:29:29PM -0400, Tom Lane wrote:
> I'll stick this into the CF list to see if the cfbot agrees that
> it finds the abort() problems...

The CF Bot is finding those problems.

> +# Check for functions that libpq must not call.
> +# (If nm doesn't exist or doesn't work on shlibs, this test will silently
> +# do nothing, which is fine.)
> +.PHONY: check-libpq-refs
> +check-libpq-refs: $(shlib)
> +    @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit

"abort" and "exit" could be generic terms present in some other
libraries.  Could be be better to match with "U abort" and "U exit"
instead?  MinGW has a nm command, and it has a compatible option set,
so I think that it should work.
--
Michael

Attachment

Re: Preventing abort() and exit() calls in libpq

From
Fabien COELHO
Date:
>> +# Check for functions that libpq must not call.
>> +# (If nm doesn't exist or doesn't work on shlibs, this test will silently
>> +# do nothing, which is fine.)
>> +.PHONY: check-libpq-refs
>> +check-libpq-refs: $(shlib)
>> +    @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit
>
> "abort" and "exit" could be generic terms present in some other
> libraries.  Could be be better to match with "U abort" and "U exit"
> instead?  MinGW has a nm command, and it has a compatible option set,
> so I think that it should work.

A possible trick is to add ccp flags such as: -Dexit=exit_BAD 
-Dabort=abort_BAD.

-- 
Fabien.



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Sat, Jun 26, 2021 at 05:29:29PM -0400, Tom Lane wrote:
>> I'll stick this into the CF list to see if the cfbot agrees that
>> it finds the abort() problems...

> The CF Bot is finding those problems.

>> +# Check for functions that libpq must not call.
>> +# (If nm doesn't exist or doesn't work on shlibs, this test will silently
>> +# do nothing, which is fine.)
>> +.PHONY: check-libpq-refs
>> +check-libpq-refs: $(shlib)
>> +    @! nm -A -g -u $< 2>/dev/null | grep -e abort -e exit

Yeah, all except on Windows.  Not sure if it's worth trying to build
some way to make this check on Windows.

> "abort" and "exit" could be generic terms present in some other
> libraries.  Could be be better to match with "U abort" and "U exit"
> instead?

No, for a couple of reasons:

* nm's output format isn't all that well standardized

* on some platforms, what appears here is "_abort".

I would have liked to use "-w" in the grep call, but between the
"_abort" case and the "abort@@GLIBC" case we see elsewhere, we'd
be assuming way too much about what grep will consider to be a word.

In practice I don't think it's too much of a problem.  It doesn't
matter whether libc has exported names containing "exit", unless
libpq or something it imports from src/common or src/port actually
attempts to call those names.  Which I'm not expecting.

A possible counterexample is atexit(3).  If libpq ever grew a
reason to call that then we'd have an issue.  It wouldn't be
that hard to work around, by adding a grep -v filter.  But in
any case I'm dubious that we could ever make correct use of
atexit(3) in libpq, because we'd have no way to know whether
the host application has its own atexit callbacks and if so
whether they'll run before or after libpq's.  Something like
isolationtester's atexit callback to PQclose all its connections
would risk breakage if libpq tried to clean up via atexit.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> A possible trick is to add ccp flags such as: -Dexit=exit_BAD 
> -Dabort=abort_BAD.

Not really going to work, at least not without a lot of fragile
kluges, because the main problem here is to prevent libpq from
*indirectly* calling those functions via stuff it imports from
src/port or src/common.

It's possible that we could make it work by generalizing the
policy that "libpq may not call abort/exit" into "no PG shlib
may call abort/exit", and then apply the cpp #defines while
compiling the xxx_shlib.o variants of those files.  This does
not seem more attractive than what I proposed, though.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
> This relies on "nm" being able to work on shlibs, which it's not
> required to by POSIX.  However, it seems to behave as desired even
> on my oldest dinosaurs.  In any case, if "nm" doesn't work then
> we'll just not detect such problems on that platform, which should
> be OK as long as the test does work on common platforms.
> Other than that point I think it's relying only on POSIX-spec
> features.

Further dinosaur-wrangling reveals a small problem on prairiedog
(ancient macOS):

$ nm -A -g -u libpq.5.14.dylib | grep abort
libpq.5.14.dylib:fe-connect.o: _abort
libpq.5.14.dylib:_eprintf.o: _abort

The fe-connect.o reference is from PGTHREAD_ERROR of course,
but what's that other thing?  Investigation finds this:

https://opensource.apple.com/source/clang/clang-800.0.38/src/projects/compiler-rt/lib/builtins/eprintf.c.auto.html

IOW it seems that this file is pulled in to implement <assert.h>,
and the abort call underlies uses of Assert.  So that seems fine
from a coding-rule perspective: it's okay for development builds
to contain core-dumping assertions.  It complicates matters for
the proposed patch though.

As far as old macOS goes, it seems like we can work around this
pretty easily, since this version of nm helpfully breaks down
the references by .o file: just add a "grep -v" pass to reject
"_eprintf.o:".  However, if there are any other platforms that
similarly convert assert() calls into some direct reference
to abort(), it may be harder to work around it elsewhere.
I guess the only way to know is to see what the buildfarm
says.

Worst case, we might only be able to enforce the prohibition
against exit().  That'd be annoying but it's still much better
than nothing.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
>> This relies on "nm" being able to work on shlibs, which it's not
>> required to by POSIX.  However, it seems to behave as desired even
>> on my oldest dinosaurs.  In any case, if "nm" doesn't work then
>> we'll just not detect such problems on that platform, which should
>> be OK as long as the test does work on common platforms.

So I pushed that, and not very surprisingly, it's run into some
portability problems.  gombessa (recent OpenBSD) reports

! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit
libpq.so.5.15:__cxa_atexit

So far as I can find, __cxa_atexit is a C++ support routine, so
I wondered what the heck libpq.so is doing calling it.  I managed
to reproduce the failure here using an OpenBSD installation I had
at hand, and confirmed that __cxa_atexit is *not* referenced by any
of the .o files in src/port, src/common, or src/interfaces/libpq.
So apparently it's being injected at some fairly low level of the
shlib support on that platform.

Probably the thing to do is adjust the grep filter to exclude
__cxa_atexit, but I want to wait awhile and see whether any
other buildfarm animals report this.  morepork at least will
be pretty interesting, since it's a slightly older OpenBSD
vintage.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
> So I pushed that, and not very surprisingly, it's run into some
> portability problems.  gombessa (recent OpenBSD) reports

> ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit
> libpq.so.5.15:__cxa_atexit

After a few more hours, all of our OpenBSD animals have reported
that, on several different OpenBSD releases and with both gcc
and clang compilers.  So at least it's a longstanding platform
behavior.

More troublingly, fossa reports this:

! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit
libpq.so.5.15:                 U abort@@GLIBC_2.2.5

Where is that coming from?  hippopotamus and jay, which seem to
be different compilers on the same physical machine, aren't showing
it.  That'd lead to the conclusion that icc is injecting abort()
calls of its own accord, which seems quite nasty.  Lacking an icc
license, I can't poke into that more directly here.

Perhaps we could wrap the test for abort() in something like
'if "$CC" != icc then ...', but ugh.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-29, Tom Lane wrote:

> More troublingly, fossa reports this:
> 
> ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v '_eprintf\\.o:' | grep -e abort -e exit
> libpq.so.5.15:                 U abort@@GLIBC_2.2.5
> 
> Where is that coming from?  hippopotamus and jay, which seem to
> be different compilers on the same physical machine, aren't showing
> it.  That'd lead to the conclusion that icc is injecting abort()
> calls of its own accord, which seems quite nasty.  Lacking an icc
> license, I can't poke into that more directly here.

I noticed that the coverage report is not updating, and lo and behold
it's failing this bit.

I can inspect the built files ... what exactly are you looking for?

-- 
Álvaro Herrera       Valdivia, Chile
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort.
But I did get one in libpgport_shlib.a:

path_shlib.o:
                 U abort
         0000000000000320 T canonicalize_path
         0000000000000197 T cleanup_path
         00000000000009e3 t dir_strcmp
         ...

-- 
Álvaro Herrera       Valdivia, Chile
"People get annoyed when you try to debug them."  (Larry Wall)



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort.
> But I did get one in libpgport_shlib.a:

> path_shlib.o:
>                  U abort

Yeah, there is one in get_progname().  But path.o shouldn't be getting
pulled into libpq ... else why aren't all the animals failing?

What platform does the coverage report run on exactly?

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-29, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort.
> > But I did get one in libpgport_shlib.a:
> 
> > path_shlib.o:
> >                  U abort
> 
> Yeah, there is one in get_progname().  But path.o shouldn't be getting
> pulled into libpq ... else why aren't all the animals failing?

Maybe there's something about the linker flags being used.

... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way.

> What platform does the coverage report run on exactly?

It's Debian Buster.

libpq.so is linked as

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard-Wno-format-truncation -Wno-stringop-truncation -fprofile-arcs -ftest-coverage -O0 -pthread
-D_REENTRANT-D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS -fPIC -shared -Wl,-soname,libpq.so.5
-Wl,--version-script=exports.list-o libpq.so.5.15  fe-auth-scram.o fe-connect.o fe-exec.o fe-lobj.o fe-misc.o
fe-print.ofe-protocol3.o fe-secure.o fe-trace.o legacy-pqsignal.o libpq-events.o pqexpbuffer.o fe-auth.o
fe-secure-common.ofe-secure-openssl.o -L../../../src/port -L../../../src/common -lpgcommon_shlib -lpgport_shlib
-L/usr/lib/llvm-6.0/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags  -lssl -lcrypto -lm
-lldap_r 
 

and libpgport was just

ar crs libpgport_shlib.a fls_shlib.o getpeereid_shlib.o strlcat_shlib.o strlcpy_shlib.o pg_crc32c_sse42_shlib.o
pg_crc32c_sb8_shlib.opg_crc32c_sse42_choose_shlib.o bsearch_arg_shlib.o chklocale_shlib.o erand48_shlib.o
inet_net_ntop_shlib.onoblock_shlib.o path_shlib.o pg_bitutils_shlib.o pg_strong_random_shlib.o pgcheckdir_shlib.o
pgmkdirp_shlib.opgsleep_shlib.o pgstrcasecmp_shlib.o pgstrsignal_shlib.o pqsignal_shlib.o qsort_shlib.o
qsort_arg_shlib.oquotes_shlib.o snprintf_shlib.o strerror_shlib.o tar_shlib.o thread_shlib.o
 

-- 
Álvaro Herrera                        Valdivia, Chile
                        https://www.EnterpriseDB.com/



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-30, Alvaro Herrera wrote:

> On 2021-Jun-29, Tom Lane wrote:
> 
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > > Ah, I nm'd all files in src/interfaces/libpq and got no hits for abort.
> > > But I did get one in libpgport_shlib.a:
> > 
> > > path_shlib.o:
> > >                  U abort
> > 
> > Yeah, there is one in get_progname().  But path.o shouldn't be getting
> > pulled into libpq ... else why aren't all the animals failing?
> 
> Maybe there's something about the linker flags being used.
> 
> ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way.

If I remove -fprofile-arcs from CFLAGS, then abort is no longer present,
but we still get a fail because of __gcov_exit.  I suppose if you'd add
an exception for __cxa_atexit, the same place could use one for
__gcov_exit.

I'm not sure what to make of the -fprofile-arcs stuff though.

-- 
Álvaro Herrera                        Valdivia, Chile
                        https://www.EnterpriseDB.com/
"Java is clearly an example of money oriented programming"  (A. Stepanov)



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-30, Alvaro Herrera wrote:

> If I remove -fprofile-arcs from CFLAGS, then abort is no longer present,
> but we still get a fail because of __gcov_exit.  I suppose if you'd add
> an exception for __cxa_atexit, the same place could use one for
> __gcov_exit.

I tried the attached patch, and while libpq.so now builds successfully,
it causes anything that tries to link to libpq fail like

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard-Wno-format-truncation -Wno-stringop-truncation -g -fprofile-arcs -ftest-coverage
findtimezone.oinitdb.o localtime.o -L../../../src/port -L../../../src/common -L../../../src/fe_utils -lpgfeutils
-L../../../src/common-lpgcommon -L../../../src/port -lpgport -L../../../src/interfaces/libpq -lpq   -Wl,--as-needed
-Wl,-rpath,'/pgsql/install/master-coverage/lib',--enable-new-dtags -lpgcommon -lpgport -lpthread -lxml2 -lssl -lcrypto
-lz-lreadline -lpthread -lrt -ldl -lm  -o initdb
 
/usr/bin/ld: initdb: hidden symbol `__gcov_merge_add' in /usr/lib/gcc/x86_64-linux-gnu/8/libgcov.a(_gcov_merge_add.o)
isreferenced by DSO
 
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:43: initdb] Error 1

so this doesn't look too promising.

-- 
Álvaro Herrera                        Valdivia, Chile
                        https://www.EnterpriseDB.com/

Attachment

Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Maybe there's something about the linker flags being used.
> ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way.

Ah-hah, yeah, I see it too if I enable profiling.  I can confirm
that it's not from the abort() call in path.c, because it's still
there if I remove that.  So this is another case where build
infrastructure is injecting abort() calls we didn't ask for.

Between this and the icc case, I'm now inclined to give up on
trying to forbid abort() calls in libpq.  I think the value-add
for that is a lot lower than it is for exit() anyway.  abort()
is something one doesn't toss around lightly.

You mentioned __gcov_exit, but I'm not sure if we need an
exception for that.  I see it referenced by the individual .o
files, but the completed .so has no such reference, so at least
on RHEL8 it's apparently satisfied during .so linkage.  Do you
see something different?

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-30, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Maybe there's something about the linker flags being used.
> > ... ah yeah, if I configure with coverage enabled on my machine, it fails in the same way.
> 
> Ah-hah, yeah, I see it too if I enable profiling.  I can confirm
> that it's not from the abort() call in path.c, because it's still
> there if I remove that.  So this is another case where build
> infrastructure is injecting abort() calls we didn't ask for.

Hah, I didn't think to try that.

> Between this and the icc case, I'm now inclined to give up on
> trying to forbid abort() calls in libpq.  I think the value-add
> for that is a lot lower than it is for exit() anyway.  abort()
> is something one doesn't toss around lightly.

No objections to that.

> You mentioned __gcov_exit, but I'm not sure if we need an
> exception for that.  I see it referenced by the individual .o
> files, but the completed .so has no such reference, so at least
> on RHEL8 it's apparently satisfied during .so linkage.  Do you
> see something different?

Well, not really.  I saw it but only after I removed -fprofile-arcs from
Makefile.shlib's link line; but per my other email, that doesn't really
work.

Everything seems to work well for me after removing abort from that grep.

-- 
Álvaro Herrera                        Valdivia, Chile
                        https://www.EnterpriseDB.com/



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Jun-30, Tom Lane wrote:
>> You mentioned __gcov_exit, but I'm not sure if we need an
>> exception for that.  I see it referenced by the individual .o
>> files, but the completed .so has no such reference, so at least
>> on RHEL8 it's apparently satisfied during .so linkage.  Do you
>> see something different?

> Well, not really.  I saw it but only after I removed -fprofile-arcs from
> Makefile.shlib's link line; but per my other email, that doesn't really
> work.
> Everything seems to work well for me after removing abort from that grep.

OK, thanks, will push a fix momentarily.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-30, Tom Lane wrote:

> OK, thanks, will push a fix momentarily.

(BTW since the _eprintf.o stuff comes from _abort, I suppose you're
going to remove that grep -v too?)

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
                               https://www.EnterpriseDB.com/



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> (BTW since the _eprintf.o stuff comes from _abort, I suppose you're
> going to remove that grep -v too?)

Right, I did that.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
> OK, thanks, will push a fix momentarily.

Did so, and look what popped up on wrasse [1]:

! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit
libpq.so.5.15: [765]    |              232544|                 248|FUNC |GLOB |3    |14     |PQexitPipelineMode

This makes no sense, because (a) wrasse was happy with the previous
version, and (b) surely the "-u" switch should prevent nm from
printing PQexitPipelineMode.  Noah, did you change anything about
wrasse's configuration today?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-06-30%2014%3A58%3A15



Re: Preventing abort() and exit() calls in libpq

From
Peter Eisentraut
Date:
On 26.06.21 23:29, Tom Lane wrote:
> After some thought I propose that what we really want is to prevent
> any calls of abort() or exit() from inside libpq.  Attached is a
> draft patch to do that.

Could we set this rule up a little bit differently so that it is only 
run when the library is built.

Right now, make world on a built tree makes 17 calls to this "nm" line, 
and make check-world calls it 81 times.  I think once would be enough. ;-)



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> Could we set this rule up a little bit differently so that it is only 
> run when the library is built.
> Right now, make world on a built tree makes 17 calls to this "nm" line, 
> and make check-world calls it 81 times.  I think once would be enough. ;-)

Hmm, didn't realize that would happen.  Will see what can be done.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> Could we set this rule up a little bit differently so that it is only 
>> run when the library is built.
>> Right now, make world on a built tree makes 17 calls to this "nm" line, 
>> and make check-world calls it 81 times.  I think once would be enough. ;-)

> Hmm, didn't realize that would happen.  Will see what can be done.

Looks like we'd have to make use of a dummy stamp-file, more or less
as attached.  Any objections?

            regards, tom lane

diff --git a/src/interfaces/libpq/.gitignore b/src/interfaces/libpq/.gitignore
index a4afe7c1c6..7478dc344a 100644
--- a/src/interfaces/libpq/.gitignore
+++ b/src/interfaces/libpq/.gitignore
@@ -1 +1,2 @@
 /exports.list
+/libpq-refs-stamp
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index c2a35a488a..bb6ceec2c9 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -96,7 +96,7 @@ SHLIB_EXPORTS = exports.txt
 
 PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto
 
-all: all-lib check-libpq-refs
+all: all-lib libpq-refs-stamp
 
 # Shared library stuff
 include $(top_srcdir)/src/Makefile.shlib
@@ -108,9 +108,9 @@ backend_src = $(top_srcdir)/src/backend
 # If nm doesn't exist or doesn't work on shlibs, this test will do nothing,
 # which is fine.  The exclusion of __cxa_atexit is necessary on OpenBSD,
 # which seems to insert references to that even in pure C code.
-.PHONY: check-libpq-refs
-check-libpq-refs: $(shlib)
+libpq-refs-stamp: $(shlib)
     ! nm -A -g -u $< 2>/dev/null | grep -v __cxa_atexit | grep exit
+    touch $@
 
 # Make dependencies on pg_config_paths.h visible in all builds.
 fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h
@@ -141,7 +141,7 @@ uninstall: uninstall-lib
 
 clean distclean: clean-lib
     $(MAKE) -C test $@
-    rm -f $(OBJS) pthread.h
+    rm -f $(OBJS) pthread.h libpq-refs-stamp
 # Might be left over from a Win32 client-only build
     rm -f pg_config_paths.h


Re: Preventing abort() and exit() calls in libpq

From
Jacob Champion
Date:
On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote:
> I wrote:
> > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > > Could we set this rule up a little bit differently so that it is only 
> > > run when the library is built.
> > > Right now, make world on a built tree makes 17 calls to this "nm" line, 
> > > and make check-world calls it 81 times.  I think once would be enough. ;-)
> > Hmm, didn't realize that would happen.  Will see what can be done.
> 
> Looks like we'd have to make use of a dummy stamp-file, more or less
> as attached.  Any objections?

Spitballing -- if you don't like the stamp file, you could add the
check to the end of the $(shlib) rule, surrounded by an ifeq check.
Then .DELETE_ON_ERROR should take care of the rest, I think.

--Jacob

Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Jacob Champion <pchampion@vmware.com> writes:
> On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote:
>> Looks like we'd have to make use of a dummy stamp-file, more or less
>> as attached.  Any objections?

> Spitballing -- if you don't like the stamp file, you could add the
> check to the end of the $(shlib) rule, surrounded by an ifeq check.
> Then .DELETE_ON_ERROR should take care of the rest, I think.

Hmm ... I'd been thinking we don't use .DELETE_ON_ERROR, but on
second look we do, so that could be a plausible approach.

On balance though, the separate rule seems better, because
.DELETE_ON_ERROR would destroy the evidence about why "nm"
failed, which could be annoying when investigating problems.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Jacob Champion
Date:
On Wed, 2021-06-30 at 18:56 -0400, Tom Lane wrote:
> Jacob Champion <pchampion@vmware.com> writes:
> > On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote:
> > > Looks like we'd have to make use of a dummy stamp-file, more or less
> > > as attached.  Any objections?
> > Spitballing -- if you don't like the stamp file, you could add the
> > check to the end of the $(shlib) rule, surrounded by an ifeq check.
> > Then .DELETE_ON_ERROR should take care of the rest, I think.
> 
> Hmm ... I'd been thinking we don't use .DELETE_ON_ERROR, but on
> second look we do, so that could be a plausible approach.
> 
> On balance though, the separate rule seems better, because
> .DELETE_ON_ERROR would destroy the evidence about why "nm"
> failed, which could be annoying when investigating problems.

Good point. +1 to the stamp approach, then.

--Jacob

Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Wed, Jun 30, 2021 at 12:06:47PM -0400, Tom Lane wrote:
> I wrote:
> > OK, thanks, will push a fix momentarily.
> 
> Did so, and look what popped up on wrasse [1]:
> 
> ! nm -A -g -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit
> libpq.so.5.15: [765]    |              232544|                 248|FUNC |GLOB |3    |14     |PQexitPipelineMode
> 
> This makes no sense, because (a) wrasse was happy with the previous
> version, and (b) surely the "-u" switch should prevent nm from
> printing PQexitPipelineMode.  Noah, did you change anything about
> wrasse's configuration today?

No, and wrasse still succeeds at "git checkout e45b0df^".  Solaris
/usr/bin/grep doesn't support "-e":

[nm@gcc-solaris11 5:0 2021-06-30T22:23:29 postgresql 0]$ echo exit | grep -e exit
grep: illegal option -- e
Usage: grep [-c|-l|-q] -bhinsvw pattern file . . .
[nm@gcc-solaris11 5:0 2021-06-30T22:23:43 postgresql 2]$ echo exit | grep exit
exit
[nm@gcc-solaris11 5:0 2021-06-30T22:24:16 postgresql 0]$ echo exit | /usr/xpg4/bin/grep -e exit
exit

That concealed things in the previous version.  You can see those "illegal
option" messages in the last passing run:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=wrasse&dt=2021-06-30%2008%3A57%3A27&stg=make



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
On 2021-Jun-30, Noah Misch wrote:

> No, and wrasse still succeeds at "git checkout e45b0df^".  Solaris
> /usr/bin/grep doesn't support "-e":

I think this means the rule should use $(GREP), which is /usr/bin/ggrep
in wrasse,

checking for grep that handles long lines and -e... /usr/bin/ggrep

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
                               https://www.EnterpriseDB.com/



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Jun-30, Noah Misch wrote:
>> No, and wrasse still succeeds at "git checkout e45b0df^".  Solaris
>> /usr/bin/grep doesn't support "-e":

> I think this means the rule should use $(GREP), which is /usr/bin/ggrep
> in wrasse,

Ah, my mistake.  Although we're still left with the question of why
Solaris' "nm" doesn't support the POSIX-required options.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Wed, Jun 30, 2021 at 11:45:10PM -0400, Tom Lane wrote:
> we're still left with the question of why
> Solaris' "nm" doesn't support the POSIX-required options.

In POSIX, -g and -u are mutually exclusive.  Solaris ignores all but the first
of these in a command:

[nm@gcc-solaris11 5:0 2021-07-01T06:48:54 postgresql 1]$ /usr/bin/nm -u -g src/interfaces/libpq/libpq.so|grep exec
nm: -u or -e set, -g ignored
[nm@gcc-solaris11 5:0 2021-07-01T06:49:41 postgresql 1]$ /usr/bin/nm -g -u src/interfaces/libpq/libpq.so|grep exec
nm: -e or -g set, -u ignored
[405]   |              208320|                  84|FUNC |GLOB |3    |14     |PQexec
[818]   |              208416|                 128|FUNC |GLOB |3    |14     |PQexecParams
[729]   |              208672|                 112|FUNC |GLOB |3    |14     |PQexecPrepared
[nm@gcc-solaris11 5:0 2021-07-01T06:49:45 postgresql 0]$ /usr/bin/nm -u src/interfaces/libpq/libpq.so|grep exec
[nm@gcc-solaris11 5:0 2021-07-01T06:49:48 postgresql 1]$ 



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Wed, Jun 30, 2021 at 11:45:10PM -0400, Tom Lane wrote:
>> we're still left with the question of why
>> Solaris' "nm" doesn't support the POSIX-required options.

> In POSIX, -g and -u are mutually exclusive.  Solaris ignores all but the first
> of these in a command:

I've just re-read the POSIX spec for "nm", and I do not see anything there
that would support that interpretation.  Still, we can try it without -g
and see what else breaks.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Thu, Jul 01, 2021 at 01:20:48AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Jun 30, 2021 at 11:45:10PM -0400, Tom Lane wrote:
> >> we're still left with the question of why
> >> Solaris' "nm" doesn't support the POSIX-required options.
> 
> > In POSIX, -g and -u are mutually exclusive.  Solaris ignores all but the first
> > of these in a command:
> 
> I've just re-read the POSIX spec for "nm", and I do not see anything there
> that would support that interpretation.  Still, we can try it without -g
> and see what else breaks.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/nm.html says:

  nm [-APv] [-g|-u] [-t format] file...

If the options weren't mutually-exclusive, it would say:

  nm [-APvgu] [-t format] file...



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Thu, Jul 01, 2021 at 01:20:48AM -0400, Tom Lane wrote:
>> I've just re-read the POSIX spec for "nm", and I do not see anything there
>> that would support that interpretation.

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/nm.html says:
>   nm [-APv] [-g|-u] [-t format] file...

Oh, right, I failed to look carefully at the syntax diagram.
Local testing also supports the conclusion that -g isn't needed
here, so pushed that way.  Thanks for investigating that!

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I think this means the rule should use $(GREP), which is /usr/bin/ggrep
> in wrasse,

I didn't install this change, because it isn't actually needed at the
moment, and we aren't using $(GREP) anywhere else.  Might be a bridge
to cross in future.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Peter Eisentraut
Date:
On 01.07.21 00:41, Jacob Champion wrote:
> On Wed, 2021-06-30 at 18:29 -0400, Tom Lane wrote:
>> I wrote:
>>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>>>> Could we set this rule up a little bit differently so that it is only
>>>> run when the library is built.
>>>> Right now, make world on a built tree makes 17 calls to this "nm" line,
>>>> and make check-world calls it 81 times.  I think once would be enough. ;-)
>>> Hmm, didn't realize that would happen.  Will see what can be done.
>>
>> Looks like we'd have to make use of a dummy stamp-file, more or less
>> as attached.  Any objections?
> 
> Spitballing -- if you don't like the stamp file, you could add the
> check to the end of the $(shlib) rule, surrounded by an ifeq check.
> Then .DELETE_ON_ERROR should take care of the rest, I think.

Somewhere in the $(shlib) rule would seem most appropriate.  But I don't 
understand the rest: What ifeq, and why .DELETE_ON_ERROR?



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 01.07.21 00:41, Jacob Champion wrote:
>> Spitballing -- if you don't like the stamp file, you could add the
>> check to the end of the $(shlib) rule, surrounded by an ifeq check.
>> Then .DELETE_ON_ERROR should take care of the rest, I think.

> Somewhere in the $(shlib) rule would seem most appropriate.  But I don't 
> understand the rest: What ifeq, and why .DELETE_ON_ERROR?

The variant of this I'd been thinking of was

 $(shlib): $(OBJS) | $(SHLIB_PREREQS)
     $(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)
+ifneq (,$(SHLIB_EXTRA_ACTION))
+    $(SHLIB_EXTRA_ACTION)
+endif

(and similarly in several other places); then libpq's Makefile
could set SHLIB_EXTRA_ACTION to the desired thing.

The problem then is, what happens when the extra action fails?
Without .DELETE_ON_ERROR, the shlib is still there and the next
make run will think everything's good.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Jacob Champion
Date:
On Thu, 2021-07-01 at 14:14 -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > Somewhere in the $(shlib) rule would seem most appropriate.  But I don't 
> > understand the rest: What ifeq, and why .DELETE_ON_ERROR?
> 
> The variant of this I'd been thinking of was
> 
>  $(shlib): $(OBJS) | $(SHLIB_PREREQS)
>      $(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)
> +ifneq (,$(SHLIB_EXTRA_ACTION))
> +    $(SHLIB_EXTRA_ACTION)
> +endif
> 
> (and similarly in several other places); then libpq's Makefile
> could set SHLIB_EXTRA_ACTION to the desired thing.
> 
> The problem then is, what happens when the extra action fails?
> Without .DELETE_ON_ERROR, the shlib is still there and the next
> make run will think everything's good.

Yep, that was pretty much what was in my head. ifeq (or ifneq in your
example) to gate the extra nm check, and .DELETE_ON_ERROR to make the
failure stick for future make invocations.

--Jacob

Re: Preventing abort() and exit() calls in libpq

From
Peter Eisentraut
Date:
On 01.07.21 20:14, Tom Lane wrote:
> The variant of this I'd been thinking of was
> 
>   $(shlib): $(OBJS) | $(SHLIB_PREREQS)
>       $(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)
> +ifneq (,$(SHLIB_EXTRA_ACTION))
> +    $(SHLIB_EXTRA_ACTION)
> +endif
> 
> (and similarly in several other places); then libpq's Makefile
> could set SHLIB_EXTRA_ACTION to the desired thing.

Right, that looks sensible.  (Maybe the ifneq isn't actually necessary, 
since if the variable is not set, nothing would happen.)

> The problem then is, what happens when the extra action fails?
> Without .DELETE_ON_ERROR, the shlib is still there and the next
> make run will think everything's good.

Right.  .DELETE_ON_ERROR is already set in Makefile.global, so it's not 
necessary to set it again.



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 01.07.21 20:14, Tom Lane wrote:
>> The problem then is, what happens when the extra action fails?
>> Without .DELETE_ON_ERROR, the shlib is still there and the next
>> make run will think everything's good.

> Right.  .DELETE_ON_ERROR is already set in Makefile.global, so it's not 
> necessary to set it again.

Right.  Since we use that, we don't actually have that problem.
What we'd have instead is that debugging an unexpected failure
of the "extra action" would be painful, because there would be
no way short of modifying the Makefiles to create its input data.
So I think the other solution with a separate rule is better.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Alvaro Herrera
Date:
Now it's hoverfly:

! nm -A -u libpq.so.5 2>/dev/null | grep -v __cxa_atexit | grep exit
libpq.so.5: atexit               U           -
libpq.so.5: pthread_exit         U           -

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-07-02%2010%3A10%3A29

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Now it's hoverfly:
> ! nm -A -u libpq.so.5 2>/dev/null | grep -v __cxa_atexit | grep exit
> libpq.so.5: atexit               U           -
> libpq.so.5: pthread_exit         U           -

Ugh.  What in the world is producing those references?

(As I mentioned upthread, I'm quite suspicious of libpq trying to
perform any actions in an atexit callback, because of the uncertainty
about whether some later atexit callback could try to use libpq
functions.  So this seems like it might be an actual bug.)

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Jacob Champion
Date:
On Wed, 2021-06-30 at 10:42 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > On 2021-Jun-30, Tom Lane wrote:
> > > You mentioned __gcov_exit, but I'm not sure if we need an
> > > exception for that.  I see it referenced by the individual .o
> > > files, but the completed .so has no such reference, so at least
> > > on RHEL8 it's apparently satisfied during .so linkage.  Do you
> > > see something different?
> > Well, not really.  I saw it but only after I removed -fprofile-arcs from
> > Makefile.shlib's link line; but per my other email, that doesn't really
> > work.
> > Everything seems to work well for me after removing abort from that grep.
> 
> OK, thanks, will push a fix momentarily.

With latest HEAD, building with --enable-coverage still fails on my
Ubuntu 20.04:

    ! nm -A -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit
    libpq.so.5.15:                 U exit@@GLIBC_2.2.5

I don't see any exit references in the libpq objects or in
libpgport_shlib, so it seems like libpgcommon_shlib is the culprit... I
assume turning off optimizations leads to less dead code elimination?

--Jacob

Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Jacob Champion <pchampion@vmware.com> writes:
> With latest HEAD, building with --enable-coverage still fails on my
> Ubuntu 20.04:

>     ! nm -A -u libpq.so.5.15 2>/dev/null | grep -v __cxa_atexit | grep exit
>     libpq.so.5.15:                 U exit@@GLIBC_2.2.5

Hm, weird.  I don't see that here on RHEL8, and 
https://coverage.postgresql.org seems to be working so it doesn't fail on
Alvaro's Debian setup either.  What configure options are you using?
Does "nm -u" report "exit" being referenced from any *.o in libpq,
or from any *_shlib.o in src/port/ or src/common/ ?

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Jacob Champion
Date:
On Fri, 2021-07-02 at 18:20 -0400, Tom Lane wrote:
> What configure options are you using?

Just `./configure --enable-coverage`, nothing else. I distclean'd right
before for good measure.

> Does "nm -u" report "exit" being referenced from any *.o in libpq,
> or from any *_shlib.o in src/port/ or src/common/ ?

Only src/common:

    controldata_utils_shlib.o:
                 U close
                 U __errno_location
                 U exit
    ...
    fe_memutils_shlib.o:
                 U exit
    ...
    file_utils_shlib.o:
                 U close
                 U closedir
                 U __errno_location
                 U exit
    ...
    hex_shlib.o:
                 U exit
    ...
    psprintf_shlib.o:
                 U __errno_location
                 U exit
    ...
    stringinfo_shlib.o:
                 U __errno_location
                 U exit
    ...
    username_shlib.o:
                 U __errno_location
                 U exit
    ...

--Jacob

Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Jacob Champion <pchampion@vmware.com> writes:
> On Fri, 2021-07-02 at 18:20 -0400, Tom Lane wrote:
>> What configure options are you using?

> Just `./configure --enable-coverage`, nothing else. I distclean'd right
> before for good measure.

Hmph.  There's *something* different about your setup from what
either Alvaro or I tried.  What's the compiler (and version)?
What's the platform exactly?

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Jacob Champion
Date:
On Fri, 2021-07-02 at 18:45 -0400, Tom Lane wrote:
> Jacob Champion <pchampion@vmware.com> writes:
> > On Fri, 2021-07-02 at 18:20 -0400, Tom Lane wrote:
> > > What configure options are you using?
> > Just `./configure --enable-coverage`, nothing else. I distclean'd right
> > before for good measure.
> 
> Hmph.  There's *something* different about your setup from what
> either Alvaro or I tried.  What's the compiler (and version)?
> What's the platform exactly?

    $ gcc --version
    gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    ...

    $ cat /etc/os-release
    NAME="Ubuntu"
    VERSION="20.04.2 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.2 LTS"
    VERSION_ID="20.04"
    ...

    $ uname -a
    Linux HOSTNAME 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

--Jacob

Re: Preventing abort() and exit() calls in libpq

From
"alvherre@alvh.no-ip.org"
Date:
On 2021-Jul-02, Jacob Champion wrote:

> Only src/common:
> 
>     controldata_utils_shlib.o:
>                  U close
>                  U __errno_location
>                  U exit

Actually, I do see these in the .o files as well, but they don't make it
to the .a file.

gcc here is 8.3.0.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
"alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org> writes:
> gcc here is 8.3.0.

Hmmm ... mine is 8.4.1.

I'm about to go out to dinner, but will check into this with some
newer gcc versions later.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Fri, Jul 02, 2021 at 11:20:17AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Now it's hoverfly:
> > ! nm -A -u libpq.so.5 2>/dev/null | grep -v __cxa_atexit | grep exit
> > libpq.so.5: atexit               U           -
> > libpq.so.5: pthread_exit         U           -
> 
> Ugh.  What in the world is producing those references?

Those come from a statically-linked libldap_r:

$ nm -A -u /home/nm/sw/nopath/openldap-64/lib/libldap_r.a|grep exit
/home/nm/sw/nopath/openldap-64/lib/libldap_r.a[tpool.o]: .ldap_pvt_thread_exit U           -
/home/nm/sw/nopath/openldap-64/lib/libldap_r.a[thr_posix.o]: .pthread_exit        U           -
/home/nm/sw/nopath/openldap-64/lib/libldap_r.a[init.o]: .atexit              U           -



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Fri, Jul 02, 2021 at 11:20:17AM -0400, Tom Lane wrote:
>> Ugh.  What in the world is producing those references?

> Those come from a statically-linked libldap_r:

Blech!  I wonder if there is some way to avoid counting that.
It's not really hard to imagine that such a library might
contain an exit() call, for example, thus negating our test
altogether.

I'm now wondering about applying the test to *.o in libpq,
as well as libpgport_shlib.a and libpgcommon_shlib.a.
The latter would require some code changes, and it would make
the prohibition extend further than libpq alone.  On the bright
side, we could reinstate the check for abort().

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
> I'm now wondering about applying the test to *.o in libpq,
> as well as libpgport_shlib.a and libpgcommon_shlib.a.
> The latter would require some code changes, and it would make
> the prohibition extend further than libpq alone.  On the bright
> side, we could reinstate the check for abort().

After consuming a bit more caffeine, I'm afraid that won't work.
I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a.
But if someone mistakenly introduced a psprintf call into libpq,
it'd still compile just fine; the symbol would be resolved against
psprintf in the calling application's code.  We'd only detect a
failure when trying to use libpq with an app that didn't contain
that function, which feels like something that our own testing
could miss.

What I'm now thinking about is restricting the test to only be run on
platforms where use of foo.a libraries is deprecated, so that we can
be pretty sure that we won't hit this situation.  Even if we only
run the test on Linux, that'd be plenty to catch any mistakes.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
I wrote:
> Hmmm ... mine is 8.4.1.
> I'm about to go out to dinner, but will check into this with some
> newer gcc versions later.

Tried --enable-coverage on Fedora 34 (with gcc 11.1.1) and sure
enough there's an exit() call being inserted.  I've pushed a fix
to just disable the check altogether in --enable-coverage builds.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote:
> I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a.
> But if someone mistakenly introduced a psprintf call into libpq,
> it'd still compile just fine; the symbol would be resolved against
> psprintf in the calling application's code.

I think that would fail to compile on Windows, where such references need
exported symbols.  We don't make an exports file for applications other than
postgres.exe.  So the strategy that inspired this may work.

> What I'm now thinking about is restricting the test to only be run on
> platforms where use of foo.a libraries is deprecated, so that we can
> be pretty sure that we won't hit this situation.  Even if we only
> run the test on Linux, that'd be plenty to catch any mistakes.

Hmm.  Static libraries are the rarer case on both AIX and Linux, but I'm not
aware of a relevant deprecation on either platform.  If it comes this to, I'd
be more inclined to control the Makefile rule with an environment variable
(e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote:
>> What I'm now thinking about is restricting the test to only be run on
>> platforms where use of foo.a libraries is deprecated, so that we can
>> be pretty sure that we won't hit this situation.  Even if we only
>> run the test on Linux, that'd be plenty to catch any mistakes.

> Hmm.  Static libraries are the rarer case on both AIX and Linux, but I'm not
> aware of a relevant deprecation on either platform.  If it comes this to, I'd
> be more inclined to control the Makefile rule with an environment variable
> (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.

That'd require buildfarm owner intervention, as well as intervention
by users.  Which seems like exporting our problems onto them.  I'd
really rather not go that way if we can avoid it.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Sat, Jul 03, 2021 at 06:44:20PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote:
> >> What I'm now thinking about is restricting the test to only be run on
> >> platforms where use of foo.a libraries is deprecated, so that we can
> >> be pretty sure that we won't hit this situation.  Even if we only
> >> run the test on Linux, that'd be plenty to catch any mistakes.
> 
> > Hmm.  Static libraries are the rarer case on both AIX and Linux, but I'm not
> > aware of a relevant deprecation on either platform.  If it comes this to, I'd
> > be more inclined to control the Makefile rule with an environment variable
> > (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.
> 
> That'd require buildfarm owner intervention, as well as intervention
> by users.  Which seems like exporting our problems onto them.  I'd
> really rather not go that way if we can avoid it.

I like that goal, though we'll have to see how difficult it proves.  As of
today, a GNU/Linux user building against static OpenLDAP will get a failure,
right?  That would export work onto that user, spuriously.  Since the non-AIX
user count dwarfs the AIX user count, expect a user complaint from non-AIX
first.

We'd get something like 95% of the value by running the test on one Windows
buildfarm member and one non-Windows buildfarm member.  If you did gate the
check on an environment variable, there would be no need to angle for broad
adoption.  Still, I agree avoiding that configuration step is nice, all else
being equal.  A strategy not having either of those drawbacks would be to skip
the test if libpq.so contains a definition of libpq_unbind().  If any other
dependency contains exit calls, we'd likewise probe for one symbol of that
library and skip the test if presence of that symbol reveals static linking.
(That's maintenance-prone in its own way, but a maintenance-free strategy has
not appeared.)



Re: Preventing abort() and exit() calls in libpq

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Sat, Jul 03, 2021 at 06:44:20PM -0400, Tom Lane wrote:
>> That'd require buildfarm owner intervention, as well as intervention
>> by users.  Which seems like exporting our problems onto them.  I'd
>> really rather not go that way if we can avoid it.

> I like that goal, though we'll have to see how difficult it proves.  As of
> today, a GNU/Linux user building against static OpenLDAP will get a failure,
> right?  That would export work onto that user, spuriously.

As a former packager for Red Hat, my response would be "you're doing it
wrong".  Nobody on any Linux distro should *ever* statically link code
from one package into code from another, because they are going to create
untold pain for themselves when (not if) the first package is updated.
So I flat out reject that as a valid use-case.

It may be that that ethos is not so strongly baked-in on other platforms.
But I'm content to wait and see if there are complaints before rescinding
the automatic test; and if there are, I'd prefer to deal with it by just
backing off to running the test on Linux only.

> We'd get something like 95% of the value by running the test on one Windows
> buildfarm member and one non-Windows buildfarm member.

True.  But that just brings up the point that we aren't running the test
at all on MSVC builds right now.  I have no idea how to do that, do you?

> ...  A strategy not having either of those drawbacks would be to skip
> the test if libpq.so contains a definition of libpq_unbind().

I assume you meant some OpenLDAP symbol?

> If any other
> dependency contains exit calls, we'd likewise probe for one symbol of that
> library and skip the test if presence of that symbol reveals static linking.
> (That's maintenance-prone in its own way, but a maintenance-free strategy has
> not appeared.)

I'm more worried about the risk of failing to detect problems at all,
in case somebody fat-fingers things in a way that causes the test to
be skipped everywhere.

I'll keep that way in mind if we conclude that the existing way is
unworkable, but so far I don't think it is.

            regards, tom lane



Re: Preventing abort() and exit() calls in libpq

From
Noah Misch
Date:
On Fri, Jul 09, 2021 at 10:06:18AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Sat, Jul 03, 2021 at 06:44:20PM -0400, Tom Lane wrote:
> >> That'd require buildfarm owner intervention, as well as intervention
> >> by users.  Which seems like exporting our problems onto them.  I'd
> >> really rather not go that way if we can avoid it.
> 
> > I like that goal, though we'll have to see how difficult it proves.  As of
> > today, a GNU/Linux user building against static OpenLDAP will get a failure,
> > right?  That would export work onto that user, spuriously.
> 
> As a former packager for Red Hat, my response would be "you're doing it
> wrong".  Nobody on any Linux distro should *ever* statically link code
> from one package into code from another, because they are going to create
> untold pain for themselves when (not if) the first package is updated.
> So I flat out reject that as a valid use-case.
> 
> It may be that that ethos is not so strongly baked-in on other platforms.

Packagers do face more rules than users generally.

> But I'm content to wait and see if there are complaints before rescinding
> the automatic test; and if there are, I'd prefer to deal with it by just
> backing off to running the test on Linux only.

Okay.

> > We'd get something like 95% of the value by running the test on one Windows
> > buildfarm member and one non-Windows buildfarm member.
> 
> True.  But that just brings up the point that we aren't running the test
> at all on MSVC builds right now.  I have no idea how to do that, do you?

I don't.  But coverage via non-MSVC Windows is good enough.

> > ...  A strategy not having either of those drawbacks would be to skip
> > the test if libpq.so contains a definition of libpq_unbind().
> 
> I assume you meant some OpenLDAP symbol?

Yeah, that was supposed to say ldap_unbind().