Re: Regression tests fail with musl libc because libpq.so can't be loaded - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: Regression tests fail with musl libc because libpq.so can't be loaded |
Date | |
Msg-id | CA+hUKG+Tq3GK7bPd03N0Eox3YY4-Hjd7qQjo_QZFjdbhTqQGQA@mail.gmail.com |
In response to | Re: Regression tests fail with musl libc because libpq.so can't be loaded (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Regression tests fail with musl libc because libpq.so can't be loaded
(Thomas Munro <thomas.munro@gmail.com>)
|
List | pgsql-bugs |
On Tue, Mar 19, 2024 at 3:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > (Hmm, I think it's not that unreasonable on their part to assume the
> > initial environment is immutable if their implementation doesn't
> > mutate it, and our doing so is undeniably UB; surprising, maybe, given
> > that the technique works on that other popular brand of C library on
> > that kind of kernel, not to mention dozens of old Unixen of yore...
>
> Does their implementation also ignore the effects of putenv() or
> setenv() on LD_LIBRARY_PATH? They have no moral high ground
> whatsoever if that's the case. But if it doesn't, an alternative
> route to a solution could be to scan the original environment, strdup
> and putenv each entry to move it to freshly malloc'd space, and
> then reclaim the old environment area.

Yes, the musl linker/loader ignores putenv()/setenv() changes to LD_LIBRARY_PATH after process start (that is, changes only affect the search path when injected into a new program with exec*()). As does glibc, it's just that it captures by copy instead of reference (according to one of the links above, I didn't check the source). So setenv() has no effect on dlopen() in *this* program, and using putenv in that way won't help. We simply can't move the value of LD_LIBRARY_PATH (though my patch could be a little sneakier and steal all the bytes right up to the = sign to get more space for our message!).

One way to tell if a copy has been made is to trace a program that does:

    getenv("LD_LIBRARY_PATH")[2] = 'X';
    dlopen("foo.so", RTLD_NOW | RTLD_GLOBAL);

... when run with LD_LIBRARY_PATH set to /asdf. On FreeBSD I see it tries to open "/aXdf...", so now I know that FreeBSD also captures it by reference like musl. But we don't use the clobber trick on FreeBSD, it has a proper setproctitle() function that knows how to negotiate with the kernel, so it doesn't matter.
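The two-line probe described above can be fleshed out into something compilable. What follows is a minimal sketch, not code from the patch under discussion: the helper names clobber_env_byte() and probe_loader_search_path() are made up for illustration, and it assumes a platform that provides dlopen() via <dlfcn.h> (older glibc releases need -ldl at link time). Writing through the pointer returned by getenv() is exactly the unsanctioned trick being debated, so treat it as a diagnostic only.

```c
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>

/*
 * Overwrite one byte of an environment value in place.
 * Returns 0 on success, -1 if the variable is unset or too short.
 * (Hypothetical helper; POSIX does not sanction writing to this string.)
 */
int clobber_env_byte(const char *name, size_t index, char c)
{
    char *value = getenv(name);

    if (value == NULL || strlen(value) <= index)
        return -1;
    value[index] = c;
    return 0;
}

/*
 * Clobber the third byte of LD_LIBRARY_PATH, then ask the loader for a
 * library by bare name so that it has to walk its search path.
 * Returns 1 if the library loaded, 0 otherwise. The interesting result
 * is the path shown by a system call tracer, not the return value.
 */
int probe_loader_search_path(const char *soname)
{
    void *handle;

    /* Ignore failure: LD_LIBRARY_PATH may simply be unset. */
    (void) clobber_env_byte("LD_LIBRARY_PATH", 2, 'X');

    handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL)
        return 0;
    dlclose(handle);
    return 1;
}
```

Calling probe_loader_search_path("foo.so") from main() with LD_LIBRARY_PATH=/asdf under strace, truss or ktrace should show which directory the loader tries: a path starting with /aXdf suggests capture by reference, while /asdf suggests a copy was taken at startup.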
FreeBSD also ignores changes made with setenv()/putenv(), because those create fresh entries but leave the initial environment strings untouched. Solaris also ignores changes made after startup (it's in the dlopen man page), and from a very quick look at its ld_lib_setup() I think it achieved that with a copy.

I believe its ancestor SunOS 4 invented all of these conventions (and the mmap/virtual memory concepts they rode in on), later nailed down to some degree in the System V ABI and very widely adopted, but I don't see anything in the latter that specifically addresses this point, e.g. LD_LIBRARY_PATH copy vs reference and interaction with dlopen() (perhaps I didn't look hard enough). I'm not sure what else you can point to to make strong claims about this stuff, but I bet every system ignores changes after startup, it's just that they found two ways to achieve that.

POSIX says of dlopen that the "file [argument] is used in an implementation-defined manner", and of environ that we're welcome to swap in a whole new environ, but doesn't seem to tell us anything about the one that is replaced (who owns it? is the initial one set up at execution time special? etc). The line banning manipulation of the pointers environ refers to doesn't exactly describe what we're doing (we're manipulating the strings pointed to by the *previous* environ). UB.
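The observation that setenv() installs a fresh entry and leaves the initial environment string alone can be checked with a small helper. This is a sketch under stated assumptions: setenv_leaves_initial_string() is a hypothetical name, the check is only meaningful for a variable inherited at exec time, and POSIX does not promise the old string stays readable after setenv(), so the result describes one libc's behavior rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <string.h>

/*
 * Check whether setenv() replaced an inherited variable by installing a
 * fresh string rather than rewriting the old one.
 *
 * Returns 1 if the old string is byte-for-byte unchanged and getenv() now
 * returns a different address, 0 otherwise, and -1 if the variable is
 * unset or a call fails.
 *
 * Only use this on a variable from the initial environment: a libc may
 * free a string that an earlier setenv() allocated, and reading it
 * afterwards would then be invalid.
 */
int setenv_leaves_initial_string(const char *name, const char *new_value)
{
    char *old_ptr = getenv(name);
    char *snapshot;
    char *new_ptr;
    int untouched;

    if (old_ptr == NULL)
        return -1;

    /* Remember what the initial string looked like. */
    snapshot = strdup(old_ptr);
    if (snapshot == NULL)
        return -1;

    if (setenv(name, new_value, 1) != 0) {
        free(snapshot);
        return -1;
    }

    /* Compare the old location against the snapshot taken before setenv(). */
    new_ptr = getenv(name);
    untouched = new_ptr != NULL &&
                new_ptr != old_ptr &&
                strcmp(old_ptr, snapshot) == 0;

    free(snapshot);
    return untouched;
}
```

A result of 1 for something like setenv_leaves_initial_string("LD_LIBRARY_PATH", "/elsewhere") is consistent with a loader that holds a reference to the initial string never seeing the new value.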