Re: Regression tests fail with musl libc because libpq.so can't be loaded - Mailing list pgsql-bugs
From | Wolfgang Walther |
---|---|
Subject | Re: Regression tests fail with musl libc because libpq.so can't be loaded |
Date | |
Msg-id | f98cd8de-1c66-491a-8409-e62c09932080@technowledgy.de |
In response to | Re: Regression tests fail with musl libc because libpq.so can't be loaded (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-bugs |
Thomas Munro:
> Of course we have to distinguish between the basic argv[] clobbering
> trick which is barely even a trick, and the more advanced environ
> stealing trick which confuses musl.

Right. The latter not only confuses musl, but also makes /proc/<pid>/environ return garbage. This is also mentioned at the bottom of main.c, which has a workaround for the specific case of UBSan depending on that. This is kind of funny: Because we are relying on undefined behavior regarding the modification of environ, we need a workaround for the "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, it wanted to tell us something. This happens on glibc, too.

So summarizing:

1. The simple approach is to use PS_USE_CLOBBER_ARGV on Linux only for glibc and other known-to-be-good-and-identifiable libc variants, otherwise default to PS_USE_NONE. This will not only keep the /proc/.../environ problem for glibc users, but also disable ps status for musl entirely. Considering that probably the biggest use-case for musl is to run postgres in containers, it's quite likely that more than one cluster runs on a single machine. In this case, ps status would be especially handy to identify which cluster a process belongs to.

2. The next proposal was to stop clobbering environ once LD_LIBRARY_PATH / LD_PRELOAD is found, to keep those intact. This will keep ps status support on musl, which is good. But the /proc/.../environ problem will still be there, unchanged.

Both of those approaches rely on the undefined behavior of clobbering environ.

3. The logical consequence of this is to stop clobbering environ and use only the available argv space. However, this will quickly leave us with a very small ps status buffer to work with, making the feature less useful. Note that, theoretically, this could already happen today by starting postgres with the fewest arguments and the smallest environment possible. Not sure what the minimal buffer size is that could be achieved that way. The point is: The buffer size is not guaranteed at all.

4. The upstream (musl) suggestion, of which I sent a PoC, was to "exec yourself with a bigger argv". This works. I chose to pad argv0 with trailing slashes. Those can safely be stripped away again, because any argv0 which would come with a trailing slash to start with would not be the current executable, but a directory - so it would fail exec immediately anyway. This keeps /proc/.../environ intact and does not rely on undefined behavior. Additionally, we get a guaranteed ps buffer size of 256, which is what we use on BSDs and Windows, too.

I wonder why we actually fall back to PS_USE_NONE by default, and how much of that is related to the environment clobbering to start with? Could we even use the exec approach as the fallback in all other cases except BSDs and Windows and get rid of PS_USE_NONE?

Clobbering only argv sure seems way safer to do than what we do right now.

Best,

Wolfgang
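
For illustration only, the idea in point 4 could be sketched roughly as below. This is not the PoC from the thread, just a minimal sketch under stated assumptions: the function names (pad_argv0, strip_trailing_slashes, reexec_with_padded_argv0) and the PS_BUFFER_SIZE macro are hypothetical, and re-executing via /proc/self/exe is a Linux-specific assumption made here for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical name; 256 is the buffer size mentioned in the mail. */
#define PS_BUFFER_SIZE 256

/*
 * Return a malloc'd copy of argv0 padded with trailing '/' so that the
 * string (including its terminating NUL) occupies at least `size` bytes.
 * At least one '/' is always appended so a re-exec can be detected.
 */
char *
pad_argv0(const char *argv0, size_t size)
{
    size_t len = strlen(argv0);
    size_t total;
    char *padded;

    assert(size > 1);
    total = (len + 1 > size - 1) ? len + 1 : size - 1;

    padded = malloc(total + 1);
    if (padded == NULL)
        return NULL;
    memcpy(padded, argv0, len);
    memset(padded + len, '/', total - len);
    padded[total] = '\0';
    return padded;
}

/* Strip trailing '/' in place; return how many bytes were removed. */
size_t
strip_trailing_slashes(char *argv0)
{
    size_t len = strlen(argv0);
    size_t removed = 0;

    while (len > 0 && argv0[len - 1] == '/')
    {
        argv0[--len] = '\0';
        removed++;
    }
    return removed;
}

/*
 * Call early in main(). On the first run, re-exec the same binary with a
 * padded argv[0]. On the second run, the padding is found and stripped,
 * and the freed bytes inside argv[0] are available as ps status space.
 */
void
reexec_with_padded_argv0(char **argv)
{
    char *orig = argv[0];
    char *padded;

    if (strip_trailing_slashes(argv[0]) > 0)
        return;                 /* already re-executed with padding */

    padded = pad_argv0(orig, PS_BUFFER_SIZE);
    if (padded == NULL)
        return;                 /* out of memory: continue unpadded */

    argv[0] = padded;
    execv("/proc/self/exe", argv);  /* assumption: Linux procfs mounted */

    /* Only reached if execv() failed: restore and carry on. */
    argv[0] = orig;
    free(padded);
}
```

A caller would invoke reexec_with_padded_argv0(argv) as the first statement of main(), before anything inspects argv[0]. Only argv is touched in this sketch; environ is left alone.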