Thread: macOS SIP, next try

macOS SIP, next try

From

Peter Eisentraut

Date:

02 December 2020, 14:28:28

Previous discussions on macOS SIP:

https://www.postgresql.org/message-id/flat/561E73AB.8060800%40gmx.net
https://www.postgresql.org/message-id/flat/CA%2BTgmoYGi5oR8Rvb2-SY1_WEjd76H5gCqSukxFQ66qR7MewWAQ%40mail.gmail.com
https://www.postgresql.org/message-id/flat/6a4d6124-41f0-756b-0811-c5c5def7ef4b%402ndquadrant.com

Tests that use a temporary installation (i.e., make check) use a
platform-specific environment variable to make programs and libraries
find shared libraries in the temporary installation rather than in the
actual configured installation target directory. On macOS, this is
DYLD_LIBRARY_PATH. Ever since System Integrity Protection (SIP)
arrived, this hasn't worked anymore, because DYLD_LIBRARY_PATH gets
disabled by the system (in somewhat obscure ways). The workaround was
to either disable SIP or to do make install before make check.

The solution here is to use install_name_tool to edit the shared library
path in the binaries (programs and libraries) after installation into
the temporary prefix.

The solution is, I think, correct in principle, but the implementation
is hackish, since it hardcodes the names and versions of the interesting
shared libraries in an unprincipled way. More refinement is needed
here. Ideas welcome.

In order to allow all that, we need to link everything with the option
-headerpad_max_install_names so that there is enough room in the
binaries to put in the temporary path in place of the original one.
(The temporary path is strictly longer than the original one.) This
bloats the binaries, so it might only be appropriate for development
builds and should perhaps be hidden behind some option. Ideas also
welcome here.

The attached patch works for me. You can see that it disables setting
DYLD_LIBRARY_PATH, but the tests still work. Verification on other
system variantions welcome, of course.

Comments welcome. I think this direction is promising, but some details
need to be addressed.

Attachment

0001-WIP-Fix-temp-install-tests-to-work-with-macOS-SIP.patch

Re: macOS SIP, next try

From

Tom Lane

Date:

05 January 2021, 03:54:02

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> The solution here is to use install_name_tool to edit the shared library
> path in the binaries (programs and libraries) after installation into
> the temporary prefix.

Ah, this indeed seems like a promising direction.

> The solution is, I think, correct in principle, but the implementation
> is hackish, since it hardcodes the names and versions of the interesting
> shared libraries in an unprincipled way.  More refinement is needed
> here.  Ideas welcome.

Yeah, it's pretty hacky.  I tried the patch as given, and "make check"
failed with

error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool: can't
openfile: /Users/tgl/pgsql/tmp_install//Users/tgl/testversion/lib/*.so (No such file or directory) 
make: *** [temp-install] Error 1

evidently because there are no files named *.so in the "lib" directory
in my build.  I removed '$(abs_top_builddir)/tmp_install/$(libdir)'/*.so
from the "call temp_install_postprocess" invocation, and then it got
through "make temp-install", and most of the core regression tests worked.
But a couple of them failed with

+ERROR:  could not load library
"/Users/tgl/pgsql/tmp_install/Users/tgl/testversion/lib/postgresql/libpqwalreceiver.so":
dlopen(/Users/tgl/pgsql/tmp_install/Users/tgl/testversion/lib/postgresql/libpqwalreceiver.so,10): Library not loaded:
/Users/tgl/testversion/lib/libpq.5.dylib
+  Referenced from: /Users/tgl/pgsql/tmp_install/Users/tgl/testversion/lib/postgresql/libpqwalreceiver.so
+  Reason: image not found

What this looks like to me is some confusion about whether "pgsql"
or "postgresql" has been injected into the install path or not.
(This build is using /Users/tgl/testversion as the install --prefix;
I surmise that your path had "pgsql" or "postgres" in it already.)

One plausible way to fix this is to use "find" on the install tree
to locate files to modify, instead of hard-wiring knowledge into
the "call temp_install_postprocess" invocation.  A possibly better
way is to arrange to invoke install_name_tool once per installed
executable or shlib, at the time we do "make install" for it.
Not sure how hard the latter approach is given our Makefiles.

Another suggestion, after reading "man install_name_tool", is
that maybe you could avoid having to know the names of the specific
libraries if you didn't use per-library '-change' switches, but
instead installed the test-installation rpath with the -add_rpath
switch.  This would be closer to the spirit of the existing test
process anyway.

Aside from removing that annoying requirement, this method would
bound the amount of extra header space needed, so instead of
"-headerpad_max_install_names" we could use "-headerpad N" with
some not-hugely-generous N.  That ought to damp down the executable
bloat to the point where we'd not have to worry too much about it.

(Pokes around ... it appears that on my old dinosaur prairiedog,
install_name_tool exists and has the -change switch, but not
anything for rpath.  So we'd better investigate how old -add_rpath
is.  Maybe there's another way to set rpath that works further back?
At least the -headerpad switches exist there.)

I haven't experimented with any of these ideas, so maybe they won't
work for some reason.  But I think we ought to push forward looking
for a solution, because it's getting harder and harder to disable
SIP and still have a useful macOS installation :-(

            regards, tom lane

Re: macOS SIP, next try

From

Mark Dilger

Date:

05 January 2021, 04:03:27


> On Dec 2, 2020, at 6:28 AM, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> Previous discussions on macOS SIP:
>
> https://www.postgresql.org/message-id/flat/561E73AB.8060800%40gmx.net
> https://www.postgresql.org/message-id/flat/CA%2BTgmoYGi5oR8Rvb2-SY1_WEjd76H5gCqSukxFQ66qR7MewWAQ%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/6a4d6124-41f0-756b-0811-c5c5def7ef4b%402ndquadrant.com

See also https://www.postgresql.org/message-id/flat/18012hGLG6HJ9pQDkHAMYuwQKg%40sparkpost.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: macOS SIP, next try

From

Tom Lane

Date:

05 January 2021, 18:24:11

Mark Dilger <mark.dilger@enterprisedb.com> writes:
> See also https://www.postgresql.org/message-id/flat/18012hGLG6HJ9pQDkHAMYuwQKg%40sparkpost.com

Yeah.  As I recall from that thread and prior ones, the bottleneck is
really just /bin/sh: something, either the kernel or sh itself, is
clearing out DYLD_LIBRARY_PATH when a shell script starts.  (Since
PATH is not reset at the same time, this seems the height of
stupidity/uselessness; but I digress.)  But if "make" has a setting
for that variable, the setting *is* successfully passed down to
executed programs.  Which implies that make doesn't use the shell while
running stuff, which seems odd.  But it gives us a possible escape
hatch.

I experimented a bit, and found that the attached very-quick-hack
patch is enough to get through "make check" and also the isolation
tests.  Unsurprisingly, none of the TAP tests work.  I experimented
a bit and it looks like "prove" may be failing to pass down
DYLD_LIBRARY_PATH to the test scripts, which would be annoying but
could be worked around (say, by getting TestLib.pm to pull the
value from somewhere).

Based on what I've got here, this avenue probably ends up messier
and more invasive than the install_name_tool avenue, since we'd
likely have to teach quite a few places about explicitly passing
down DYLD_LIBRARY_PATH.  It does have the advantage that "make
check" doesn't have to test executables that are different from
the bits we'll really install; but that's probably not worth all
that much.  Unless someone can think of a cute way to centralize
the changes some more, I'm inclined to think this isn't the way
forward.  But it seemed worth posting the results anyway.

            regards, tom lane

diff --git a/src/common/exec.c b/src/common/exec.c
index 6cbc820042..4de79f94ef 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -323,8 +323,9 @@ int
 find_other_exec(const char *argv0, const char *target,
                 const char *versionstr, char *retpath)
 {
-    char        cmd[MAXPGPATH];
+    char        cmd[MAXPGPATH * 2];
     char        line[MAXPGPATH];
+    char       *dlp = getenv("DYLD_LIBRARY_PATH");

     if (find_my_exec(argv0, retpath) < 0)
         return -1;
@@ -340,7 +341,11 @@ find_other_exec(const char *argv0, const char *target,
     if (validate_exec(retpath) != 0)
         return -1;

-    snprintf(cmd, sizeof(cmd), "\"%s\" -V", retpath);
+    if (dlp)
+        snprintf(cmd, sizeof(cmd), "DYLD_LIBRARY_PATH=\"%s\" \"%s\" -V",
+                 dlp, retpath);
+    else
+        snprintf(cmd, sizeof(cmd), "\"%s\" -V", retpath);

     if (!pipe_read_line(cmd, line, sizeof(line)))
         return -1;
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 5cfb4c4a49..dbde78608a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -77,6 +77,7 @@ bool        debug = false;
 char       *inputdir = ".";
 char       *outputdir = ".";
 char       *bindir = PGBINDIR;
+char       *setlibpath = NULL;
 char       *launcher = NULL;
 static _stringlist *loadextension = NULL;
 static int    max_connections = 0;
@@ -270,7 +271,8 @@ stop_postmaster(void)
         fflush(stderr);

         snprintf(buf, sizeof(buf),
-                 "\"%s%spg_ctl\" stop -D \"%s/data\" -s",
+                 "%s\"%s%spg_ctl\" stop -D \"%s/data\" -s",
+                 setlibpath,
                  bindir ? bindir : "",
                  bindir ? "/" : "",
                  temp_instance);
@@ -1135,7 +1137,8 @@ psql_command(const char *database, const char *query,...)

     /* And now we can build and execute the shell command */
     snprintf(psql_cmd, sizeof(psql_cmd),
-             "\"%s%spsql\" -X -c \"%s\" \"%s\"",
+             "%s\"%s%spsql\" -X -c \"%s\" \"%s\"",
+             setlibpath,
              bindir ? bindir : "",
              bindir ? "/" : "",
              query_escaped,
@@ -1187,7 +1190,7 @@ spawn_process(const char *cmdline)
          */
         char       *cmdline2;

-        cmdline2 = psprintf("exec %s", cmdline);
+        cmdline2 = psprintf("%sexec %s", setlibpath, cmdline);
         execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
         fprintf(stderr, _("%s: could not exec \"%s\": %s\n"),
                 progname, shellprog, strerror(errno));
@@ -2290,6 +2293,15 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
     unlimit_core_size();
 #endif

+    {
+        char   *dlp = getenv("DYLD_LIBRARY_PATH");
+
+        if (dlp)
+            setlibpath = psprintf("DYLD_LIBRARY_PATH=\"%s\" ", dlp);
+        else
+            setlibpath = "";
+    }
+
     if (temp_instance)
     {
         FILE       *pg_conf;
@@ -2324,7 +2336,8 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
         /* initdb */
         header(_("initializing database system"));
         snprintf(buf, sizeof(buf),
-                 "\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
+                 "%s\"%s%sinitdb\" -D \"%s/data\" --no-clean --no-sync%s%s > \"%s/log/initdb.log\" 2>&1",
+                 setlibpath,
                  bindir ? bindir : "",
                  bindir ? "/" : "",
                  temp_instance,
@@ -2397,7 +2410,8 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
          * Check if there is a postmaster running already.
          */
         snprintf(buf2, sizeof(buf2),
-                 "\"%s%spsql\" -X postgres <%s 2>%s",
+                 "%s\"%s%spsql\" -X postgres <%s 2>%s",
+                 setlibpath,
                  bindir ? bindir : "",
                  bindir ? "/" : "",
                  DEVNULL, DEVNULL);

Re: macOS SIP, next try

From

Tom Lane

Date:

22 January 2021, 17:32:16

BTW, a couple other things that should be noted here:

* Per observation in a nearby thread, install_name_tool seems to
be provided by Apple's "Command Line Tools", but not by Xcode.
This is also true of bison and flex, but we don't require those
in a build-from-tarball.  So relying on install_name_tool would
represent an expansion of our requirements for end-user builds.
I don't think this is a fatal objection, not least because the
CLT are a great deal smaller than Xcode --- so we likely ought to
encourage people to install just the former.  But it's a change.

* By chance I came across a previous thread in which someone
suggested use of install_name_tool:

https://www.postgresql.org/message-id/flat/CAN-RpxCFWqPXQD8CqP3UBqdMwUgQWLG%2By7vQgxQdJR8aiKB89A%40mail.gmail.com

I griped in that thread that it didn't help for test executables
that don't get installed, such as isolationtester, because the
hack never got applied to them.  Re-reading Peter's patch with
that in mind, I see he replaces the path in isolationtester
in-place, which means that it works in "make check" but will
fail in "make installcheck".  So I'm not sure how to cope
with that, but it's a problem.

            regards, tom lane