Thread: `make check` doesn't pass on MacOS Catalina
Hi hackers,
While trying to build PostgreSQL from source (master branch, 95c3a195) on a MacBook, I discovered that `make check` fails:
```
============== removing existing temp instance ==============
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
sh: line 1: 33559 Abort trap: 6 "psql" -X postgres < /dev/null 2> /dev/null
sh: line 1: 33562 Abort trap: 6 "psql" -X postgres < /dev/null 2> /dev/null
...
sh: line 1: 33742 Abort trap: 6 "psql" -X postgres < /dev/null 2> /dev/null
pg_regress: postmaster did not respond within 60 seconds
Examine /Users/eax/projects/c/postgresql/src/test/regress/log/postmaster.log for the reason
make[1]: *** [check] Error 2
make: *** [check] Error 2
```
A little investigation revealed that pg_regress executes postgres like this:
```
PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/bin:$PATH" DYLD_LIBRARY_PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/lib" "postgres" -D "/Users/eax/projects/c/postgresql/src/test/regress/./tmp_check/data" -F -c "listen_addresses=" -k "/Users/eax/pgtmp/pg_regress-S34sXM" > "/Users/eax/projects/c/postgresql/src/test/regress/log/postmaster.log"
```
... and checks that it's online by executing:
```
PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/bin:$PATH" DYLD_LIBRARY_PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/lib" psql -X postgres
```
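pg_regress treats the postmaster as up once a connection attempt succeeds, retrying (via psql, per the log above) until it gives up after 60 seconds. Below is a minimal C sketch of the same wait loop. It assumes we already know the full Unix-socket path and probes it directly with `connect()` instead of spawning psql; `socket_is_accepting` and `wait_for_postmaster` are hypothetical helpers, not pg_regress functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: true if something is listening on the Unix socket at path. */
bool socket_is_accepting(const char *path)
{
    struct sockaddr_un addr;
    int fd;
    bool ok;

    if (strlen(path) >= sizeof(addr.sun_path))
        return false;               /* path too long for sockaddr_un */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return false;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length checked above */

    ok = connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0;
    close(fd);
    return ok;
}

/*
 * Hypothetical helper: probe once per second for up to timeout_secs seconds,
 * loosely mirroring pg_regress's "did not respond within 60 seconds" wait.
 */
bool wait_for_postmaster(const char *path, int timeout_secs)
{
    for (int i = 0; i <= timeout_secs; i++)
    {
        if (socket_is_accepting(path))
            return true;
        if (i < timeout_secs)
            sleep(1);
    }
    return false;
}
```

Probing the socket directly only tells us the postmaster is accepting connections; pg_regress's psql probe additionally exercises libpq, which is exactly the part that broke here.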
The last command fails with:
```
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or directory. Is the server running locally and accepting connections on that socket?
```
This is because the actual path to the socket is:
```
~/pgtmp/pg_regress-S34sXM/.s.PGSQL.5432
```
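The socket file is named `.s.PGSQL.<port>` inside whatever directory the server was told to use (`-k` above). With no host given, psql falls back to its compiled-in default directory, which is why it went looking in /tmp; setting `PGHOST` (or passing `-h`) to an absolute directory path points a libpq client at a different socket directory. The helpers below are a hypothetical, simplified sketch of that lookup, not libpq's actual code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/.s.PGSQL.<port>", the socket file name a
 * libpq client looks for.  Returns 0 on success, -1 if buf is too small.
 */
int pg_socket_path(const char *dir, int port, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s/.s.PGSQL.%d", dir, port);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Hypothetical, simplified directory choice: use PGHOST if it is set to an
 * absolute path (treated as a socket directory), else the compiled-in default.
 */
const char *socket_dir(const char *compiled_default)
{
    const char *host = getenv("PGHOST");

    return (host != NULL && host[0] == '/') ? host : compiled_default;
}
```

For example, `pg_socket_path("/tmp", 5432, buf, sizeof buf)` yields `/tmp/.s.PGSQL.5432`, the path in the error message, while the server was actually listening under `~/pgtmp/pg_regress-S34sXM`.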
While debugging this, I also discovered that psql uses the /usr/lib/libpq.5.dylib library, according to the `image list` command in LLDB. The library is provided with the system and can't be moved or deleted. In other words, psql seems to ignore DYLD_LIBRARY_PATH. I found instructions [1] suggesting that this is a behavior of macOS System Integrity Protection (SIP) and describing how to disable it. Sadly, that made no difference in my case; psql still ignores DYLD_LIBRARY_PATH.
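Since make and pg_regress launch psql through /bin/sh, one quick way to test whether DYLD_LIBRARY_PATH survives that hop is to set it in a parent process and have the shell echo it back. This is a minimal sketch; `env_survives_shell` is a hypothetical helper, and its interpretation assumes SIP purges `DYLD_*` variables when it launches a protected binary such as /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical diagnostic: set name=value in this process, then ask a child
 * /bin/sh (popen always runs its command via /bin/sh -c) to echo the variable
 * back.  Returns 1 if the shell saw the value, 0 if it was stripped or
 * changed, -1 on error.  `name` must be a plain identifier because it is
 * pasted into the shell command as-is.
 */
int env_survives_shell(const char *name, const char *value)
{
    char cmd[256];
    char expected[512];
    char line[512] = "";
    FILE *p;

    if (setenv(name, value, 1) != 0)
        return -1;

    snprintf(cmd, sizeof cmd, "echo \"[${%s}]\"", name);
    snprintf(expected, sizeof expected, "[%s]\n", value);

    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    if (fgets(line, sizeof line, p) == NULL)
        line[0] = '\0';
    if (pclose(p) == -1)
        return -1;

    return strcmp(line, expected) == 0 ? 1 : 0;
}
```

On Linux, or on a Mac with SIP disabled, we would expect both `env_survives_shell("SOME_VAR", "/tmp/lib")` and `env_survives_shell("DYLD_LIBRARY_PATH", "/tmp/lib")` to return 1; on a SIP-enabled Mac we would expect only the DYLD_LIBRARY_PATH call to come back 0.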
While I'm still in the process of investigating this, I wanted to ask whether anyone developing on macOS has observed anything similar and had any luck solving the problem. I searched the mailing list but didn't find anything relevant. The complete script that reproduces the issue is attached; I'm using the same script on an Ubuntu VM, where it works just fine.
Best regards,
Aleksander Alekseev
Aleksander Alekseev <aleksander@timescale.com> writes:
> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> MacBook I discovered that `make check` fails:

This is the usual symptom of not having disabled SIP :-(.

If you don't want to do that, do "make install" before "make check".

regards, tom lane
Hi Tom,

Many thanks, running "make install" before "make check" helped.

On Tue, Apr 20, 2021 at 6:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Aleksander Alekseev <aleksander@timescale.com> writes:
> > While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> > MacBook I discovered that `make check` fails:
>
> This is the usual symptom of not having disabled SIP :-(.
>
> If you don't want to do that, do "make install" before "make check".
>
> regards, tom lane

--
Best regards,
Aleksander Alekseev
On 4/20/21 11:02 AM, Tom Lane wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
>> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
>> MacBook I discovered that `make check` fails:
> This is the usual symptom of not having disabled SIP :-(.
>
> If you don't want to do that, do "make install" before "make check".

FYI the buildfarm client has a '--delay-check' option that does exactly this. It's useful on Alpine Linux as well as MacOS

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Xing GUO <higuoxing@gmail.com> writes:
> Thank you very much. I faced the same problem yesterday. May I
> suggest documenting it in the installation guide for the macOS platform?

It is documented --- see last para under
https://www.postgresql.org/docs/current/installation-platform-notes.html#INSTALLATION-NOTES-MACOS

regards, tom lane
On Tue, Apr 20, 2021 at 9:06 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 4/20/21 11:02 AM, Tom Lane wrote:
> > Aleksander Alekseev <aleksander@timescale.com> writes:
> >> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> >> MacBook I discovered that `make check` fails:
> > This is the usual symptom of not having disabled SIP :-(.
> >
> > If you don't want to do that, do "make install" before "make check".
> FYI the buildfarm client has a '--delay-check' option that does exactly
> this. It's useful on Alpine Linux as well as MacOS

I was trying to set up a buildfarm animal, and this exact problem led to a few hours of debugging and hair-pulling. Can the default behaviour be changed in the buildfarm client to perform `make check` only after `make install`?

Current buildfarm client code looks something like:

    make();
    make_check() unless $delay_check;
    ... other steps ...
    make_install();
    ... other steps-2...
    make_check() if $delay_check;

There are no comments as to why one should choose to use --delay-check ($delay_check). This email, and the pointer to the paragraph buried in the docs, shared by Tom, are the only two ways one can understand what is causing this failure, and how to get around it.

Naive question: What's stopping us from rewriting the code as follows?

    make();
    make_install();
    make_check();
    ... other steps ...
    ... other steps-2...
    # or move make_check() call here

With a quick Google search I could not find why --delay-check is necessary on Alpine Linux as well; can you please elaborate?

Best regards,
Gurjeet
http://Gurje.et
On 2022-08-06 Sa 06:49, Gurjeet Singh wrote:
> On Tue, Apr 20, 2021 at 9:06 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 4/20/21 11:02 AM, Tom Lane wrote:
>>> Aleksander Alekseev <aleksander@timescale.com> writes:
>>>> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
>>>> MacBook I discovered that `make check` fails:
>>> This is the usual symptom of not having disabled SIP :-(.
>>>
>>> If you don't want to do that, do "make install" before "make check".
>> FYI the buildfarm client has a '--delay-check' option that does exactly
>> this. It's useful on Alpine Linux as well as MacOS
> I was trying to set up a buildfarm animal, and this exact problem led
> to a few hours of debugging and hair-pulling. Can the default
> behaviour be changed in the buildfarm client to perform `make check` only
> after `make install`?
>
> Current buildfarm client code looks something like:
>
>     make();
>     make_check() unless $delay_check;
>     ... other steps ...
>     make_install();
>     ... other steps-2...
>     make_check() if $delay_check;
>
> There are no comments as to why one should choose to use --delay-check
> ($delay_check). This email, and the pointer to the paragraph buried in
> the docs, shared by Tom, are the only two ways one can understand what
> is causing this failure, and how to get around it.
>
> Naive question: What's stopping us from rewriting the code as follows?
>
>     make();
>     make_install();
>     make_check();
>     ... other steps ...
>     ... other steps-2...
>     # or move make_check() call here
>
> With a quick Google search I could not find why --delay-check is
> necessary on Alpine Linux as well; can you please elaborate?
>

I came across this when I was working on setting up some Dockerfiles for the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at least out of the box, as it uses a different linker, and "make check" relies on it (or the moral equivalent) if "make install" hasn't been run.

In general we want to run "make check" as soon as possible after running "make" on the core code. That's why I didn't simply delay it unconditionally.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
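The ordering trade-off described here can be sketched as a tiny planner: by default, check right after make so core regressions are caught as early as possible, and check after install only when delay-check is requested. The buildfarm client itself is Perl; `plan_steps` and `step_index` are hypothetical C helpers that mirror the pseudocode quoted above, not the client's real structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical planner: with delay_check false, "make check" runs right
 * after "make"; with delay_check true, it runs only after "make install",
 * for platforms where the temporary install's library path is not honored.
 * Writes the step names into out and returns how many were written
 * (0 if cap is too small).
 */
size_t plan_steps(bool delay_check, const char *out[], size_t cap)
{
    const char *steps[4];
    size_t n = 0;

    steps[n++] = "make";
    if (!delay_check)
        steps[n++] = "make check";
    steps[n++] = "other steps";
    steps[n++] = "make install";
    if (delay_check)
        steps[n++] = "make check";

    if (n > cap)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = steps[i];
    return n;
}

/* Position of step in a plan of n entries, or -1 if absent. */
int step_index(const char *plan[], size_t n, const char *step)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(plan[i], step) == 0)
            return (int) i;
    return -1;
}
```

Keeping the early check as the default preserves the property that "make check" works without a prior install on non-broken platforms; the flag only moves it for platforms that need it.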
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2022-08-06 Sa 06:49, Gurjeet Singh wrote:
>> There are no comments as to why one should choose to use --delay-check
>> ($delay_check). This email, and the pointer to the paragraph buried in
>> the docs, shared by Tom, are the only two ways one can understand what
>> is causing this failure, and how to get around it.
> In general we want to run "make check" as soon as possible after running
> "make" on the core code. That's why I didn't simply delay it
> unconditionally.

In general --- that is, on non-broken platforms --- "make check" *should* work without a prior "make install". I am absolutely not in favor of changing the buildfarm so that it fails to detect the problem if we break that. But for sure it'd make sense to add some comments to the wiki and/or sample config file explaining that you need to set this option on systems X,Y,Z. On macOS you need to use it if you haven't disabled SIP. I don't have the details about any other problem platforms.

regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
> I came across this when I was working on setting up some Dockerfiles for
> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
> least out of the box, as it uses a different linker, and "make check"
> relies on it (or the moral equivalent) if "make install" hasn't been run.

I did some quick googling on this point. We seem not to be the only project having linking issues on Alpine, and yet it does support LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg

https://www.musl-libc.org/doc/1.0.0/manual.html

I suspect the situation is similar to macOS, ie there is some limitation somewhere on whether LD_LIBRARY_PATH gets passed through. If memory serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is cleared upon invoking bash, so that we lose it anywhere that "make" invokes a shell to run a subprogram. (Hmm ... I wonder whether ninja uses the shell ...)

I don't personally care at all about Alpine, but maybe somebody who does could dig a little harder and characterize the problem there better.

regards, tom lane
Hi,

On 2022-08-06 11:25:09 -0400, Tom Lane wrote:
> (Hmm ... I wonder whether ninja uses the shell ...)

It does, but even if it didn't, we'd use a shell somewhere below perl or pg_regress :(.

The meson build should still work without disabling SIP: I did the necessary hackery to set up the rpath equivalent relatively. So both the real install target and tmp_install/ should find libraries within themselves.

Greetings,

Andres Freund
On 2022-08-06 Sa 11:25, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I came across this when I was working on setting up some Dockerfiles for
>> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
>> least out of the box, as it uses a different linker, and "make check"
>> relies on it (or the moral equivalent) if "make install" hasn't been run.
> I did some quick googling on this point. We seem not to be the only
> project having linking issues on Alpine, and yet it does support
> LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg
>
> https://www.musl-libc.org/doc/1.0.0/manual.html
>
> I suspect the situation is similar to macOS, ie there is some limitation
> somewhere on whether LD_LIBRARY_PATH gets passed through. If memory
> serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
> cleared upon invoking bash, so that we lose it anywhere that "make"
> invokes a shell to run a subprogram. (Hmm ... I wonder whether ninja
> uses the shell ...) I don't personally care at all about Alpine, but
> maybe somebody who does could dig a little harder and characterize
> the problem there better.

We probably should care about Alpine, because it's a good distro to use as the basis for Docker images, being fairly secure, very small, and booting very fast.

I'll dig some more, and possibly set up a (docker based) buildfarm instance.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 2022-08-06 Sa 12:10, Andrew Dunstan wrote:
> On 2022-08-06 Sa 11:25, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> I came across this when I was working on setting up some Dockerfiles for
>>> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
>>> least out of the box, as it uses a different linker, and "make check"
>>> relies on it (or the moral equivalent) if "make install" hasn't been run.
>> I did some quick googling on this point. We seem not to be the only
>> project having linking issues on Alpine, and yet it does support
>> LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg
>>
>> https://www.musl-libc.org/doc/1.0.0/manual.html
>>
>> I suspect the situation is similar to macOS, ie there is some limitation
>> somewhere on whether LD_LIBRARY_PATH gets passed through. If memory
>> serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
>> cleared upon invoking bash, so that we lose it anywhere that "make"
>> invokes a shell to run a subprogram. (Hmm ... I wonder whether ninja
>> uses the shell ...) I don't personally care at all about Alpine, but
>> maybe somebody who does could dig a little harder and characterize
>> the problem there better.
>
> We probably should care about Alpine, because it's a good distro to use
> as the basis for Docker images, being fairly secure, very small, and
> booting very fast.
>
> I'll dig some more, and possibly set up a (docker based) buildfarm instance.

It appears that LD_LIBRARY_PATH is supported on Alpine but it fails if chained, which seems somewhat braindead. The regression tests get errors like this:

    +ERROR: could not load library "/app/buildroot/HEAD/pgsql.build/tmp_install/app/buildroot/HEAD/inst/lib/postgresql/libpqwalreceiver.so": Error loading shared library libpq.so.5: No such file or directory (needed by /app/buildroot/HEAD/pgsql.build/tmp_install/app/buildroot/HEAD/inst/lib/postgresql/libpqwalreceiver.so)

If the check stage is delayed until after the install stage, the tests pass.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
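The `could not load library` message pairs the library path with the dynamic loader's own diagnosis, which is how the missing dependency `libpq.so.5` surfaces even though `libpqwalreceiver.so` itself was found. Below is a minimal sketch of that pattern with POSIX `dlopen()`/`dlerror()`. `try_load_library` is a hypothetical helper, not PostgreSQL's loader code; only the message format is taken from the error above.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: load a shared library and, on failure, format the
 * loader's reason the same way as the regression-test error.  Dependencies
 * (e.g. libpq.so.5 for libpqwalreceiver.so) are resolved by the dynamic
 * loader during dlopen(), so a missing dependency also ends up in dlerror().
 * Returns the handle, or NULL with a message in errbuf.
 */
void *try_load_library(const char *path, char *errbuf, size_t errlen)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL)
    {
        const char *why = dlerror();

        snprintf(errbuf, errlen, "could not load library \"%s\": %s",
                 path, why != NULL ? why : "unknown error");
    }
    return handle;
}
```

On glibc older than 2.34 this needs `-ldl` at link time.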