Thread: `make check` doesn't pass on MacOS Catalina

`make check` doesn't pass on MacOS Catalina

From: Aleksander Alekseev

Hi hackers,

While trying to build PostgreSQL from source (master branch, 95c3a195) on a MacBook, I discovered that `make check` fails:

```
============== removing existing temp instance ==============
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
sh: line 1: 33559 Abort trap: 6           "psql" -X postgres < /dev/null 2> /dev/null
sh: line 1: 33562 Abort trap: 6           "psql" -X postgres < /dev/null 2> /dev/null
...
sh: line 1: 33742 Abort trap: 6           "psql" -X postgres < /dev/null 2> /dev/null

pg_regress: postmaster did not respond within 60 seconds
Examine /Users/eax/projects/c/postgresql/src/test/regress/log/postmaster.log for the reason
make[1]: *** [check] Error 2
make: *** [check] Error 2
```

A little investigation revealed that pg_regress executes postgres like this:

```
PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/bin:$PATH" DYLD_LIBRARY_PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/lib" "postgres" -D "/Users/eax/projects/c/postgresql/src/test/regress/./tmp_check/data" -F -c "listen_addresses=" -k "/Users/eax/pgtmp/pg_regress-S34sXM" > "/Users/eax/projects/c/postgresql/src/test/regress/log/postmaster.log"
```

... and checks that it's online by executing:

```
PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/bin:$PATH" DYLD_LIBRARY_PATH="/Users/eax/projects/c/postgresql/tmp_install/Users/eax/pginstall/lib" psql -X postgres
```

The last command fails with:

```
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or directory. Is the server running locally and accepting connections on that socket?
```

This is because the actual path to the socket is:

```
~/pgtmp/pg_regress-S34sXM/.s.PGSQL.5432
```

While debugging this, I also discovered that psql uses the /usr/lib/libpq.5.dylib library, according to the `image list` command in LLDB. The library is provided with the system and can't be moved or deleted. In other words, psql seems to ignore DYLD_LIBRARY_PATH. I've found instructions [1] suggesting that this is caused by macOS System Integrity Protection and describing how to disable it. Sadly, following them made no difference in my case; psql still ignores DYLD_LIBRARY_PATH.

While I'm still in the process of investigating this, I wanted to ask whether anyone developing on macOS has observed anything similar and had any luck solving the problem. I tried searching the mailing list but didn't find anything relevant. The complete script that reproduces the issue is attached. I'm using the same script on an Ubuntu VM, where it works just fine.


--
Best regards,
Aleksander Alekseev

Re: `make check` doesn't pass on MacOS Catalina

From: Tom Lane

Aleksander Alekseev <aleksander@timescale.com> writes:
> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> MacBook I discovered that `make check` fails:

This is the usual symptom of not having disabled SIP :-(.

If you don't want to do that, do "make install" before "make check".

            regards, tom lane



Re: `make check` doesn't pass on MacOS Catalina

From: Aleksander Alekseev

Hi Tom,

Many thanks; running "make install" before "make check" helped.


-- 
Best regards,
Aleksander Alekseev



Re: `make check` doesn't pass on MacOS Catalina

From: Andrew Dunstan

On 4/20/21 11:02 AM, Tom Lane wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
>> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
>> MacBook I discovered that `make check` fails:
> This is the usual symptom of not having disabled SIP :-(.
>
> If you don't want to do that, do "make install" before "make check".
>
>             




FYI, the buildfarm client has a '--delay-check' option that does exactly
this. It's useful on Alpine Linux as well as macOS.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: `make check` doesn't pass on MacOS Catalina

From: Tom Lane

Xing GUO <higuoxing@gmail.com> writes:
> Thank you very much. I faced the same problem yesterday. May I
> suggest documenting it in the installation guide for the macOS platform?

It is documented: see the last paragraph under

https://www.postgresql.org/docs/current/installation-platform-notes.html#INSTALLATION-NOTES-MACOS

            regards, tom lane



Re: `make check` doesn't pass on MacOS Catalina

From: Gurjeet Singh

On Tue, Apr 20, 2021 at 9:06 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 4/20/21 11:02 AM, Tom Lane wrote:
> > Aleksander Alekseev <aleksander@timescale.com> writes:
> >> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
> >> MacBook I discovered that `make check` fails:
> > This is the usual symptom of not having disabled SIP :-(.
> >
> > If you don't want to do that, do "make install" before "make check".

> FYI the buildfarm client has a '--delay-check' option that does exactly
> this. It's useful on Alpine Linux as well as MacOS

I was trying to set up a buildfarm animal, and this exact problem led
to a few hours of debugging and hair-pulling. Can the default
behaviour be changed in the buildfarm client to perform `make check` only
after `make install`?

Current buildfarm client code looks something like:

    make();
    make_check() unless $delay_check;
    ... other steps ...
    make_install();
    ... other steps-2...
    make_check() if $delay_check;

There are no comments explaining why one should choose to use --delay-check
($delay_check). This email and the pointer Tom shared to the paragraph
buried in the docs are the only two ways one can understand what is
causing this failure and how to get around it.

Naive question: what's stopping us from rewriting the code as follows?

    make();
    make_install();
    make_check();
    ... other steps ...
    ... other steps-2...
    # or move make_check() call here

With a quick Google search I could not find why --delay-check is
necessary on Alpine Linux as well; can you please elaborate?

Best regards,
Gurjeet
http://Gurje.et



Re: `make check` doesn't pass on MacOS Catalina

From: Andrew Dunstan

On 2022-08-06 Sa 06:49, Gurjeet Singh wrote:
> On Tue, Apr 20, 2021 at 9:06 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 4/20/21 11:02 AM, Tom Lane wrote:
>>> Aleksander Alekseev <aleksander@timescale.com> writes:
>>>> While trying to build PostgreSQL from source (master branch, 95c3a195) on a
>>>> MacBook I discovered that `make check` fails:
>>> This is the usual symptom of not having disabled SIP :-(.
>>>
>>> If you don't want to do that, do "make install" before "make check".
>> FYI the buildfarm client has a '--delay-check' option that does exactly
>> this. It's useful on Alpine Linux as well as MacOS
> I was trying to set up a buildfarm animal, and this exact problem led
> to a few hours of debugging and hair-pulling. Can the default
> behaviour be changed in the buildfarm client to perform `make check` only
> after `make install`?
>
> Current buildfarm client code looks something like:
>
>     make();
>     make_check() unless $delay_check;
>     ... other steps ...
>     make_install();
>     ... other steps-2...
>     make_check() if $delay_check;
>
> There are no comments as to why one should choose to use --delay-check
> ($delay_check). This email, and the pointer to the paragraph buried in
> the docs, shared by Tom, are the only two ways one can understand what
> is causing this failure, and how to get around it.
>
> Naive question: What's stopping us from rewriting the code as follows.
>     make();
>     make_install();
>     make_check();
>     ... other steps ...
>     ... other steps-2...
>     # or move make_check() call here
>
> With a quick Google search I could not find why --delay-check is
> necessary on Alpine Linux as well; can you please elaborate?
>

I came across this when I was working on setting up some Dockerfiles for
the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
least out of the box, as it uses a different linker, and "make check"
relies on it (or the moral equivalent) if "make install" hasn't been run.

In general we want to run "make check" as soon as possible after running
"make" on the core code. That's why I didn't simply delay it
unconditionally.
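
To make the trade-off concrete, here is a hypothetical shell sketch of where `make check` lands with and without `--delay-check`. The real buildfarm client is a Perl program with many more steps; the function `step_order` and the step names are illustrative only.

```shell
#!/bin/sh
# Hypothetical sketch of the step ordering discussed in this thread.
# The real buildfarm client is written in Perl and runs many more steps;
# only the relative position of make_check is modelled here.
step_order() {
    delay_check=$1  # 0 = default behaviour, 1 = as if --delay-check were given
    steps="make"
    if [ "$delay_check" -eq 0 ]; then
        # Default: run the tests as soon as possible after "make",
        # against the uninstalled tree.
        steps="$steps make_check"
    fi
    steps="$steps make_install"
    if [ "$delay_check" -eq 1 ]; then
        # Delayed: run the tests only once the libraries are installed.
        steps="$steps make_check"
    fi
    echo "$steps"
}

echo "default:     $(step_order 0)"
echo "delay-check: $(step_order 1)"
```

With the default ordering, breakage of "check without install" shows up right after `make`; `--delay-check` gives up that early signal in exchange for working on platforms where the temporary install cannot find its own libraries.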


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: `make check` doesn't pass on MacOS Catalina

From: Tom Lane

Andrew Dunstan <andrew@dunslane.net> writes:
> On 2022-08-06 Sa 06:49, Gurjeet Singh wrote:
>> There are no comments as to why one should choose to use --delay-check
>> ($delay_check). This email, and the pointer to the paragraph buried in
>> the docs, shared by Tom, are the only two ways one can understand what
>> is causing this failure, and how to get around it.

> In general we want to run "make check" as soon as possible after running
> "make" on the core code. That's why I didn't simply delay it
> unconditionally.

In general --- that is, on non-broken platforms --- "make check"
*should* work without a prior "make install".  I am absolutely
not in favor of changing the buildfarm so that it fails to detect
the problem if we break that.  But for sure it'd make sense to add
some comments to the wiki and/or sample config file explaining
that you need to set this option on systems X,Y,Z.

On macOS you need to use it if you haven't disabled SIP.
I don't have the details about any other problem platforms.

            regards, tom lane



Re: `make check` doesn't pass on MacOS Catalina

From: Tom Lane

Andrew Dunstan <andrew@dunslane.net> writes:
> I came across this when I was working on setting up some Dockerfiles for
> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
> least out of the box, as it uses a different linker, and "make check"
> relies on it (or the moral equivalent) if "make install" hasn't been run.

I did some quick googling on this point.  We seem not to be the only
project having linking issues on Alpine, and yet it does support
LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg

https://www.musl-libc.org/doc/1.0.0/manual.html

I suspect the situation is similar to macOS, ie there is some limitation
somewhere on whether LD_LIBRARY_PATH gets passed through.  If memory
serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
cleared upon invoking bash, so that we lose it anywhere that "make"
invokes a shell to run a subprogram.  (Hmm ... I wonder whether ninja
uses the shell ...)  I don't personally care at all about Alpine, but
maybe somebody who does could dig a little harder and characterize
the problem there better.
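
One way to probe this on a given machine is to check whether a variable survives being handed to /bin/sh. The sketch below assumes that SIP purges DYLD_* variables when a protected binary such as /bin/sh is launched; `survives_shell` and the control variable `PG_PROBE_VAR` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Report whether an environment variable survives a hop through /bin/sh.
# Assumption: on SIP-enabled macOS, DYLD_* variables are purged when a
# protected binary such as /bin/sh is launched; elsewhere they pass through.
survives_shell() {
    var=$1
    # Set the variable only for the child, then have the child shell print
    # whatever value (possibly empty) it actually received.
    got=$(env "$var=probe-value" /bin/sh -c "printf '%s' \"\${$var-}\"")
    if [ "$got" = "probe-value" ]; then
        echo yes
    else
        echo no
    fi
}

echo "PG_PROBE_VAR survives:      $(survives_shell PG_PROBE_VAR)"
echo "DYLD_LIBRARY_PATH survives: $(survives_shell DYLD_LIBRARY_PATH)"
```

On a SIP-enabled Mac the DYLD_LIBRARY_PATH line would be expected to report `no` while the control variable reports `yes`; on Linux both should report `yes`.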

            regards, tom lane



Re: `make check` doesn't pass on MacOS Catalina

From: Andres Freund

Hi,

On 2022-08-06 11:25:09 -0400, Tom Lane wrote:
> (Hmm ... I wonder whether ninja uses the shell ...)

It does, but even if it didn't, we'd use a shell somewhere below perl or
pg_regress :(.

The meson build should still work without disabling SIP: I did the necessary
hackery to set up the rpath equivalent relatively, so both the real install
target and tmp_install/ should find libraries within themselves.

Greetings,

Andres Freund



Re: `make check` doesn't pass on MacOS Catalina

From: Andrew Dunstan

On 2022-08-06 Sa 11:25, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I came across this when I was working on setting up some Dockerfiles for
>> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
>> least out of the box, as it uses a different linker, and "make check"
>> relies on it (or the moral equivalent) if "make install" hasn't been run.
> I did some quick googling on this point.  We seem not to be the only
> project having linking issues on Alpine, and yet it does support
> LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg
>
> https://www.musl-libc.org/doc/1.0.0/manual.html
>
> I suspect the situation is similar to macOS, ie there is some limitation
> somewhere on whether LD_LIBRARY_PATH gets passed through.  If memory
> serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
> cleared upon invoking bash, so that we lose it anywhere that "make"
> invokes a shell to run a subprogram.  (Hmm ... I wonder whether ninja
> uses the shell ...)  I don't personally care at all about Alpine, but
> maybe somebody who does could dig a little harder and characterize
> the problem there better.
>
>             


We probably should care about Alpine, because it's a good distro to use
as the basis for Docker images, being fairly secure, very small, and
booting very fast.

I'll dig some more, and possibly set up a (docker based) buildfarm instance.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: `make check` doesn't pass on MacOS Catalina

From: Andrew Dunstan

On 2022-08-06 Sa 12:10, Andrew Dunstan wrote:
> On 2022-08-06 Sa 11:25, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> I came across this when I was working on setting up some Dockerfiles for
>>> the buildfarm. Apparently LD_LIBRARY_PATH doesn't work on Alpine, at
>>> least out of the box, as it uses a different linker, and "make check"
>>> relies on it (or the moral equivalent) if "make install" hasn't been run.
>> I did some quick googling on this point.  We seem not to be the only
>> project having linking issues on Alpine, and yet it does support
>> LD_LIBRARY_PATH according to some fairly authoritative-looking pages, eg
>>
>> https://www.musl-libc.org/doc/1.0.0/manual.html
>>
>> I suspect the situation is similar to macOS, ie there is some limitation
>> somewhere on whether LD_LIBRARY_PATH gets passed through.  If memory
>> serves, the problem on SIP-enabled Mac is that DYLD_LIBRARY_PATH is
>> cleared upon invoking bash, so that we lose it anywhere that "make"
>> invokes a shell to run a subprogram.  (Hmm ... I wonder whether ninja
>> uses the shell ...)  I don't personally care at all about Alpine, but
>> maybe somebody who does could dig a little harder and characterize
>> the problem there better.
>>
>>             
>
> We probably should care about Alpine, because it's a good distro to use
> as the basis for Docker images, being fairly secure, very small, and
> booting very fast.
>
> I'll dig some more, and possibly set up a (docker based) buildfarm instance.
>
>

It appears that LD_LIBRARY_PATH is supported on Alpine but it fails if
chained, which seems somewhat braindead. The regression tests get errors
like this:


```
+ERROR:  could not load library "/app/buildroot/HEAD/pgsql.build/tmp_install/app/buildroot/HEAD/inst/lib/postgresql/libpqwalreceiver.so": Error loading shared library libpq.so.5: No such file or directory (needed by /app/buildroot/HEAD/pgsql.build/tmp_install/app/buildroot/HEAD/inst/lib/postgresql/libpqwalreceiver.so)
```


If the check stage is delayed until after the install stage the tests pass.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com