pg_regress: lookup shellprog in $PATH - Mailing list pgsql-hackers

From Gurjeet Singh
Subject pg_regress: lookup shellprog in $PATH
Date
Msg-id CABwTF4VBTKLORbKMOT8qi7W76vQ5hsV73Q+raPpXFveoiWUwBQ@mail.gmail.com
Whole thread Raw
Responses Re: pg_regress: lookup shellprog in $PATH
List pgsql-hackers
I'm trying to build Postgres using the Nix language and the Nix package manager on macOS (see [1]). After some work I was able to build, and even run Postgres. But `make check` failed with the error

pg_regress: could not exec "sh": No such file or directory

The reason is that pg_regress uses execl() function to execute the shell recorded in its shellprog variable, whose value is provided using the SHELL variable via the makefile. Specifically, this error is because execl() function expects the full path of the executable as the first parameter, and it does _not_ perform a lookup in $PATH.

Using execl() in pg_regress has worked in the past, because the default value of SHELL used by GNUmake (and I presume other make implementations) is /bin/sh, and /bin/sh expected to be present on any Unix-like system.

But in Nixpkgs (the default and the central package repository of Nix/NixOS distro), they have chosen to patch GNU make, and turn the default value of SHELL from '/bin/sh' to just 'sh'; see their patch [2]. They did this because Nixpkgs consider any files outside the Nix Store (the path /nix/store/, by default) to be "impure". They want the packagers to use $PATH (consisting solely of paths that begin with /nix/store/...), to lookup their binaries and other files.

So when pg_regress tries to run a program (the postmaster, in this case), the execl() function complains that it could not find 'sh',  since there's no file ./sh in the directory where pg_regress is being run.

Please see attached the one-letter patch that fixes this problem. I have chosen to replace the execl() call with execlp(), which performs a lookup in $PATH, and finds the 'sh' to use for running the postmaster. This patch does _not_ cause 'make check' or any other failures when Postgres is built with non-Nix build tools available on macOS.

There is one other use of execl(), in pg_ctl.c, but that is safe from the behaviour introduced by Nixpkgs, since that call site uses the absolute path /bin/sh, and hence there's no ambiguity in where to look for the executable.

There are no previous uses of execlp() in Postgres, which made me rethink this patch. But I think it's safe to use execlp() since it's part of POSIX, and it's also supported by Windows (see [3]; they say the function name is "deprecated" but function is "supported" in the same paragraph!!).

There's one mention of execl in src/pl/plperl/ppport.h, and since it's a generated file, I believe now execlp also needs to be added to that list. But I'm not sure how to generate that file, or if it really needs to be generated and included in this patch; maybe the file is re-generated during a release process. Need advice on that.

GNU make's docs clearly explain (see [4]) the special handling of variable SHELL, and it seems impossible to pass this variable from an env variable down to the GNUmakefile of interest. The only alternative I see for us being able to pass a custom value via SHELL, is to detect and declare the SHELL variable in one of our higher-level files; and I don't think that'd be a good idea.

We could propose to Nixpkgs community that they stop patching make, and leave the default SHELL value alone. But I see very low likelihood of them accepting our arguments, or changing their ways.

It took many days of debugging, troubleshooting etc, to get to this root-cause. I first tried to coax autoconf, make, etc. to pass my custom SHELL through to pg_regress' makefile. Changing CONFIG_SHELL, or SHELL does not have any impact. Then I had to read the Nixpkgs code, and work out the archaic ways the packages are defined, and after much code wrangling I was able to find out that _they_ changed the default value of SHELL by patching the make sources.

The Nix language is not so bad, but the way it's used to write code in the Nix community leaves a lot to be desired; ad-hoc environment variable naming, polluting the built environment with all kinds of variables, almost non-existent comments, no standards on indentation, etc. These reasons made me decide to use the plain Nix language as much as possible, and not rely on Nixpkgs, whenever I can avoid it.

The Nixpkgs and NixOS distro includes all the supported versions of Postgres, so one would assume they would've also encountered, and solved, this problem. But they didn't. My best guess as to why, is, I believe they never bothered to run `make check` on their built binaries.

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Making Vars outer-join aware
Next
From: Nathan Bossart
Date:
Subject: Re: archive modules