Re: BUG #15525: Build failures when compiling Postgres with Make parallelization - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #15525: Build failures when compiling Postgres with Make parallelization |
Date | |
Msg-id | 28417.1543458718@sss.pgh.pa.us |
In response to | Re: BUG #15525: Build failures when compiling Postgres with Make parallelization (Jack Kelly <jack@jackkelly.name>) |
Responses | Re: BUG #15525: Build failures when compiling Postgres with Make parallelization |
List | pgsql-bugs |
Jack Kelly <jack@jackkelly.name> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> BTW, are you using the Apple-supplied make, or some other version?
>> In the past we've had to fight with parallelism bugs in old gmake
>> versions ...

> Whatever nixpkgs asks for, which is probably gnumake, and it looks like
> nixpkgs has gnumake 4.2.1.

Hmm. Maybe the critical combination is macOS plus a non-Apple version of gmake? Doesn't make a lot of sense ...

> A failing build log is attached to the nixpkgs GH issue at
> https://github.com/NixOS/nixpkgs/files/2617687/build.log
> I note that it calls `ar rcs libpgtypes.a ...` multiple times during
> the build, and I speculate that these `ar` invocations start racing
> each other.

After staring at that for a while, I don't think that this is a bug in the PG makefiles. It looks like maybe it could be a clock skew problem. You can see that the pgtypeslib build is completing, and then ecpglib does submake-pgtypeslib which should find nothing to do, and indeed mostly it thinks it has nothing to do --- except it wants to rebuild libpgtypes.a. Which makes no sense, because that has the exact same dependencies as libpgtypes.so, which is not getting rebuilt. And concurrently, exactly the same thing is happening with libpq.a, but not libpq.so.

... and after contemplating my navel for a while more, I believe I understand the problem. APFS has sub-second file timestamp resolution, which doesn't seem to be exposed in Apple's version of "ls", but you can find it out from stat(2). And what I'm seeing is that "ranlib" is truncating the timestamp of its output file to a one-second boundary:

$ ls -ltr
...
-rw-r--r--  1 tgl  admin   25888 Nov  2 11:37 interval.c
-rw-r--r--  1 tgl  admin   20692 Nov  2 11:37 timestamp.c
-rw-r--r--  1 tgl  admin  210640 Nov 28 21:04 libpgtypes.a
-rw-r--r--  1 tgl  admin   43232 Nov 28 21:04 numeric.o
-rw-r--r--  1 tgl  admin   18688 Nov 28 21:04 datetime.o
-rw-r--r--  1 tgl  admin    5916 Nov 28 21:04 common.o
-rw-r--r--  1 tgl  admin   76584 Nov 28 21:04 dt_common.o
...
$ ~/a.out numeric.o
mtime - Actual: 1543457047.155932
atime - Actual: 1543457047.651358
$ ~/a.out libpgtypes.a
mtime - Actual: 1543457047.000000
atime - Actual: 1543457047.000000

(a.out is a stupid little program I made to print out the extended timespec fields from stat(2).)

This is observable fact. Also observable fact is that Apple's gmake does not think libpgtypes.a needs to be rebuilt in this situation, which implies that it does its work with seconds-truncated file timestamps. Where I'm speculating a bit is to guess that nix's version of gmake thinks "whee, this filesystem has nanosecond timestamps, so I'll believe them". But given these facts, it's not much of a leap to conclude that nix's gmake is rebuilding the .a files based on them apparently being older than their inputs.

Recommendations:

1. File a bug with Apple to tell them it's not nice that ranlib produces a file that appears older than its input files.

2. Pending some action from Apple, nix's build of gmake should not trust sub-second timestamps on Darwin.

I suppose that this could be worked around with something like

ifndef haslibarule
$(stlib): $(OBJS) | $(SHLIB_PREREQS)
	rm -f $@
	$(LINK.static) $@ $^
	$(RANLIB) $@
+	touch $@
endif #haslibarule

but ick. Who's to say that ranlib is the only tool with such a problem?

regards, tom lane