Re: BUG #15525: Build failures when compiling Postgres with Make parallelization - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15525: Build failures when compiling Postgres with Make parallelization
Msg-id 28417.1543458718@sss.pgh.pa.us
In response to Re: BUG #15525: Build failures when compiling Postgres with Make parallelization  (Jack Kelly <jack@jackkelly.name>)
Responses Re: BUG #15525: Build failures when compiling Postgres with Make parallelization  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-bugs
Jack Kelly <jack@jackkelly.name> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> BTW, are you using the Apple-supplied make, or some other version?
>> In the past we've had to fight with parallelism bugs in old gmake
>> versions ...

> Whatever nixpkgs asks for, which is probably gnumake, and it looks like
> nixpkgs has gnumake 4.2.1.

Hmm.  Maybe the critical combination is macOS plus a non-Apple
version of gmake?  Doesn't make a lot of sense ...

> A failing build log is attached to the nixpkgs GH issue at
> https://github.com/NixOS/nixpkgs/files/2617687/build.log. I note that it
> calls `ar rcs libpgtypes.a ...` multiple times during the build, and I
> speculate that these `ar` invocations start racing each other.

After staring at that for a while, I don't think that this is a bug in the
PG makefiles.  It looks like it could be a clock skew problem.
You can see that the pgtypeslib build completes, and then ecpglib
does submake-pgtypeslib, which should find nothing to do; indeed it
mostly thinks it has nothing to do, except that it wants to rebuild
libpgtypes.a.  That makes no sense, because libpgtypes.a has exactly the
same dependencies as libpgtypes.so, which is not getting rebuilt.  And
concurrently, exactly the same thing is happening with libpq.a, but
not libpq.so.

... and after contemplating my navel for a while longer, I believe
I understand the problem.  APFS has sub-second file timestamp resolution,
which doesn't seem to be exposed by Apple's version of "ls", but you can
see it via stat(2).  And what I'm seeing is that "ranlib" is
truncating the timestamp of its output file to a one-second boundary:

$ ls -ltr
...
-rw-r--r--  1 tgl  admin   25888 Nov  2 11:37 interval.c
-rw-r--r--  1 tgl  admin   20692 Nov  2 11:37 timestamp.c
-rw-r--r--  1 tgl  admin  210640 Nov 28 21:04 libpgtypes.a
-rw-r--r--  1 tgl  admin   43232 Nov 28 21:04 numeric.o
-rw-r--r--  1 tgl  admin   18688 Nov 28 21:04 datetime.o
-rw-r--r--  1 tgl  admin    5916 Nov 28 21:04 common.o
-rw-r--r--  1 tgl  admin   76584 Nov 28 21:04 dt_common.o
...

$ ~/a.out numeric.o
mtime - Actual: 1543457047.155932
atime - Actual: 1543457047.651358
$ ~/a.out libpgtypes.a
mtime - Actual: 1543457047.000000
atime - Actual: 1543457047.000000

(a.out is a stupid little program I made to print out the extended
timespec fields from stat(2).)

This is observable fact.  Also observable fact is that Apple's gmake
does not think libpgtypes.a needs to be rebuilt in this situation,
which implies that it compares seconds-truncated file timestamps.
Where I'm speculating a bit is in guessing that nix's
version of gmake thinks "whee, this filesystem has nanosecond
timestamps, so I'll believe them".  But given these facts, it's
not much of a leap to conclude that nix's gmake is rebuilding the .a
files because they appear to be older than their inputs: at full
resolution, libpgtypes.a's mtime (1543457047.000000) is earlier than
numeric.o's (1543457047.155932).

Recommendations:

1. File a bug with Apple to tell them it's not nice that ranlib
produces a file that appears older than its input files.

2. Pending some action from Apple, nix's build of gmake should not
trust sub-second timestamps on Darwin.

I suppose that this could be worked around with something like

 ifndef haslibarule
 $(stlib): $(OBJS) | $(SHLIB_PREREQS)
     rm -f $@
     $(LINK.static) $@ $^
     $(RANLIB) $@
+    touch $@
 endif #haslibarule
 
but ick.  Who's to say that ranlib is the only tool with such a problem?

            regards, tom lane

