Thread: [HACKERS] Race to build pg_isolation_regress in "make -j check-world"

[HACKERS] Race to build pg_isolation_regress in "make -j check-world"

From
Noah Misch
Date:
I've been enjoying the speed of parallel check-world, but I get spurious
failures from makefile race conditions.  Commit c66b438 fixed the simple ones.
More tricky is this problem of multiple "make" processes entering
src/test/regress concurrently, which causes failures like these:

  gcc: error: pg_regress.o: No such file or directory
  make[4]: *** [pg_isolation_regress] Error 1

  /bin/sh: ../../../src/test/isolation/pg_isolation_regress: Permission denied
  make -C test_extensions check
  make[2]: *** [check] Error 126
  make[2]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/isolation'

  /bin/sh: ../../../../src/test/isolation/pg_isolation_regress: Text file busy
  make[3]: *** [isolationcheck] Error 126
  make[3]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/modules/snapshot_too_old'

This is reproducible since commit 2038bf4 or earlier; "make -j check-world"
had worse problems before that era.  A workaround is to issue "make -j; make
-j -C src/test/isolation" before the check-world.  This problem doesn't affect
src/test/regress/pg_regress.  Every top-level "make" or "make install",
including temp-install, builds pg_regress.

I tried fixing this by building src/test/isolation at the same times we run
install-temp.  Naturally, that didn't help installcheck-world.  It also caused
multiple "make" processes to enter src/port concurrently.  I could fix both
check-world and installcheck-world with the attached hack of building
src/test/isolation during every top-level build or install.

The problem of multiple "make" processes in a directory (especially src/port)
shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
installcheck-world" will do it.  For more-prominent use cases, src/Makefile
prevents this with ".NOTPARALLEL:" and building first the directories that are
frequent submake targets.  Perhaps we could fix the general problem with
directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
FOO's lock is available.  That could be tricky to make robust.

For now, I propose back-patching the attached, sad hack.  Better ideas?

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: Race to build pg_isolation_regress in "make -j check-world"

From
Noah Misch
Date:
On Mon, Nov 06, 2017 at 12:07:52AM -0800, Noah Misch wrote:
> I've been enjoying the speed of parallel check-world, but I get spurious
> failures from makefile race conditions.  Commit c66b438 fixed the simple ones.
> More tricky is this problem of multiple "make" processes entering
> src/test/regress concurrently, which causes failures like these:
> 
>   gcc: error: pg_regress.o: No such file or directory
>   make[4]: *** [pg_isolation_regress] Error 1
> 
>   /bin/sh: ../../../src/test/isolation/pg_isolation_regress: Permission denied
>   make -C test_extensions check
>   make[2]: *** [check] Error 126
>   make[2]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/isolation'
> 
>   /bin/sh: ../../../../src/test/isolation/pg_isolation_regress: Text file busy
>   make[3]: *** [isolationcheck] Error 126
>   make[3]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/modules/snapshot_too_old'
> 
> This is reproducible since commit 2038bf4 or earlier; "make -j check-world"
> had worse problems before that era.  A workaround is to issue "make -j; make
> -j -C src/test/isolation" before the check-world.

Commit de0aca6 fixed that problem, but I now see similar trouble from multiple
"make" processes running "make -C contrib/test_decoding install" concurrently.
This is a risk for any directory named in an EXTRA_INSTALL variable of more
than one makefile.  Under the right circumstances, this would affect
contrib/hstore and others in addition to contrib/test_decoding.  That brings
me back to the locking idea:

> The problem of multiple "make" processes in a directory (especially src/port)
> shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
> installcheck-world" will do it.  For more-prominent use cases, src/Makefile
> prevents this with ".NOTPARALLEL:" and building first the directories that are
> frequent submake targets.  Perhaps we could fix the general problem with
> directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
> FOO's lock is available.  That could be tricky to make robust.

If one is willing to assume that a lock-holding process never crashes, locking
in a shell script is simple: mkdir to lock, rmdir to unlock.  I don't want to
assume that.  The bakery algorithm provides convenient opportunities for
checking whether the last locker crashed; I have attached a shell script
demonstrating this approach.  Better ideas?  Otherwise, I'll look into
integrating this design into the makefiles.

Thanks,
nm

Attachment

Re: Race to build pg_isolation_regress in "make -j check-world"

From
Noah Misch
Date:
On Sat, Dec 16, 2017 at 08:06:05PM -0800, Noah Misch wrote:
> On Mon, Nov 06, 2017 at 12:07:52AM -0800, Noah Misch wrote:

> I now see similar trouble from multiple
> "make" processes running "make -C contrib/test_decoding install" concurrently.
> This is a risk for any directory named in an EXTRA_INSTALL variable of more
> than one makefile.  Under the right circumstances, this would affect
> contrib/hstore and others in addition to contrib/test_decoding.  That brings
> me back to the locking idea:
> 
> > The problem of multiple "make" processes in a directory (especially src/port)
> > shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
> > installcheck-world" will do it.  For more-prominent use cases, src/Makefile
> > prevents this with ".NOTPARALLEL:" and building first the directories that are
> > frequent submake targets.  Perhaps we could fix the general problem with
> > directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
> > FOO's lock is available.  That could be tricky to make robust.
> 
> If one is willing to assume that a lock-holding process never crashes, locking
> in a shell script is simple: mkdir to lock, rmdir to unlock.  I don't want to
> assume that.  The bakery algorithm provides convenient opportunities for
> checking whether the last locker crashed; I have attached a shell script
> demonstrating this approach.  Better ideas?  Otherwise, I'll look into
> integrating this design into the makefiles.

Performance has been the principal snare.  I value "make -j" being fast when
there's little to rebuild, but that shell script approach slowed an empty
build by 340% (GNU/Linux w/ SSD) to 2300% (Cygwin).  In a build having nothing
to do, merely adding a no-op wrapper around "make -C" (does nothing but
execvp() the real GNU make) slowed the build by over 10%[1].  To get what I
considered decent performance took several design changes:

1. Replace the shell script implementation of the bakery algorithm with a C
   program that issues fcntl(F_SETLK) on the target directory's makefile.

2. Stop re-entering widely-referenced directories like src/common and src/port
   dozens of times per build.  Instead, enter them before any "make -C", then
   assume they're built if $(MAKELEVEL) is nonzero.  This reduced the total
   number of "make -C" calls and lock acquisitions from 276 to 147.

3. Lock only directories known to be entered more than once per top-level
   target.  Preliminarily, this reduced the 147 lock acquisitions to 25.

I regret that (3) entails ongoing maintenance of a whitelist of such
directories; adding a "make -C" call can expand the list.  However, I can
automate verification of that whitelist.  If I abandon (3), an empty build in
an NFS-mounted source directory slows from 6.4s to 22s; with all three design
changes intact, it slows to 6.6s.  I think that justifies keeping (3).

I considered abandoning (1).  An empty build would then slow from 6.6s to 7.7s
on NFS-mounted sources, and it would slow from 2.1s to 6.8s on Cygwin.  (I
expect similar on MSYS.)  That is perhaps acceptable.  It would save a few
lines of code and perhaps avoid portability snags.  Unlike abandoning (3), I
would not expect long-term maintenance savings.  For systems with low PID
entropy[2], (1) also eliminates unreliable code for detecting that a lock
holder has died.  I plan to keep all three design changes.


The non-performance problems that arose were easy to fix:

a. In addition to avoiding overlapping "make install" invocations in each
   directory, we must ensure no directory is installed more than once per
   tmp_install.  Otherwise, the second "make install" could unlink or
   overwrite a file while another process runs a test that uses the file.  I
   fixed this with stamp files like we use elsewhere in Makefile.global.

b. Our "make -C" call graph has cycles[3].  To avoid deadlocks, I accumulated
   lock-holding PIDs in an environment variable.  If the holder of a desired
   lock appears in that variable, then that holder is sleeping until after the
   present process completes.  We can then proceed without a lock acquisition.

Comments?  Otherwise, I'll look at finishing the patch series.

Thanks,
nm


[1] I was surprised to see that GNU make is this efficient at determining
    there's nothing to rebuild.

[2] For example, Cygwin reuses PIDs as often as every 645 processes.  Also,
    Cygwin processes have both a Cygwin PID and a Windows PID, and kill() can
    match either PID.  The shell script relied on kill(PID, 0) to deduce that
    a lock holder had died.  We may have ended up needing a secondary factor,
    like process start time, on such systems.

[3] For example, src/backend depends on libpgport_srv.a, and src/port depends
    on submake-errcodes of src/backend.


Re: Race to build pg_isolation_regress in "make -j check-world"

From
Noah Misch
Date:
On Mon, Jan 01, 2018 at 05:39:56PM -0800, Noah Misch wrote:
> On Sat, Dec 16, 2017 at 08:06:05PM -0800, Noah Misch wrote:
> > On Mon, Nov 06, 2017 at 12:07:52AM -0800, Noah Misch wrote:
> 
> > I now see similar trouble from multiple
> > "make" processes running "make -C contrib/test_decoding install" concurrently.
> > This is a risk for any directory named in an EXTRA_INSTALL variable of more
> > than one makefile.  Under the right circumstances, this would affect
> > contrib/hstore and others in addition to contrib/test_decoding.  That brings
> > me back to the locking idea:
> > 
> > > The problem of multiple "make" processes in a directory (especially src/port)
> > > shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
> > > installcheck-world" will do it.  For more-prominent use cases, src/Makefile
> > > prevents this with ".NOTPARALLEL:" and building first the directories that are
> > > frequent submake targets.  Perhaps we could fix the general problem with
> > > directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
> > > FOO's lock is available.  That could be tricky to make robust.

> Performance has been the principal snare.  I value "make -j" being fast when
> there's little to rebuild, but that shell script approach slowed an empty
> build by 340% (GNU/Linux w/ SSD) to 2300% (Cygwin).  In a build having nothing
> to do, merely adding a no-op wrapper around "make -C" (does nothing but
> execvp() the real GNU make) slowed the build by over 10%[1].  To get what I
> considered decent performance took several design changes:

> 3. Lock only directories known to be entered more than once per top-level
>    target.  Preliminarily, this reduced the 147 lock acquisitions to 25.
> 
> I regret that (3) entails ongoing maintenance of a whitelist of such
> directories; adding a "make -C" call can expand the list.  However, I can
> automate verification of that whitelist.

As I developed this more, I found it dissatisfying.  To verify the whitelist,
one empties the whitelist, rebuilds each makefile target, and reassembles the
whitelist from the list of directories visited more than once within a target.
That's too expensive to run all the time (e.g. with every "make check").  I
could make a buildfarm animal turn red when the whitelist is out of date, but
I wouldn't enjoy discovering or fixing whitelist drift that way.  Hence, I'm
back to preferring fixes targeting the known problems:

- Fix conflicting writes in src/test/regress:
  https://postgr.es/m/flat/20181224034411.GA3224776%40rfd.leadboat.com

- Fix conflicting EXTRA_INSTALL, detailed above.  In the MAKELEVEL-0 part of
  the temp-install Makefile target, serially process all applicable
  EXTRA_INSTALL.  See attached patch.  On my usual development system, this
  adds <2s to "make check-world" and <0.1s to "make check" or "make -C
  contrib/earthdistance check".

To corroborate the sufficiency of those two patches, I ran 110 iterations of
"make -j40" and "make -j40 check-world" without encountering a failure
attributable to parallel make.  I also ran those targets at excess parallelism
(-j120) and audited the list of commands appearing more than once:

  make clean && make -j120 2>&1 | sort | uniq -c | sort -rn
  make -j120 check-world 2>&1   | sort | uniq -c | sort -rn

Unlike the locking method, this doesn't fix clean-tree "make -j -C src/bin" or
clean-tree "make -j installcheck-world".

Thanks,
nm

Attachment