Re: Race to build pg_isolation_regress in "make -j check-world" - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Race to build pg_isolation_regress in "make -j check-world"
Msg-id 20171217040605.GA2421254@rfd.leadboat.com
In response to [HACKERS] Race to build pg_isolation_regress in "make -j check-world"  (Noah Misch <noah@leadboat.com>)
Responses Re: Race to build pg_isolation_regress in "make -j check-world"  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Mon, Nov 06, 2017 at 12:07:52AM -0800, Noah Misch wrote:
> I've been enjoying the speed of parallel check-world, but I get spurious
> failures from makefile race conditions.  Commit c66b438 fixed the simple ones.
> More tricky is this problem of multiple "make" processes entering
> src/test/regress concurrently, which causes failures like these:
> 
>   gcc: error: pg_regress.o: No such file or directory
>   make[4]: *** [pg_isolation_regress] Error 1
> 
>   /bin/sh: ../../../src/test/isolation/pg_isolation_regress: Permission denied
>   make -C test_extensions check
>   make[2]: *** [check] Error 126
>   make[2]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/isolation'
> 
>   /bin/sh: ../../../../src/test/isolation/pg_isolation_regress: Text file busy
>   make[3]: *** [isolationcheck] Error 126
>   make[3]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/modules/snapshot_too_old'
> 
> This is reproducible since commit 2038bf4 or earlier; "make -j check-world"
> had worse problems before that era.  A workaround is to issue "make -j; make
> -j -C src/test/isolation" before the check-world.

Commit de0aca6 fixed that problem, but I now see similar trouble from multiple
"make" processes running "make -C contrib/test_decoding install" concurrently.
This is a risk for any directory named in an EXTRA_INSTALL variable of more
than one makefile.  Under the right circumstances, this would affect
contrib/hstore and others in addition to contrib/test_decoding.  That brings
me back to the locking idea:

> The problem of multiple "make" processes in a directory (especially src/port)
> shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
> installcheck-world" will do it.  For more-prominent use cases, src/Makefile
> prevents this with ".NOTPARALLEL:" and building first the directories that are
> frequent submake targets.  Perhaps we could fix the general problem with
> directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
> FOO's lock is available.  That could be tricky to make robust.
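The quoted idea, "sleep until FOO's lock is available" before running "$(MAKE) -C FOO", can be sketched as a small shell wrapper. This is a minimal sketch, not code from the thread: the function name `with_dir_lock` and the lock name `.make-lock` are hypothetical, and it uses the naive mkdir/rmdir primitive, so it inherits that primitive's weakness when a lock holder crashes.

```shell
#!/bin/sh
# Hypothetical sketch: serialize commands that enter the same directory.
# mkdir either creates the directory or fails, atomically, so at most one
# caller holds "DIR/.make-lock" at a time.  DIR must already exist.
# Assumption: a holder never crashes.  If one is killed before rmdir runs,
# the stale lock directory makes every later caller sleep forever.

# Usage: with_dir_lock DIR COMMAND [ARG...]
with_dir_lock() {
    wdl_lock=$1/.make-lock
    shift
    # Poll until this caller is the one that creates the lock directory.
    until mkdir "$wdl_lock" 2>/dev/null; do
        sleep 1
    done
    # Run the command in a subshell so it cannot clobber wdl_lock.
    ("$@")
    wdl_status=$?
    rmdir "$wdl_lock"
    return "$wdl_status"
}
```

A recipe would then run, for example, `with_dir_lock contrib/test_decoding make -C contrib/test_decoding install` (or `$(MAKE)` inside a makefile), so concurrent installs of the same EXTRA_INSTALL directory take turns.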

If one is willing to assume that a lock-holding process never crashes, locking
in a shell script is simple: mkdir to lock, rmdir to unlock.  I don't want to
assume that.  The bakery algorithm provides convenient opportunities for
checking whether the last locker crashed; I have attached a shell script
demonstrating this approach.  Better ideas?  Otherwise, I'll look into
integrating this design into the makefiles.
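The attachment is not reproduced here, so the following is only an independent sketch of Lamport's bakery algorithm over files, assuming that each participant is identified by its PID. One reading of the remark above is that every waiter knows exactly which PIDs it is waiting behind, so it can probe each one with `kill -0` before sleeping and discard the files of any that died. The function names and the `choosing.PID` / `number.PID` file layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: Lamport's bakery algorithm over files, with a
# liveness check on each competitor.  Not the script attached to the post.
# Assumptions: each participant is a separate process whose PID is passed
# as ME (in a make recipe "$$" works; inside "( ... ) &" it does not,
# because a subshell's $$ is its parent's PID); all participants run as the
# same user, so "kill -0" can probe them; PIDs are not reused while a stale
# ticket still names them.
# Files in DIR:  choosing.PID  exists while PID is picking a ticket
#                number.PID    PID's ticket; absent means "not competing"

# bakery_alive DIR PID: succeed if PID runs; else remove its files and fail.
bakery_alive() {
    if kill -0 "$2" 2>/dev/null; then
        return 0
    fi
    rm -f "$1/choosing.$2" "$1/number.$2"
    return 1
}

# bakery_lock DIR ME: block until ME holds the lock on DIR.
bakery_lock() {
    dir=$1 me=$2
    mkdir -p "$dir"
    # Doorway: announce we are choosing, then take one more than the maximum.
    : > "$dir/choosing.$me"
    max=0
    for f in "$dir"/number.*; do
        n=$(cat "$f" 2>/dev/null) || continue
        if [ -n "$n" ] && [ "$n" -gt "$max" ]; then max=$n; fi
    done
    mine=$((max + 1))
    # Publish the ticket via rename so readers never see a partial file.
    echo "$mine" > "$dir/tmp.$me"
    mv "$dir/tmp.$me" "$dir/number.$me"
    rm -f "$dir/choosing.$me"
    # Wait until no live participant is in the middle of choosing.
    for f in "$dir"/choosing.*; do
        other=${f##*/choosing.}
        while [ -e "$f" ] && bakery_alive "$dir" "$other"; do
            sleep 1
        done
    done
    # Wait for every live participant holding a smaller (ticket, PID) pair.
    for f in "$dir"/number.*; do
        other=${f##*/number.}
        [ "$other" != "$me" ] || continue
        while n=$(cat "$f" 2>/dev/null) && [ -n "$n" ] &&
              { [ "$n" -lt "$mine" ] ||
                { [ "$n" -eq "$mine" ] && [ "$other" -lt "$me" ]; }; } &&
              bakery_alive "$dir" "$other"; do
            sleep 1
        done
    done
}

# bakery_unlock DIR ME: give up the ticket.
bakery_unlock() {
    rm -f "$1/number.$2"
}
```

Because every makefile recipe line runs in its own shell, a recipe could pass `$$$$` (which make turns into the shell's `$$`) as ME. Unlike the mkdir version, a participant killed while holding or waiting for the lock leaves files that the next waiter removes once `kill -0` reports the PID gone.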

Thanks,
nm
