Thread: Parallel make problem with git master
I am seeing the following compile problem with gmake -j2:

/bin/sh ../../../config/install-sh -c -d '/usr/local/pgsql/lib'
/bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql.control '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperl.control '/usr/local/pgsql/share/extension'
/bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperl--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperl--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperlu.control '/usr/local/pgsql/share/extension'
mkdir: /usr/local/pgsql/share/extension: File exists
/bin/sh ../../../config/install-sh -c -m 644 ./plperlu--1.0.sql '/usr/local/pgsql/share/extension'
mkdir: /usr/local/pgsql/share/extension: File exists
gmake[3]: *** [installdirs] Error 1
gmake[3]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plpgsql/src'
gmake[2]: *** [install] Error 2
gmake[2]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plpgsql'
gmake[1]: *** [install-plpgsql-recurse] Error 2
gmake[1]: *** Waiting for unfinished jobs....
/bin/sh ../../../config/install-sh -c -m 644 ./plperlu--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 755 plperl.so '/usr/local/pgsql/lib/plperl.so'
mkdir: /usr/local/pgsql/share/extension: File exists
mkdir: /usr/local/pgsql/share/extension: File exists
gmake[2]: *** [installdirs] Error 1
gmake[2]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plperl'
gmake[1]: *** [install-plperl-recurse] Error 2
gmake[1]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl'
gmake: *** [install-pl-recurse] Error 2

This only happens with parallel gmake, and I think it is caused by the assumption that "mkdir extension" will happen before any files are installed, which parallel gmake does not guarantee. I have fixed the bug with the attached patch (already applied), which makes 'installdirs' a prerequisite of the file install into the extension directory, rather than of a more top-level target, so that parallel gmake always creates the directory first.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index 71e2cef..155b60f 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -74,14 +74,14 @@ Util.c: Util.xs
 	$(PERL) $(perl_privlibexp)/ExtUtils/xsubpp -typemap $(perl_privlibexp)/ExtUtils/typemap $< >$@
 
-install: all installdirs install-lib install-data
+install: all install-lib install-data
 
 installdirs: installdirs-lib
 	$(MKDIR_P) '$(DESTDIR)$(datadir)/extension'
 
 uninstall: uninstall-lib uninstall-data
 
-install-data:
+install-data: installdirs
 	@for file in $(addprefix $(srcdir)/, $(DATA)); do \
 	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index d748ef6..52fbc1c 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -27,14 +27,14 @@ all: all-lib
 include $(top_srcdir)/src/Makefile.shlib
 
-install: all installdirs install-lib install-data
+install: all install-lib install-data
 
 installdirs: installdirs-lib
 	$(MKDIR_P) '$(DESTDIR)$(datadir)/extension'
 
 uninstall: uninstall-lib uninstall-data
 
-install-data:
+install-data: installdirs
 	@for file in $(addprefix $(srcdir)/, $(DATA)); do \
 	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index baf22f3..86d8741 100644
--- a/src/pl/plpython/Makefile
+++ b/src/pl/plpython/Makefile
@@ -106,14 +106,14 @@ all: all-lib
 distprep: spiexceptions.h
 
-install: all installdirs install-lib install-data
+install: all install-lib install-data
 
 installdirs: installdirs-lib
 	$(MKDIR_P) '$(DESTDIR)$(datadir)/extension'
 
 uninstall: uninstall-lib uninstall-data
 
-install-data:
+install-data: installdirs
 	@for file in $(addprefix $(srcdir)/, $(DATA)); do \
 	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index c7797c6..faffd09 100644
--- a/src/pl/tcl/Makefile
+++ b/src/pl/tcl/Makefile
@@ -54,7 +54,7 @@ all: all-lib
 	$(MAKE) -C modules $@
 
-install: all installdirs install-lib install-data
+install: all install-lib install-data
 	$(MAKE) -C modules $@
 
 installdirs: installdirs-lib
@@ -64,7 +64,7 @@ installdirs: installdirs-lib
 uninstall: uninstall-lib uninstall-data
 	$(MAKE) -C modules $@
 
-install-data:
+install-data: installdirs
 	@for file in $(addprefix $(srcdir)/, $(DATA)); do \
 	  echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
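A minimal shell analogue of the ordering the patch establishes: the directory is created once, up front, and only then are the data files copied, possibly in parallel. The function names `installdirs` and `install_data` are hypothetical stand-ins for the make targets, `cp` stands in for `install-sh`, and the file contents are dummies, so treat this as a sketch under those assumptions rather than the makefile logic itself. Before the patch, `install: all installdirs install-lib install-data` let parallel make start `installdirs` and `install-data` as unordered sibling jobs, which is roughly like backgrounding both functions at once.

```shell
#!/bin/sh
# Shell analogue of the patched rules (assumption: function names mirror the
# make targets; file names come from the log above; contents are dummies;
# cp stands in for install-sh).
set -eu

src=$(mktemp -d)
dest=$(mktemp -d)/share/extension
for f in plperl.control plperl--1.0.sql plperl--unpackaged--1.0.sql; do
  printf '%s\n' "$f" > "$src/$f"
done

installdirs() {
  mkdir -p "$dest"
}

install_data() {
  installdirs               # prerequisite first, like "install-data: installdirs"
  for f in "$src"/*; do
    cp "$f" "$dest/" &      # only now may the per-file copies run in parallel
  done
  wait                      # like make, wait for every job before returning
}

install_data
ls "$dest"
```

Declaring the prerequisite explicitly lets make keep its parallelism for the file copies while still serializing the one step that has to come first.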
On Sat, 2011-03-05 at 18:33 -0500, Bruce Momjian wrote:
> I am seeing the following compile problem with gmake -j2:
>

For what it's worth, I'm still seeing this problem too:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php

I can reproduce it every time.

Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
> For what it's worth, I'm still seeing this problem too:
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php
> I can reproduce it every time.

I think what is happening here is that make launches concurrent sub-jobs to do "make install" in each of interfaces/libpq and interfaces/ecpg, and the latter launches a sub-sub-job to do "make all" in interfaces/libpq, and make has no idea that these are duplicate sub-jobs so it actually tries to run both concurrently. Whereupon you get all sorts of fun failures. I'm not sure if there is any cure that's not worse than the disease.

FWIW, doing a parallel "make all" works perfectly reliably for me.

regards, tom lane
I wrote:
> I think what is happening here is that make launches concurrent sub-jobs
> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
> and the latter launches a sub-sub-job to do "make all" in
> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
> so it actually tries to run both concurrently. Whereupon you get all
> sorts of fun failures. I'm not sure if there is any cure that's not
> worse than the disease.

BTW, how many people here have read "Recursive Make Considered Harmful"?

http://aegis.sourceforge.net/auug97.pdf

Because what we're presently doing looks mighty similar to what he's saying doesn't work and can't be made to work.

regards, tom lane
On Mon, Mar 7, 2011 at 10:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I think what is happening here is that make launches concurrent sub-jobs
>> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
>> and the latter launches a sub-sub-job to do "make all" in
>> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
>> so it actually tries to run both concurrently. Whereupon you get all
>> sorts of fun failures. I'm not sure if there is any cure that's not
>> worse than the disease.
>
> BTW, how many people here have read "Recursive Make Considered Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

I'm not sure whether it makes sense to go that far or not. But I think it'd make sense to at least try this for the backend. It does seem pretty silly to have a Makefile in every single directory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 03/07/2011 10:28 PM, Tom Lane wrote:
> I wrote:
>> I think what is happening here is that make launches concurrent sub-jobs
>> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
>> and the latter launches a sub-sub-job to do "make all" in
>> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
>> so it actually tries to run both concurrently. Whereupon you get all
>> sorts of fun failures. I'm not sure if there is any cure that's not
>> worse than the disease.
> BTW, how many people here have read "Recursive Make Considered Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.
>

Oh, yes, I read it a long time ago, before I started doing Postgres work. I vaguely recall thinking about it when I began with Postgres, but I thought people smarter than me had probably worked out the problems :-) (Working with people smarter than me is one of the things I like about Postgres work.)

cheers

andrew
Excerpts from Robert Haas's message of Tue Mar 08 10:38:29 -0300 2011:
> On Mon, Mar 7, 2011 at 10:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I wrote:
> >> I think what is happening here is that make launches concurrent sub-jobs
> >> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
> >> and the latter launches a sub-sub-job to do "make all" in
> >> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
> >> so it actually tries to run both concurrently. Whereupon you get all
> >> sorts of fun failures. I'm not sure if there is any cure that's not
> >> worse than the disease.
> >
> > BTW, how many people here have read "Recursive Make Considered Harmful"?
> >
> > http://aegis.sourceforge.net/auug97.pdf
> >
> > Because what we're presently doing looks mighty similar to what he's
> > saying doesn't work and can't be made to work.

Yeah, I read it some years ago and considered it, but it was too disruptive or I was too new here, maybe both :-)

The bit I looked at, at the time, was src/backend/mb/conversion_procs, because that was where the biggest hit on parallelization was taken (a single lib at a time -- the real time CPU usage chart clearly showed the problem. Not sure if that's still a problem).

> I'm not sure whether it makes sense to go that far or not. But I
> think it'd make sense to at least try this for the backend. It does
> seem pretty silly to have a Makefile in every single directory.

We already do that for the backend. Not exactly a single Makefile, but the dependencies are all declared indirectly in src/backend/Makefile with the common.mk tricks.

Where it doesn't work is in the other subdirs, cf. the current problem with interfaces/libpq and interfaces/ecpg. It would be a lot more difficult to fix there, I think, but maybe I'm wrong.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Mar 8, 2011 at 9:07 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> The bit I looked at, at the time, was src/backend/mb/conversion_procs,
> because that was where the biggest hit on parallelization was taken (a
> single lib at a time -- the real time CPU usage chart clearly showed the
> problem. Not sure if that's still a problem).

I think it is, based on having noticed it spend what seemed like a disproportionate amount of time on that stuff when building, but I haven't actually tried to measure it.

>> I'm not sure whether it makes sense to go that far or not. But I
>> think it'd make sense to at least try this for the backend. It does
>> seem pretty silly to have a Makefile in every single directory.
>
> We already do that for the backend. Not exactly a single Makefile, but
> the dependencies are all declared indirectly in src/backend/Makefile
> with the common.mk tricks.

I'm not sure that's really the same thing. It'd be interesting to redo it with just one Makefile and see whether it's faster.

> Where it doesn't work is in the other subdirs, cf. the current problem
> with interfaces/libpq and interfaces/ecpg. It would be a lot more
> difficult to fix there, I think, but maybe I'm wrong.

Yeah, that's a problem. I wondered if supplying -p to mkdir would ameliorate the problem to some degree...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
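Robert's `-p` idea can be illustrated in isolation. The "File exists" lines in the original log are what a plain `mkdir` reports when the directory is already there, for example because a concurrent job created it first. The sketch below runs the two cases one after the other so the result is deterministic; the path is a temporary stand-in, and the sketch makes no claim about how `install-sh -d` is implemented on any particular platform. It also covers only the directory-creation step, not the other work that duplicate sub-makes repeat.

```shell
#!/bin/sh
# Illustration only: a plain "mkdir" of an existing directory fails with
# "File exists", whereas "mkdir -p" treats an existing directory as success.
set -u
d=$(mktemp -d)/share/extension

mkdir -p "$d"                         # the first job creates the directory

if mkdir "$d" 2>/dev/null; then       # a second job using plain mkdir
  plain=ok
else
  plain=failed
fi

if mkdir -p "$d"; then                # a second job using mkdir -p
  with_p=ok
else
  with_p=failed
fi

echo "mkdir: $plain, mkdir -p: $with_p"
# prints: mkdir: failed, mkdir -p: ok
```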
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Where it doesn't work is in the other subdirs, cf. the current problem
> with interfaces/libpq and interfaces/ecpg. It would be a lot more
> difficult to fix there, I think, but maybe I'm wrong.

Right, it's specifically the interdependence between ecpg and libpq that's causing the main symptom Jeff is complaining of. That said, when I was trying "make -j12 install" starting from a clean tree yesterday, I did see at least one failure in the backend. It's all pretty timing-dependent --- if you look at the make output, you can clearly see that the same sub-make tasks get launched repeatedly due to various makefiles trying to force prerequisites in other parts of the tree to be up to date. (Which is exactly one of the band-aid fixes that Miller talks about.) If two such tasks get launched close enough to the same time, they both try to do the same work, and then you get failures like "ln" complaining that the target is already there, or "ar" complaining that somebody corrupted its output file, etc., etc.

I think Miller's analysis is dead on and we ought to think seriously about adopting his approach. Obviously this is not a small task...

regards, tom lane
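The failure mode Tom describes can be reproduced without make at all: if two duplicate sub-makes both decide a target is out of date, both run its recipe, and some recipe steps are not safe to run twice. The sketch below runs a hypothetical symlink step twice in a row (sequentially, so the outcome is stable) to show the kind of "ln" failure he mentions; the library file name is made up for illustration.

```shell
#!/bin/sh
# Illustration: running the same "ln -s" recipe step twice, as two duplicate
# sub-makes would, makes the second run fail because the link already exists.
# The library file name is hypothetical.
set -u
dir=$(mktemp -d)
: > "$dir/libpq.so.5.4"               # dummy library file

link_step() {
  ln -s libpq.so.5.4 "$dir/libpq.so"
}

if link_step; then first=ok; else first=failed; fi
if link_step 2>/dev/null; then second=ok; else second=failed; fi

echo "first: $first, second: $second"
# prints: first: ok, second: failed
```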
On Mon, 2011-03-07 at 13:51 -0800, Jeff Davis wrote:
> On Sat, 2011-03-05 at 18:33 -0500, Bruce Momjian wrote:
> > I am seeing the following compile problem with gmake -j2:
> >
>
> For what it's worth, I'm still seeing this problem too:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php
>
> I can reproduce it every time.

Fixed.
On Mon, 2011-03-07 at 22:28 -0500, Tom Lane wrote:
> BTW, how many people here have read "Recursive Make Considered
> Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

Yes, that's the better solution. It will probably just upset a lot of people's thinking.

The main problem, back when I last considered this seriously, was that it wasn't clear how many compilers don't support -o together with -c. The paper doesn't offer a clear solution to that, but it might be that the problem is effectively gone now.
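To make the single-graph idea concrete, here is a minimal sketch of the non-recursive layout the paper argues for, assuming GNU make (it uses `-C`, `$^`, and `include`). The two directories are hypothetical stand-ins for interfaces/libpq and interfaces/ecpg, and `cp`/`cat` stand in for compiling and linking. Because make never changes into subdirectories, every path is written relative to the top directory; with real C sources that means rules of the form `$(CC) -c dir/foo.c -o dir/foo.o`, which is where the question about `-o` with `-c` comes in.

```shell
#!/bin/sh
# Sketch of the non-recursive layout: one make process and one dependency
# graph, with per-directory fragments pulled in via "include".
# Assumes GNU make; directory and file names are hypothetical.
set -eu

top=$(mktemp -d)
mkdir -p "$top/libpq" "$top/ecpg"
echo 'libpq input' > "$top/libpq/libpq.in"
echo 'ecpg input' > "$top/ecpg/ecpg.in"
tab=$(printf '\t')   # make recipe lines must begin with a literal tab

# The only makefile make is ever run on; there are no sub-makes.
cat > "$top/Makefile" <<EOF
all: libpq/libpq.out ecpg/ecpg.out
include libpq/module.mk
include ecpg/module.mk
EOF

# Paths are relative to the top directory, since make never cd's elsewhere.
cat > "$top/libpq/module.mk" <<EOF
libpq/libpq.out: libpq/libpq.in
${tab}cp \$< \$@
${tab}echo libpq >> build.log
EOF

# ecpg names libpq's output as an ordinary prerequisite instead of running
# "make all" in libpq, so make sees a single target and builds it once.
cat > "$top/ecpg/module.mk" <<EOF
ecpg/ecpg.out: ecpg/ecpg.in libpq/libpq.out
${tab}cat \$^ > \$@
${tab}echo ecpg >> build.log
EOF

gnumake=$(command -v gmake || command -v make || true)
if [ -n "$gnumake" ]; then
  "$gnumake" -s -C "$top" -j4 all
  cat "$top/build.log"   # one "libpq" line, then one "ecpg" line
else
  echo "no make found; skipping the build" >&2
fi
```

Because the whole tree is one graph, `-j` can only ever schedule the libpq target once, which is the property the recursive layout cannot guarantee.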