Thread: Parallel make problem with git master

Parallel make problem with git master

From
Bruce Momjian
Date:
I am seeing the following compile problem with gmake -j2:

    /bin/sh ../../../config/install-sh -c -d '/usr/local/pgsql/lib'
    /bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql.control '/usr/local/pgsql/share/extension'
    /bin/sh ../../../config/install-sh -c -m 644 ./plperl.control '/usr/local/pgsql/share/extension'
    /bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql--1.0.sql '/usr/local/pgsql/share/extension'
    /bin/sh ../../../config/install-sh -c -m 644 ./plperl--1.0.sql '/usr/local/pgsql/share/extension'
    /bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
    /bin/sh ../../../config/install-sh -c -m 644 ./plperl--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
    /bin/sh ../../../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
    /bin/sh ../../../config/install-sh -c -m 644 ./plperlu.control '/usr/local/pgsql/share/extension'
    mkdir: /usr/local/pgsql/share/extension: File exists
    /bin/sh ../../../config/install-sh -c -m 644 ./plperlu--1.0.sql '/usr/local/pgsql/share/extension'
    mkdir: /usr/local/pgsql/share/extension: File exists
    gmake[3]: *** [installdirs] Error 1
    gmake[3]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plpgsql/src'
    gmake[2]: *** [install] Error 2
    gmake[2]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plpgsql'
    gmake[1]: *** [install-plpgsql-recurse] Error 2
    gmake[1]: *** Waiting for unfinished jobs....
    /bin/sh ../../../config/install-sh -c -m 644 ./plperlu--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
    /bin/sh ../../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
    /bin/sh ../../../config/install-sh -c -m 755  plperl.so '/usr/local/pgsql/lib/plperl.so'
    mkdir: /usr/local/pgsql/share/extension: File exists
    mkdir: /usr/local/pgsql/share/extension: File exists
    gmake[2]: *** [installdirs] Error 1
    gmake[2]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plperl'
    gmake[1]: *** [install-plperl-recurse] Error 2
    gmake[1]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl'
    gmake: *** [install-pl-recurse] Error 2

This only happens with parallel gmake and I think is caused by the
assumption that "mkdir extension" will happen before any files are
installed, which doesn't happen with parallel gmake.

I have fixed the bug with the attached, applied patch which moves
'installdirs' to a dependency of the extension directory file install,
rather than a more top-level target so the parallel gmake always creates
the directory first.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index 71e2cef..155b60f 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -74,14 +74,14 @@ Util.c: Util.xs
     $(PERL) $(perl_privlibexp)/ExtUtils/xsubpp -typemap $(perl_privlibexp)/ExtUtils/typemap $< >$@


-install: all installdirs install-lib install-data
+install: all install-lib install-data

 installdirs: installdirs-lib
     $(MKDIR_P) '$(DESTDIR)$(datadir)/extension'

 uninstall: uninstall-lib uninstall-data

-install-data:
+install-data: installdirs
     @for file in $(addprefix $(srcdir)/, $(DATA)); do \
       echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
       $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index d748ef6..52fbc1c 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -27,14 +27,14 @@ all: all-lib
 include $(top_srcdir)/src/Makefile.shlib


-install: all installdirs install-lib install-data
+install: all install-lib install-data

 installdirs: installdirs-lib
     $(MKDIR_P) '$(DESTDIR)$(datadir)/extension'

 uninstall: uninstall-lib uninstall-data

-install-data:
+install-data: installdirs
     @for file in $(addprefix $(srcdir)/, $(DATA)); do \
       echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
       $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index baf22f3..86d8741 100644
--- a/src/pl/plpython/Makefile
+++ b/src/pl/plpython/Makefile
@@ -106,14 +106,14 @@ all: all-lib
 distprep: spiexceptions.h


-install: all installdirs install-lib install-data
+install: all install-lib install-data

 installdirs: installdirs-lib
     $(MKDIR_P) '$(DESTDIR)$(datadir)/extension'

 uninstall: uninstall-lib uninstall-data

-install-data:
+install-data: installdirs
     @for file in $(addprefix $(srcdir)/, $(DATA)); do \
       echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
       $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \
diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index c7797c6..faffd09 100644
--- a/src/pl/tcl/Makefile
+++ b/src/pl/tcl/Makefile
@@ -54,7 +54,7 @@ all: all-lib
     $(MAKE) -C modules $@


-install: all installdirs install-lib install-data
+install: all install-lib install-data
     $(MAKE) -C modules $@

 installdirs: installdirs-lib
@@ -64,7 +64,7 @@ installdirs: installdirs-lib
 uninstall: uninstall-lib uninstall-data
     $(MAKE) -C modules $@

-install-data:
+install-data: installdirs
     @for file in $(addprefix $(srcdir)/, $(DATA)); do \
       echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'"; \
       $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/extension'; \

Re: Parallel make problem with git master

From
Jeff Davis
Date:
On Sat, 2011-03-05 at 18:33 -0500, Bruce Momjian wrote:
> I am seeing the following compile problem with gmake -j2:
> 

For what it's worth, I'm still seeing this problem too:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php

I can reproduce it every time.

Regards,
        Jeff Davis



Re: Parallel make problem with git master

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> For what it's worth, I'm still seeing this problem too:
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php
> I can reproduce it every time.

I think what is happening here is that make launches concurrent sub-jobs
to do "make install" in each of interfaces/libpq and interfaces/ecpg,
and the latter launches a sub-sub-job to do "make all" in
interfaces/libpq, and make has no idea that these are duplicate sub-jobs
so it actually tries to run both concurrently.  Whereupon you get all
sorts of fun failures.  I'm not sure if there is any cure that's not
worse than the disease.

FWIW, doing a parallel "make all" works perfectly reliably for me.

        regards, tom lane


Re: Parallel make problem with git master

From
Tom Lane
Date:
I wrote:
> I think what is happening here is that make launches concurrent sub-jobs
> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
> and the latter launches a sub-sub-job to do "make all" in
> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
> so it actually tries to run both concurrently.  Whereupon you get all
> sorts of fun failures.  I'm not sure if there is any cure that's not
> worse than the disease.

BTW, how many people here have read "Recursive Make Considered Harmful"?

http://aegis.sourceforge.net/auug97.pdf

Because what we're presently doing looks mighty similar to what he's
saying doesn't work and can't be made to work.

        regards, tom lane


Re: Parallel make problem with git master

From
Robert Haas
Date:
On Mon, Mar 7, 2011 at 10:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I think what is happening here is that make launches concurrent sub-jobs
>> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
>> and the latter launches a sub-sub-job to do "make all" in
>> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
>> so it actually tries to run both concurrently.  Whereupon you get all
>> sorts of fun failures.  I'm not sure if there is any cure that's not
>> worse than the disease.
>
> BTW, how many people here have read "Recursive Make Considered Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

I'm not sure whether it makes sense to go that far or not.  But I
think it'd make sense to at least try this for the backend.  It does
seem pretty silly to have a Makefile in every single directory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Parallel make problem with git master

From
Andrew Dunstan
Date:

On 03/07/2011 10:28 PM, Tom Lane wrote:
> I wrote:
>> I think what is happening here is that make launches concurrent sub-jobs
>> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
>> and the latter launches a sub-sub-job to do "make all" in
>> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
>> so it actually tries to run both concurrently.  Whereupon you get all
>> sorts of fun failures.  I'm not sure if there is any cure that's not
>> worse than the disease.
> BTW, how many people here have read "Recursive Make Considered Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

Oh, yes, I read it a long time ago, before I started doing Postgres 
work. I recall vaguely thinking about it when I began with Postgres, but 
I thought people smarter than me had probably worked out the problems 
:-) (Working with people smarter than me is one of the things I like 
about Postgres work.)

cheers

andrew


Re: Parallel make problem with git master

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of Tue Mar 08 10:38:29 -0300 2011:
> On Mon, Mar 7, 2011 at 10:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I wrote:
> >> I think what is happening here is that make launches concurrent sub-jobs
> >> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
> >> and the latter launches a sub-sub-job to do "make all" in
> >> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
> >> so it actually tries to run both concurrently.  Whereupon you get all
> >> sorts of fun failures.  I'm not sure if there is any cure that's not
> >> worse than the disease.
> >
> > BTW, how many people here have read "Recursive Make Considered Harmful"?
> >
> > http://aegis.sourceforge.net/auug97.pdf
> >
> > Because what we're presently doing looks mighty similar to what he's
> > saying doesn't work and can't be made to work.

Yeah, I read it some years ago and considered it, but it was too
disruptive or I was too new here, maybe both :-)

The bit I looked at, at the time, was src/backend/mb/conversion_procs,
because that was where the biggest hit on parallelization was taken (a
single lib at a time -- the real time CPU usage chart clearly showed the
problem.  Not sure if that's still a problem).

> I'm not sure whether it makes sense to go that far or not.  But I
> think it'd make sense to at least try this for the backend.  It does
> seem pretty silly to have a Makefile in every single directory.

We already do that for the backend.  Not exactly a single Makefile, but
the dependencies are all declared indirectly in src/backend/Makefile
with the common.mk tricks.

Where it doesn't work is in the other subdirs, cf. the current problem
with interfaces/libpq and interfaces/ecpg.  It would be a lot more
difficult to fix there, I think, but maybe I'm wrong.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Parallel make problem with git master

From
Robert Haas
Date:
On Tue, Mar 8, 2011 at 9:07 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> The bit I looked at, at the time, was src/backend/mb/conversion_procs,
> because that was where the biggest hit on parallelization was taken (a
> single lib at a time -- the real time CPU usage chart clearly showed the
> problem.  Not sure if that's still a problem).

I think it is, based on having noticed it spend what seemed like a
disproportionate amount of time on that stuff when building, but I
haven't actually tried to measure it.

>> I'm not sure whether it makes sense to go that far or not.  But I
>> think it'd make sense to at least try this for the backend.  It does
>> seem pretty silly to have a Makefile in every single directory.
>
> We already do that for the backend.  Not exactly a single Makefile, but
> the dependencies are all declared indirectly in src/backend/Makefile
> with the common.mk tricks.

I'm not sure that's really the same thing.  It'd be interesting to
redo it with just one Makefile and see whether it's faster.

> Where it doesn't work is in the other subdirs, cf. the current problem
> with interfaces/libpq and interfaces/ecpg.  It would be a lot more
> difficult to fix there, I think, but maybe I'm wrong.

Yeah, that's a problem.  I wondered if supplying -p to mkdir would
ameliorate the problem to some degree...
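
A minimal shell sketch of the difference being alluded to here, not taken from the thread: plain mkdir fails when the directory already exists, while mkdir -p treats that as success. The scratch path is a hypothetical stand-in for the real install directory, and whether this carries over to the install-sh code path seen in the log is an open question, not something the sketch establishes.

```shell
#!/bin/sh
# Hypothetical scratch area standing in for /usr/local/pgsql/share/extension.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
extdir="$scratch/share/extension"

# First creation succeeds; -p also creates the missing parent "share".
mkdir -p "$extdir"
first_status=$?

# Plain mkdir on an existing directory fails (typically "File exists").
mkdir "$extdir" 2>/dev/null
plain_status=$?

# mkdir -p reports success when the directory is already there.
mkdir -p "$extdir"
p_status=$?

echo "first: $first_status, plain mkdir: $plain_status, mkdir -p: $p_status"
```

This only shows the already-exists case run sequentially; it does not reproduce the timing-dependent race between two concurrent jobs.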

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Parallel make problem with git master

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Where it doesn't work is in the other subdirs, cf. the current problem
> with interfaces/libpq and interfaces/ecpg.  It would be a lot more
> difficult to fix there, I think, but maybe I'm wrong.

Right, it's specifically the interdependence between ecpg and libpq
that's causing the main symptom Jeff is complaining of.  Although when
I was trying "make -j12 install" starting from a clean tree yesterday,
I did see at least one failure in the backend.  It's all pretty
timing-dependent --- if you look at the make output, you can clearly
see that the same sub-make tasks get launched repeatedly due to various
makefiles trying to force prerequisites in other parts of the tree to be
up to date.  (Which is exactly one of the band-aid fixes that Miller
talks about.)  If two such tasks get launched close enough to the same
time, they both try to do the same work, and then you get failures like
"ln" complaining that the target is already there, or "ar" complaining
that somebody corrupted its output file, etc etc.
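
The "target is already there" symptom can be sketched deterministically in shell. This is an illustration under stated assumptions, not the actual build step: make_link and the libpq.so.5 name are hypothetical, and the duplicate task is run sequentially as a stand-in for two sub-makes doing the same work at nearly the same time.

```shell
#!/bin/sh
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT

# Hypothetical build step that two sub-makes each believe is theirs to run.
make_link() {
    ln -s libpq.so.5 "$scratch/libpq.so" 2>/dev/null
}

make_link; first=$?
make_link; second=$?   # the duplicate task finds the link already present

# Idempotent variant: -f replaces an existing link instead of failing.
make_link_idempotent() {
    ln -sf libpq.so.5 "$scratch/libpq.so"
}
make_link_idempotent; third=$?

echo "first=$first second=$second idempotent=$third"
```

Making an individual step idempotent hides this one symptom only; it does nothing about the same task being launched twice, nor about failures such as a corrupted "ar" output file.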

I think Miller's analysis is dead on and we ought to think seriously
about adopting his approach.  Obviously this is not a small task...

        regards, tom lane


Re: Parallel make problem with git master

From
Peter Eisentraut
Date:
On Mon, 2011-03-07 at 13:51 -0800, Jeff Davis wrote:
> On Sat, 2011-03-05 at 18:33 -0500, Bruce Momjian wrote:
> > I am seeing the following compile problem with gmake -j2:
> > 
> 
> For what it's worth, I'm still seeing this problem too:
> 
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php
> 
> I can reproduce it every time.

Fixed.



Re: Parallel make problem with git master

From
Peter Eisentraut
Date:
On Mon, 2011-03-07 at 22:28 -0500, Tom Lane wrote:
> BTW, how many people here have read "Recursive Make Considered
> Harmful"?
> 
> http://aegis.sourceforge.net/auug97.pdf
> 
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

Yes, that's the better solution.  It will probably just upset a lot of
people's thinking.

The main problem way back when I last considered this seriously was that
it wasn't clear how many compilers don't support -o with -c.  The paper
doesn't offer a clear solution to that, but it might be that the problem
is effectively gone now.